postgresql

mirror of https://git.postgresql.org/git/postgresql.git synced 2024-10-04 23:47:01 +02:00

Author	SHA1	Message	Date
Heikki Linnakangas	7931622d1d	Fix name of argument to pg_stat_file. It's called "missing_ok" in the docs and in the C code. I refrained from doing a catversion bump for this, because the name of an input argument is just documentation, it has no effect on any callers. Michael Paquier	2015-07-02 12:15:13 +03:00
Fujii Masao	fb174687f7	Make use of xlog_internal.h's macros in WAL-related utilities. Commit `179cdd09` added macros to check if a filename is a WAL segment or other such file. However there were still some instances of the strlen + strspn combination to check for that in WAL-related utilities like pg_archivecleanup. Those checks can be replaced with the macros. This patch makes use of the macros in those utilities and which would make the code a bit easier to read. Back-patch to 9.5. Michael Paquier	2015-07-02 10:35:38 +09:00
Tom Lane	cf8d65de10	Stamp HEAD as 9.6devel. Let the hacking begin ...	2015-06-30 14:01:15 -04:00
Heikki Linnakangas	302ac7f271	Add assertion to check the special size is sane before dereferencing it. This seems useful to catch errors of the sort I just fixed, where PageGetSpecialPointer is called before initializing the page.	2015-06-30 13:44:04 +03:00
Tom Lane	f78329d594	Stamp 9.5alpha1.	2015-06-29 15:42:18 -04:00
Tom Lane	cbc8d65639	Code + docs review for escaping of option values (commit `11a020eb6`). Avoid memory leak from incorrect choice of how to free a StringInfo (resetStringInfo doesn't do it). Now that pg_split_opts doesn't scribble on the optstr, mark that as "const" for clarity. Attach the commentary in protocol.sgml to the right place, and add documentation about the user-visible effects of this change on postgres' -o option and libpq's PGOPTIONS option.	2015-06-29 12:42:52 -04:00
Andres Freund	07cb8b02ab	Replace ia64 S_UNLOCK compiler barrier with a full memory barrier. _Asm_sched_fence() is just a compiler barrier, not a memory barrier. But spinlock release on IA64 needs, at the very least, release semantics. Use a full barrier instead. This might be the cause for the occasional failures on buildfarm member anole. Discussion: 20150629101108.GB17640@alap3.anarazel.de	2015-06-29 14:53:32 +02:00
Tom Lane	62d16c7fc5	Improve design and implementation of pg_file_settings view. As first committed, this view reported on the file contents as they were at the last SIGHUP event. That's not as useful as reporting on the current contents, and what's more, it didn't work right on Windows unless the current session had serviced at least one SIGHUP. Therefore, arrange to re-read the files when pg_show_all_settings() is called. This requires only minor refactoring so that we can pass changeVal = false to set_config_option() so that it won't actually apply any changes locally. In addition, add error reporting so that errors that would prevent the configuration files from being loaded, or would prevent individual settings from being applied, are visible directly in the view. This makes the view usable for pre-testing whether edits made in the config files will have the desired effect, before one actually issues a SIGHUP. I also added an "applied" column so that it's easy to identify entries that are superseded by later entries; this was the main use-case for the original design, but it seemed unnecessarily hard to use for that. Also fix a 9.4.1 regression that allowed multiple entries for a PGC_POSTMASTER variable to cause bogus complaints in the postmaster log. (The issue here was that commit `bf007a27ac` unintentionally reverted `3e3f65973a`, which suppressed any duplicate entries within ParseConfigFp. However, since the original coding of the pg_file_settings view depended on such suppression not happening, we couldn't have fixed this issue now without first doing something with pg_file_settings. Now we suppress duplicates by marking them "ignored" within ProcessConfigFileInternal, which doesn't hide them in the view.) Lesser changes include: Drive the view directly off the ConfigVariable list, instead of making a basically-equivalent second copy of the data. There's no longer any need to hang onto the data permanently, anyway. Convert show_all_file_settings() to do its work in one call and return a tuplestore; this avoids risks associated with assuming that the GUC state will hold still over the course of query execution. (I think there were probably latent bugs here, though you might need something like a cursor on the view to expose them.) Arrange to run SIGHUP processing in a short-lived memory context, to forestall process-lifespan memory leaks. (There is one known leak in this code, in ProcessConfigDirectory; it seems minor enough to not be worth back-patching a specific fix for.) Remove mistaken assignment to ConfigFileLineno that caused line counting after an include_dir directive to be completely wrong. Add missed failure check in AlterSystemSetConfigFile(). We don't really expect ParseConfigFp() to fail, but that's not an excuse for not checking.	2015-06-28 18:06:14 -04:00
Heikki Linnakangas	cb2acb1081	Add missing_ok option to the SQL functions for reading files. This makes it possible to use the functions without getting errors, if there is a chance that the file might be removed or renamed concurrently. pg_rewind needs to do just that, although this could be useful for other purposes too. (The changes to pg_rewind to use these functions will come in a separate commit.) The read_binary_file() function isn't very well-suited for extensions.c's purposes anymore, if it ever was. So bite the bullet and make a copy of it in extension.c, tailored for that use case. This seems better than the accidental code reuse, even if it's a some more lines of code. Michael Paquier, with plenty of kibitzing by me.	2015-06-28 21:35:46 +03:00
Kevin Grittner	604e99396d	Add opaque declaration of HTAB to tqual.h. Commit `b89e151054` added the ResolveCminCmaxDuringDecoding declaration to tqual.h, which uses an HTAB parameter, without declaring HTAB. It accidentally fails to fail to build with current sources because a declaration happens to be included, directly or indirectly, in all source files that currently use tqual.h before tqual.h is first included, but we shouldn't count on that. Since an opaque declaration is enough here, just use that, as was done in snapmgr.h. Backpatch to 9.4, where the HTAB reference was added to tqual.h.	2015-06-27 09:55:06 -05:00
Alvaro Herrera	7d60b2af34	Fix DDL command collection for TRANSFORM Commit `b488c580ae`, which added the DDL command collection feature, neglected to update the code that commit `cac7658205` had previously added two weeks earlier for the TRANSFORM feature. Reported by Michael Paquier.	2015-06-26 18:17:54 -03:00
Robert Haas	8f15f74a44	Be more conservative about removing tablespace "symlinks". Don't apply rmtree(), which will gleefully remove an entire subtree, and don't even apply unlink() unless it's symlink or a directory, the only things that we expect to find. Amit Kapila, with minor tweaks by me, per extensive discussions involving Andrew Dunstan, Fujii Masao, and Heikki Linnakangas, at least some of whom also reviewed the code.	2015-06-26 15:53:13 -04:00
Robert Haas	5ca611841b	Improve handling of CustomPath/CustomPlan(State) children. Allow CustomPath to have a list of paths, CustomPlan a list of plans, and CustomPlanState a list of planstates known to the core system, so that custom path/plan providers can more reasonably use this infrastructure for nodes with multiple children. KaiGai Kohei, per a design suggestion from Tom Lane, with some further kibitzing by me.	2015-06-26 09:40:47 -04:00
Tom Lane	5d1ff6bd55	Fix the logic for putting relations into the relcache init file. Commit `f3b5565dd4` was a couple of bricks shy of a load; specifically, it missed putting pg_trigger_tgrelid_tgname_index into the relcache init file, because that index is not used by any syscache. However, we have historically nailed that index into cache for performance reasons. The upshot was that load_relcache_init_file always decided that the init file was busted and silently ignored it, resulting in a significant hit to backend startup speed. To fix, reinstantiate RelationIdIsInInitFile() as a wrapper around RelationSupportsSysCache(), which can know about additional relations that should be in the init file despite being unknown to syscache.c. Also install some guards against future mistakes of this type: make write_relcache_init_file Assert that all nailed relations get written to the init file, and make load_relcache_init_file emit a WARNING if it takes the "wrong number of nailed relations" exit path. Now that we remove the init files during postmaster startup, that case should never occur in the field, even if we are starting a minor-version update that added or removed rels from the nailed set. So the warning shouldn't ever be seen by end users, but it will show up in the regression tests if somebody breaks this logic. Back-patch to all supported branches, like the previous commit.	2015-06-25 14:39:05 -04:00
Andrew Dunstan	41d798a139	Fix comment in fmgr.h to refer to actual function used. FunctionLookup() is long gone if it ever existed, and fmgr_info() is what's now used, so the comments now reflect that.	2015-06-15 23:21:03 -04:00
Fujii Masao	b5fe62038f	Make postmaster restart archiver soon after it dies, even during recovery. After the archiver dies, postmaster tries to start a new one immediately. But previously this could happen only while server was running normally even though archiving was enabled always (i.e., archive_mode was set to always). So the archiver running during recovery could not restart soon after it died. This is an oversight in commit `ffd3774`. This commit changes reaper(), postmaster's signal handler to cleanup after a child process dies, so that it tries to a new archiver even during recovery if necessary. Patch by me. Review by Alvaro Herrera.	2015-06-12 23:11:51 +09:00
Andrew Dunstan	908e234733	Rename jsonb - text[] operator to #- to avoid ambiguity. Following recent discussion on -hackers. The underlying function is also renamed to jsonb_delete_path. The regression tests now don't need ugly type casts to avoid the ambiguity, so they are also removed. Catalog version bumped.	2015-06-11 10:06:58 -04:00
Kevin Grittner	870681017a	Fix typo in comment. Backpatch to 9.4 to minimize possible conflicts.	2015-06-10 17:03:56 -05:00
Fujii Masao	ea9c4c1e4a	Fix typo in comment. David Rowley	2015-06-10 15:26:02 +09:00
Tom Lane	f3b5565dd4	Use a safer method for determining whether relcache init file is stale. When we invalidate the relcache entry for a system catalog or index, we must also delete the relcache "init file" if the init file contains a copy of that rel's entry. The old way of doing this relied on a specially maintained list of the OIDs of relations present in the init file: we made the list either when reading the file in, or when writing the file out. The problem is that when writing the file out, we included only rels present in our local relcache, which might have already suffered some deletions due to relcache inval events. In such cases we correctly decided not to overwrite the real init file with incomplete data --- but we still used the incomplete initFileRelationIds list for the rest of the current session. This could result in wrong decisions about whether the session's own actions require deletion of the init file, potentially allowing an init file created by some other concurrent session to be left around even though it's been made stale. Since we don't support changing the schema of a system catalog at runtime, the only likely scenario in which this would cause a problem in the field involves a "vacuum full" on a catalog concurrently with other activity, and even then it's far from easy to provoke. Remarkably, this has been broken since 2002 (in commit `7863404417`), but we had never seen a reproducible test case until recently. If it did happen in the field, the symptoms would probably involve unexpected "cache lookup failed" errors to begin with, then "could not open file" failures after the next checkpoint, as all accesses to the affected catalog stopped working. Recovery would require manually removing the stale "pg_internal.init" file. To fix, get rid of the initFileRelationIds list, and instead consult syscache.c's list of relations used in catalog caches to decide whether a relation is included in the init file. This should be a tad more efficient anyway, since we're replacing linear search of a list with ~100 entries with a binary search. It's a bit ugly that the init file contents are now so directly tied to the catalog caches, but in practice that won't make much difference. Back-patch to all supported branches.	2015-06-07 15:32:09 -04:00
Tom Lane	3f59be836c	Fix planner's cost estimation for SEMI/ANTI joins with inner indexscans. When the inner side of a nestloop SEMI or ANTI join is an indexscan that uses all the join clauses as indexquals, it can be presumed that both matched and unmatched outer rows will be processed very quickly: for matched rows, we'll stop after fetching one row from the indexscan, while for unmatched rows we'll have an indexscan that finds no matching index entries, which should also be quick. The planner already knew about this, but it was nonetheless charging for at least one full run of the inner indexscan, as a consequence of concerns about the behavior of materialized inner scans --- but those concerns don't apply in the fast case. If the inner side has low cardinality (many matching rows) this could make an indexscan plan look far more expensive than it actually is. To fix, rearrange the work in initial_cost_nestloop/final_cost_nestloop so that we don't add the inner scan cost until we've inspected the indexquals, and then we can add either the full-run cost or just the first tuple's cost as appropriate. Experimentation with this fix uncovered another problem: add_path and friends were coded to disregard cheap startup cost when considering parameterized paths. That's usually okay (and desirable, because it thins the path herd faster); but in this fast case for SEMI/ANTI joins, it could result in throwing away the desired plain indexscan path in favor of a bitmap scan path before we ever get to the join costing logic. In the many-matching-rows cases of interest here, a bitmap scan will do a lot more work than required, so this is a problem. To fix, add a per-relation flag consider_param_startup that works like the existing consider_startup flag, but applies to parameterized paths, and set it for relations that are the inside of a SEMI or ANTI join. To make this patch reasonably safe to back-patch, care has been taken to avoid changing the planner's behavior except in the very narrow case of SEMI/ANTI joins with inner indexscans. There are places in compare_path_costs_fuzzily and add_path_precheck that are not terribly consistent with the new approach, but changing them will affect planner decisions at the margins in other cases, so we'll leave that for a HEAD-only fix. Back-patch to 9.3; before that, the consider_startup flag didn't exist, meaning that the second aspect of the patch would be too invasive. Per a complaint from Peter Holzer and analysis by Tomas Vondra.	2015-06-03 11:59:10 -04:00
Andrew Dunstan	37def42245	Rename jsonb_replace to jsonb_set and allow it to add new values The function is given a fourth parameter, which defaults to true. When this parameter is true, if the last element of the path is missing in the original json, jsonb_set creates it in the result and assigns it the new value. If it is false then the function does nothing unless all elements of the path are present, including the last. Based on some original code from Dmitry Dolgov, heavily modified by me. Catalog version bumped.	2015-05-31 20:34:10 -04:00
Tom Lane	1c8c656b3c	Check that all aliases of a built-in function have same leakproof property. opr_sanity.sql has a test checking that relevant properties of built-in functions match when the same C function is referenced by multiple pg_proc entries. The test neglected to check proleakproof, though, and when I added that condition it exposed that xideqint4 hadn't been updated to match xideq. So fix that as well, and in consequence bump catversion. This isn't very critical, so no need to worry about fixing back branches.	2015-05-29 13:26:21 -04:00
Tom Lane	da33a3894e	Revert exporting of internal GUC variable "data_directory". This undoes a poorly-thought-out choice in commit `970a18687f`, namely to export guc.c's internal variable data_directory. The authoritative variable so far as C code is concerned is DataDir; there is no reason for anything except specific bits of GUC code to look at the GUC variable. After yesterday's commits fixing the fsync-on-restart patch, the only remaining misuse of data_directory was in AlterSystemSetConfigFile(), which would be much better off just using a relative path anyhow: it's less code and it doesn't break if the DBA moves the data directory of a running system, which is a case we've taken some pains over in the past. This is mostly cosmetic, so no need for a back-patch (and I'd be hesitant to remove a global variable in stable branches anyway).	2015-05-29 11:57:33 -04:00
Tom Lane	d8179b001a	Fix fsync-at-startup code to not treat errors as fatal. Commit `2ce439f337` introduced a rather serious regression, namely that if its scan of the data directory came across any un-fsync-able files, it would fail and thereby prevent database startup. Worse yet, symlinks to such files also caused the problem, which meant that crash restart was guaranteed to fail on certain common installations such as older Debian. After discussion, we agreed that (1) failure to start is worse than any consequence of not fsync'ing is likely to be, therefore treat all errors in this code as nonfatal; (2) we should not chase symlinks other than those that are expected to exist, namely pg_xlog/ and tablespace links under pg_tblspc/. The latter restriction avoids possibly fsync'ing a much larger part of the filesystem than intended, if the user has left random symlinks hanging about in the data directory. This commit takes care of that and also does some code beautification, mainly moving the relevant code into fd.c, which seems a much better place for it than xlog.c, and making sure that the conditional compilation for the pre_sync_fname pass has something to do with whether pg_flush_data works. I also relocated the call site in xlog.c down a few lines; it seems a bit silly to be doing this before ValidateXLOGDirectoryStructure(). The similar logic in initdb.c ought to be made to match this, but that change is noncritical and will be dealt with separately. Back-patch to all active branches, like the prior commit. Abhijit Menon-Sen and Tom Lane	2015-05-28 17:33:03 -04:00
Bruce Momjian	befa3e648c	Revert 9.5 pgindent changes to atomics directory files This is because there are many __asm__ blocks there that pgindent messes up. Also configure pgindent to skip that directory in the future.	2015-05-24 21:45:01 -04:00
Tom Lane	23116d5437	Add a bit more commentary about regex's colormap tree data structure. Per an off-list question from Piotr Stefaniak.	2015-05-24 12:40:38 -04:00
Tom Lane	91e79260f6	Remove no-longer-required function declarations. Remove a bunch of "extern Datum foo(PG_FUNCTION_ARGS);" declarations that are no longer needed now that PG_FUNCTION_INFO_V1(foo) provides that. Some of these were evidently missed in commit `e7128e8dbb`, but others were cargo-culted in in code added since then. Possibly that can be blamed in part on the fact that we'd not fixed relevant documentation examples, which I've now done.	2015-05-24 12:20:23 -04:00
Bruce Momjian	807b9e0dff	pgindent run for 9.5	2015-05-23 21:35:49 -04:00
Tom Lane	821b821a24	Still more fixes for lossy-GiST-distance-functions patch. Fix confusion in documentation, substantial memory leakage if float8 or float4 are pass-by-reference, and assorted comments that were obsoleted by commit `98edd617f3`.	2015-05-23 15:22:25 -04:00
Andres Freund	631d749007	Remove the new UPSERT command tag and use INSERT instead. Previously, INSERT with ON CONFLICT DO UPDATE specified used a new command tag -- UPSERT. It was introduced out of concern that INSERT as a command tag would be a misrepresentation for ON CONFLICT DO UPDATE, as some affected rows may actually have been updated. Alvaro Herrera noticed that the implementation of that new command tag was incomplete; in subsequent discussion we concluded that having it doesn't provide benefits that are in line with the compatibility breaks it requires. Catversion bump due to the removal of PlannedStmt->isUpsert. Author: Peter Geoghegan Discussion: 20150520215816.GI5885@postgresql.org	2015-05-23 00:58:45 +02:00
Andrew Dunstan	5302760a50	Unpack jbvBinary objects passed to pushJsonbValue pushJsonbValue was accepting jbvBinary objects passed as WJB_ELEM or WJB_VALUE data. While this succeeded, when those objects were later encountered in attempting to convert the result to Jsonb, errors occurred. With this change we ghuarantee that a JSonbValue constructed from calls to pushJsonbValue does not contain any jbvBinary objects. This cures a problem observed with jsonb_delete. This means callers of pushJsonbValue no longer need to perform this unpacking themselves. A subsequent patch will perform some cleanup in that area. The error was not triggered by any 9.4 code, but this is a publicly visible routine, and so the error could be exercised by third party code, therefore backpatch to 9.4. Bug report from Peter Geoghegan, fix by me.	2015-05-22 10:21:41 -04:00
Heikki Linnakangas	7cbee7c0a1	At promotion, don't leave behind a partial segment on the old timeline. With commit `de768844`, a copy of the partial segment was archived with the .partial suffix, but the original file was still left in pg_xlog, so it didn't actually solve the problems with archiving the partial segment that it was supposed to solve. With this patch, the partial segment is renamed rather than copied, so we only archive it with the .partial suffix. Also be more robust in detecting if the last segment is already being archived. Previously I used XLogArchiveIsBusy() for that, but that's not quite right. With archive_mode='always', there might be a .ready file for it, and we don't want to rename it to .partial in that case. The old segment is needed until we're fully committed to the new timeline, i.e. until we've written the end-of-recovery WAL record and updated the min recovery point and timeline in the control file. So move the renaming later in the startup sequence, after all that's been done.	2015-05-22 11:04:33 +03:00
Tom Lane	c5dd8ead40	More fixes for lossy-GiST-distance-functions patch. Paul Ramsey reported that commit `35fcb1b3d0` induced a core dump on commuted ORDER BY expressions, because it was assuming that the indexorderby expression could be found verbatim in the relevant equivalence class, but it wasn't there. We really don't need anything that complicated anyway; for the data types likely to be used for index ORDER BY operators in the foreseeable future, the exprType() of the ORDER BY expression will serve fine. (The case where we'd have to work harder is where the ORDER BY expression's result is only binary-compatible with the declared input type of the ordering operator; long before worrying about that, one would need to get rid of GiST's hard-wired assumption that said datatype is float8.) Aside from fixing that crash and adding a regression test for the case, I did some desultory code review: nodeIndexscan.c was likewise overthinking how hard it ought to work to identify the datatype of the ORDER BY expressions. Add comments explaining how come nodeIndexscan.c can get away with simplifying assumptions about NULLS LAST ordering and no backward scan. Revert no-longer-needed changes of find_ec_member_for_tle(); while the new definition was no worse than the old, it wasn't better either, and it might cause back-patching pain. Revert entirely bogus additions to genam.h.	2015-05-21 19:47:48 -04:00
Tom Lane	d4b538ea36	Improve packing/alignment annotation for ItemPointerData. We want this struct to be exactly a series of 3 int16 words, no more and no less. Historically, at least, some ARM compilers preferred to pad it to 8 bytes unless coerced. Our old way of doing that was just to use __attribute__((packed)), but as pointed out by Piotr Stefaniak, that does too much: it also licenses the compiler to give the struct only byte-alignment. We don't want that because it adds access overhead, possibly quite significant overhead. According to the GCC manual, what we want requires also specifying __attribute__((align(2))). It's not entirely clear if all the relevant compilers accept this pragma as well, but we can hope the buildfarm will tell us if not. We can also add a static assertion that should fire if the compiler padded the struct. Since the combination of these pragmas should define exactly what we want on any compiler that accepts them, let's try using them wherever we think they exist, not only for __arm__. (This is likely to expose that the conditional definitions in c.h are inadequate, but finding that out would be a good thing.) The immediate motivation for this is that the current definition of ExecRowMark allows its curCtid field to be misaligned. It is not clear whether there are any other uses of ItemPointerData with a similar hazard. We could change the definition of ExecRowMark if this doesn't work, but it would be far better to have a future-proof fix. Piotr Stefaniak, some further hacking by me	2015-05-21 17:21:46 -04:00
Heikki Linnakangas	fa60fb63e5	Fix more typos in comments. Patch by CharSyam, plus a few more I spotted with grep.	2015-05-20 19:45:43 +03:00
Heikki Linnakangas	4fc72cc7bb	Collection of typo fixes. Use "a" and "an" correctly, mostly in comments. Two error messages were also fixed (they were just elogs, so no translation work required). Two function comments in pg_proc.h were also fixed. Etsuro Fujita reported one of these, but I found a lot more with grep. Also fix a few other typos spotted while grepping for the a/an typos. For example, "consists out of ..." -> "consists of ...". Plus a "though"/ "through" mixup reported by Euler Taveira. Many of these typos were in old code, which would be nice to backpatch to make future backpatching easier. But much of the code was new, and I didn't feel like crafting separate patches for each branch. So no backpatching.	2015-05-20 16:56:22 +03:00
Tom Lane	0c071936e9	Revert error-throwing wrappers for the printf family of functions. This reverts commit `16304a0134`, except for its changes in src/port/snprintf.c; as well as commit `cac18a76bb` which is no longer needed. Fujii Masao reported that the previous commit caused failures in psql on OS X, since if one exits the pager program early while viewing a query result, psql sees an EPIPE error from fprintf --- and the wrapper function thought that was reason to panic. (It's a bit surprising that the same does not happen on Linux.) Further discussion among the security list concluded that the risk of other such failures was far too great, and that the one-size-fits-all approach to error handling embodied in the previous patch is unlikely to be workable. This leaves us again exposed to the possibility of the type of failure envisioned in CVE-2015-3166. However, that failure mode is strictly hypothetical at this point: there is no concrete reason to believe that an attacker could trigger information disclosure through the supposed mechanism. In the first place, the attack surface is fairly limited, since so much of what the backend does with format strings goes through stringinfo.c or psprintf(), and those already had adequate defenses. In the second place, even granting that an unprivileged attacker could control the occurrence of ENOMEM with some precision, it's a stretch to believe that he could induce it just where the target buffer contains some valuable information. So we concluded that the risk of non-hypothetical problems induced by the patch greatly outweighs the security risks. We will therefore revert, and instead undertake closer analysis to identify specific calls that may need hardening, rather than attempt a universal solution. We have kept the portion of the previous patch that improved snprintf.c's handling of errors when it calls the platform's sprintf(). That seems to be an unalloyed improvement. Security: CVE-2015-3166	2015-05-19 18:19:38 -04:00
Andres Freund	0740cbd759	Refactor ON CONFLICT index inference parse tree representation. Defer lookup of opfamily and input type of a of a user specified opclass until the optimizer selects among available unique indexes; and store the opclass in the parse analyzed tree instead. The primary reason for doing this is that for rule deparsing it's easier to use the opclass than the previous representation. While at it also rename a variable in the inference code to better fit it's purpose. This is separate from the actual fixes for deparsing to make review easier.	2015-05-19 21:21:27 +02:00
Tom Lane	0b28ea79c0	Avoid collation dependence in indexes of system catalogs. No index in template0 should have collation-dependent ordering, especially not indexes on shared catalogs. For most textual columns we avoid this issue by using type "name" (which sorts per strcmp()). However there are a few indexed columns that we'd prefer to use "text" for, and for that, the default opclass text_ops is unsafe. Fortunately, text_pattern_ops is safe (it sorts per memcmp()), and it has no real functional disadvantage for our purposes. So change the indexes on pg_seclabel.provider and pg_shseclabel.provider to use text_pattern_ops. In passing, also mark pg_replication_origin.roname as using text_pattern_ops --- for some reason it was labeled varchar_pattern_ops which is just wrong, even though it accidentally worked. Add regression test queries to catch future errors of these kinds. We still can't do anything about the misdeclared pg_seclabel and pg_shseclabel indexes in back branches :-(	2015-05-19 11:47:42 -04:00
Tom Lane	afee04352b	Revert "Change pg_seclabel.provider and pg_shseclabel.provider to type "name"." This reverts commit `b82a7be603`. There is a better (less invasive) way to fix it, which I will commit next.	2015-05-19 10:40:04 -04:00
Tom Lane	b82a7be603	Change pg_seclabel.provider and pg_shseclabel.provider to type "name". These were "text", but that's a bad idea because it has collation-dependent ordering. No index in template0 should have collation-dependent ordering, especially not indexes on shared catalogs. There was general agreement that provider names don't need to be longer than other identifiers, so we can fix this at a small waste of table space by changing from text to name. There's no way to fix the problem in the back branches, but we can hope that security labels don't yet have widespread-enough usage to make it urgent to fix. There needs to be a regression sanity test to prevent us from making this same mistake again; but before putting that in, we'll need to get rid of similar brain fade in the recently-added pg_replication_origin catalog. Note: for lack of a suitable testing environment, I've not really exercised this change. I trust the buildfarm will show up any mistakes.	2015-05-18 20:07:53 -04:00
Tom Lane	4db485e75b	Put back a backwards-compatible version of sampling support functions. Commit `83e176ec18` removed the longstanding support functions for block sampling without any consideration of the impact this would have on third-party FDWs. The new API is not notably more functional for FDWs than the old, so forcing them to change doesn't seem like a good thing. We can provide the old API as a wrapper (more or less) around the new one for a minimal amount of extra code.	2015-05-18 18:34:37 -04:00
Noah Misch	16304a0134	Add error-throwing wrappers for the printf family of functions. All known standard library implementations of these functions can fail with ENOMEM. A caller neglecting to check for failure would experience missing output, information exposure, or a crash. Check return values within wrappers and code, currently just snprintf.c, that bypasses the wrappers. The wrappers do not return after an error, so their callers need not check. Back-patch to 9.0 (all supported versions). Popular free software standard library implementations do take pains to bypass malloc() in simple cases, but they risk ENOMEM for floating point numbers, positional arguments, large field widths, and large precisions. No specification demands such caution, so this commit regards every call to a printf family function as a potential threat. Injecting the wrappers implicitly is a compromise between patch scope and design goals. I would prefer to edit each call site to name a wrapper explicitly. libpq and the ECPG libraries would, ideally, convey errors to the caller rather than abort(). All that would be painfully invasive for a back-patched security fix, hence this compromise. Security: CVE-2015-3166	2015-05-18 10:02:31 -04:00
Noah Misch	cac18a76bb	Permit use of vsprintf() in PostgreSQL code. The next commit needs it. Back-patch to 9.0 (all supported versions).	2015-05-18 10:02:31 -04:00
Tom Lane	424661913c	Fix failure to copy IndexScan.indexorderbyops in copyfuncs.c. This oversight results in a crash at executor startup if the plan has been copied. outfuncs.c was missed as well. While we could probably have taught both those files to cope with the originally chosen representation of an Oid array, it would have been painful, not least because there'd be no easy way to verify the array length. An Oid List is far easier to work with. And AFAICS, there is no particular notational benefit to using an array rather than a list in the existing parts of the patch either. So just change it to a list. Error in commit `35fcb1b3d0`, which is new, so no need for back-patch.	2015-05-17 21:22:12 -04:00
Andres Freund	f3d3118532	Support GROUPING SETS, CUBE and ROLLUP. This SQL standard functionality allows to aggregate data by different GROUP BY clauses at once. Each grouping set returns rows with columns grouped by in other sets set to NULL. This could previously be achieved by doing each grouping as a separate query, conjoined by UNION ALLs. Besides being considerably more concise, grouping sets will in many cases be faster, requiring only one scan over the underlying data. The current implementation of grouping sets only supports using sorting for input. Individual sets that share a sort order are computed in one pass. If there are sets that don't share a sort order, additional sort & aggregation steps are performed. These additional passes are sourced by the previous sort step; thus avoiding repeated scans of the source data. The code is structured in a way that adding support for purely using hash aggregation or a mix of hashing and sorting is possible. Sorting was chosen to be supported first, as it is the most generic method of implementation. Instead of, as in an earlier versions of the patch, representing the chain of sort and aggregation steps as full blown planner and executor nodes, all but the first sort are performed inside the aggregation node itself. This avoids the need to do some unusual gymnastics to handle having to return aggregated and non-aggregated tuples from underlying nodes, as well as having to shut down underlying nodes early to limit memory usage. The optimizer still builds Sort/Agg node to describe each phase, but they're not part of the plan tree, but instead additional data for the aggregation node. They're a convenient and preexisting way to describe aggregation and sorting. The first (and possibly only) sort step is still performed as a separate execution step. That retains similarity with existing group by plans, makes rescans fairly simple, avoids very deep plans (leading to slow explains) and easily allows to avoid the sorting step if the underlying data is sorted by other means. A somewhat ugly side of this patch is having to deal with a grammar ambiguity between the new CUBE keyword and the cube extension/functions named cube (and rollup). To avoid breaking existing deployments of the cube extension it has not been renamed, neither has cube been made a reserved keyword. Instead precedence hacking is used to make GROUP BY cube(..) refer to the CUBE grouping sets feature, and not the function cube(). To actually group by a function cube(), unlikely as that might be, the function name has to be quoted. Needs a catversion bump because stored rules may change. Author: Andrew Gierth and Atri Sharma, with contributions from Andres Freund Reviewed-By: Andres Freund, Noah Misch, Tom Lane, Svenne Krap, Tomas Vondra, Erik Rijkers, Marti Raudsepp, Pavel Stehule Discussion: CAOeZVidmVRe2jU6aMk_5qkxnB7dfmPROzM7Ur8JPW5j8Y5X-Lw@mail.gmail.com	2015-05-16 03:46:31 +02:00
Alvaro Herrera	b0b7be6133	Add BRIN infrastructure for "inclusion" opclasses This lets BRIN be used with R-Tree-like indexing strategies. Also provided are operator classes for range types, box and inet/cidr. The infrastructure provided here should be sufficient to create operator classes for similar datatypes; for instance, opclasses for PostGIS geometries should be doable, though we didn't try to implement one. (A box/point opclass was also submitted, but we ripped it out before commit because the handling of floating point comparisons in existing code is inconsistent and would generate corrupt indexes.) Author: Emre Hasegeli. Cosmetic changes by me Review: Andreas Karlsson	2015-05-15 18:05:22 -03:00
Alvaro Herrera	26df7066cc	Move strategy numbers to include/access/stratnum.h For upcoming BRIN opclasses, it's convenient to have strategy numbers defined in a single place. Since there's nothing appropriate, create it. The StrategyNumber typedef now lives there, as well as existing strategy numbers for B-trees (from skey.h) and R-tree-and-friends (from gist.h). skey.h is forced to include stratnum.h because of the StrategyNumber typedef, but gist.h is not; extensions that currently rely on gist.h for rtree strategy numbers might need to add a new A few .c files can stop including skey.h and/or gist.h, which is a nice side benefit. Per discussion: https://www.postgresql.org/message-id/20150514232132.GZ2523@alvh.no-ip.org Authored by Emre Hasegeli and Álvaro. (It's not clear to me why bootscanner.l has any #include lines at all.)	2015-05-15 17:03:16 -03:00
Simon Riggs	f6d208d6e5	TABLESAMPLE, SQL Standard and extensible Add a TABLESAMPLE clause to SELECT statements that allows user to specify random BERNOULLI sampling or block level SYSTEM sampling. Implementation allows for extensible sampling functions to be written, using a standard API. Basic version follows SQLStandard exactly. Usable concrete use cases for the sampling API follow in later commits. Petr Jelinek Reviewed by Michael Paquier and Simon Riggs	2015-05-15 14:37:10 -04:00
Heikki Linnakangas	ffd37740ee	Add archive_mode='always' option. In 'always' mode, the standby independently archives all files it receives from the primary. Original patch by Fujii Masao, docs and review by me.	2015-05-15 18:55:24 +03:00
Heikki Linnakangas	98edd617f3	Fix datatype confusion with the new lossy GiST distance functions. We can only support a lossy distance function when the distance function's datatype is comparable with the original ordering operator's datatype. The distance function always returns a float8, so we are limited to float8, and float4 (by a hard-coded cast of the float8 to float4). In light of this limitation, it seems like a good idea to have a separate 'recheck' flag for the ORDER BY expressions, so that if you have a non-lossy distance function, it still works with lossy quals. There are cases like that with the build-in or contrib opclasses, but it's plausible. There was a hidden assumption that the ORDER BY values returned by GiST match the original ordering operator's return type, but there are plenty of examples where that's not true, e.g. in btree_gist and pg_trgm. As long as the distance function is not lossy, we can tolerate that and just not return the distance to the executor (or rather, always return NULL). The executor doesn't need the distances if there are no lossy results. There was another little bug: the recheck variable was not initialized before calling the distance function. That revealed the bigger issue, as the executor tried to reorder tuples that didn't need reordering, and that failed because of the datatype mismatch.	2015-05-15 18:09:31 +03:00
Heikki Linnakangas	35fcb1b3d0	Allow GiST distance function to return merely a lower-bound. The distance function can now set *recheck = false, like index quals. The executor will then re-check the ORDER BY expressions, and use a queue to reorder the results on the fly. This makes it possible to do kNN-searches on polygons and circles, which don't store the exact value in the index, but just a bounding box. Alexander Korotkov and me	2015-05-15 14:26:51 +03:00
Fujii Masao	ecd222e770	Support VERBOSE option in REINDEX command. When this option is specified, a progress report is printed as each index is reindexed. Per discussion, we agreed on the following syntax for the extensibility of the options. REINDEX (flexible options) { INDEX \| ... } name Sawada Masahiko. Reviewed by Robert Haas, Fabrízio Mello, Alvaro Herrera, Kyotaro Horiguchi, Jim Nasby and me. Discussion: CAD21AoA0pK3YcOZAFzMae+2fcc3oGp5zoRggDyMNg5zoaWDhdQ@mail.gmail.com	2015-05-15 20:09:57 +09:00
Tom Lane	7730f48ede	Teach UtfToLocal/LocalToUtf to support algorithmic encoding conversions. Until now, these functions have only supported encoding conversions using lookup tables, which is fine as long as there's not too many code points to convert. However, GB18030 expects all 1.1 million Unicode code points to be convertible, which would require a ridiculously-sized lookup table. Fortunately, a large fraction of those conversions can be expressed through arithmetic, ie the conversions are one-to-one in certain defined ranges. To support that, provide a callback function that is used after consulting the lookup tables. (This patch doesn't actually change anything about the GB18030 conversion behavior, just provide infrastructure for fixing it.) Since this requires changing the APIs of UtfToLocal/LocalToUtf anyway, take the opportunity to rearrange their argument lists into what seems to me a saner order. And beautify the call sites by using lengthof() instead of error-prone sizeof() arithmetic. In passing, also mark all the lookup tables used by these calls "const". This moves an impressive amount of stuff into the text segment, at least on my machine, and is safer anyhow.	2015-05-14 22:27:12 -04:00
Simon Riggs	83e176ec18	Separate block sampling functions Refactoring ahead of tablesample patch Requested and reviewed by Michael Paquier Petr Jelinek	2015-05-15 04:02:54 +02:00
Peter Eisentraut	a486e35706	Add pg_settings.pending_restart column with input from David G. Johnston, Robert Haas, Michael Paquier	2015-05-14 20:08:51 -04:00
Tom Lane	1dc5ebc907	Support "expanded" objects, particularly arrays, for better performance. This patch introduces the ability for complex datatypes to have an in-memory representation that is different from their on-disk format. On-disk formats are typically optimized for minimal size, and in any case they can't contain pointers, so they are often not well-suited for computation. Now a datatype can invent an "expanded" in-memory format that is better suited for its operations, and then pass that around among the C functions that operate on the datatype. There are also provisions (rudimentary as yet) to allow an expanded object to be modified in-place under suitable conditions, so that operations like assignment to an element of an array need not involve copying the entire array. The initial application for this feature is arrays, but it is not hard to foresee using it for other container types like JSON, XML and hstore. I have hopes that it will be useful to PostGIS as well. In this initial implementation, a few heuristics have been hard-wired into plpgsql to improve performance for arrays that are stored in plpgsql variables. We would like to generalize those hacks so that other datatypes can obtain similar improvements, but figuring out some appropriate APIs is left as a task for future work. (The heuristics themselves are probably not optimal yet, either, as they sometimes force expansion of arrays that would be better left alone.) Preliminary performance testing shows impressive speed gains for plpgsql functions that do element-by-element access or update of large arrays. There are other cases that get a little slower, as a result of added array format conversions; but we can hope to improve anything that's annoyingly bad. In any case most applications should see a net win. Tom Lane, reviewed by Andres Freund	2015-05-14 12:08:49 -04:00
Andrew Dunstan	5c7df74204	Fix some errors from jsonb functions patch. The catalog version should have been bumped, and the alternative regression result file was not up to date with the name of jsonb_pretty.	2015-05-12 16:54:38 -04:00
Andrew Dunstan	c6947010ce	Additional functions and operators for jsonb jsonb_pretty(jsonb) produces nicely indented json output. jsonb \|\| jsonb concatenates two jsonb values. jsonb - text removes a key and its associated value from the json jsonb - int removes the designated array element jsonb - text[] removes a key and associated value or array element at the designated path jsonb_replace(jsonb,text[],jsonb) replaces the array element designated by the path or the value associated with the key designated by the path with the given value. Original work by Dmitry Dolgov, adapted and reworked for PostgreSQL core by Andrew Dunstan, reviewed and tidied up by Petr Jelinek.	2015-05-12 15:52:45 -04:00
Tom Lane	afb9249d06	Add support for doing late row locking in FDWs. Previously, FDWs could only do "early row locking", that is lock a row as soon as it's fetched, even though local restriction/join conditions might discard the row later. This patch adds callbacks that allow FDWs to do late locking in the same way that it's done for regular tables. To make use of this feature, an FDW must support the "ctid" column as a unique row identifier. Currently, since ctid has to be of type TID, the feature is of limited use, though in principle it could be used by postgres_fdw. We may eventually allow FDWs to specify another data type for ctid, which would make it possible for more FDWs to use this feature. This commit does not modify postgres_fdw to use late locking. We've tested some prototype code for that, but it's not in committable shape, and besides it's quite unclear whether it actually makes sense to do late locking against a remote server. The extra round trips required are likely to outweigh any benefit from improved concurrency. Etsuro Fujita, reviewed by Ashutosh Bapat, and hacked up a lot by me	2015-05-12 14:10:17 -04:00
Andrew Dunstan	72d422a522	Map basebackup tablespaces using a tablespace_map file Windows can't reliably restore symbolic links from a tar format, so instead during backup start we create a tablespace_map file, which is used by the restoring postgres to create the correct links in pg_tblspc. The backup protocol also now has an option to request this file to be included in the backup stream, and this is used by pg_basebackup when operating in tar mode. This is done on all platforms, not just Windows. This means that pg_basebackup will not not work in tar mode against 9.4 and older servers, as this protocol option isn't implemented there. Amit Kapila, reviewed by Dilip Kumar, with a little editing from me.	2015-05-12 09:29:10 -04:00
Alvaro Herrera	b488c580ae	Allow on-the-fly capture of DDL event details This feature lets user code inspect and take action on DDL events. Whenever a ddl_command_end event trigger is installed, DDL actions executed are saved to a list which can be inspected during execution of a function attached to ddl_command_end. The set-returning function pg_event_trigger_ddl_commands can be used to list actions so captured; it returns data about the type of command executed, as well as the affected object. This is sufficient for many uses of this feature. For the cases where it is not, we also provide a "command" column of a new pseudo-type pg_ddl_command, which is a pointer to a C structure that can be accessed by C code. The struct contains all the info necessary to completely inspect and even reconstruct the executed command. There is no actual deparse code here; that's expected to come later. What we have is enough infrastructure that the deparsing can be done in an external extension. The intention is that we will add some deparsing code in a later release, as an in-core extension. A new test module is included. It's probably insufficient as is, but it should be sufficient as a starting point for a more complete and future-proof approach. Authors: Álvaro Herrera, with some help from Andres Freund, Ian Barwick, Abhijit Menon-Sen. Reviews by Andres Freund, Robert Haas, Amit Kapila, Michael Paquier, Craig Ringer, David Steele. Additional input from Chris Browne, Dimitri Fontaine, Stephen Frost, Petr Jelínek, Tom Lane, Jim Nasby, Steven Singer, Pavel Stěhule. Based on original work by Dimitri Fontaine, though I didn't use his code. Discussion: https://www.postgresql.org/message-id/m2txrsdzxa.fsf@2ndQuadrant.fr https://www.postgresql.org/message-id/20131108153322.GU5809@eldon.alvh.no-ip.org https://www.postgresql.org/message-id/20150215044814.GL3391@alvh.no-ip.org	2015-05-11 19:14:31 -03:00
Tom Lane	1a8a4e5cde	Code review for foreign/custom join pushdown patch. Commit `e7cb7ee145` included some design decisions that seem pretty questionable to me, and there was quite a lot of stuff not to like about the documentation and comments. Clean up as follows: * Consider foreign joins only between foreign tables on the same server, rather than between any two foreign tables with the same underlying FDW handler function. In most if not all cases, the FDW would simply have had to apply the same-server restriction itself (far more expensively, both for lack of caching and because it would be repeated for each combination of input sub-joins), or else risk nasty bugs. Anyone who's really intent on doing something outside this restriction can always use the set_join_pathlist_hook. * Rename fdw_ps_tlist/custom_ps_tlist to fdw_scan_tlist/custom_scan_tlist to better reflect what they're for, and allow these custom scan tlists to be used even for base relations. * Change make_foreignscan() API to include passing the fdw_scan_tlist value, since the FDW is required to set that. Backwards compatibility doesn't seem like an adequate reason to expect FDWs to set it in some ad-hoc extra step, and anyway existing FDWs can just pass NIL. * Change the API of path-generating subroutines of add_paths_to_joinrel, and in particular that of GetForeignJoinPaths and set_join_pathlist_hook, so that various less-used parameters are passed in a struct rather than as separate parameter-list entries. The objective here is to reduce the probability that future additions to those parameter lists will result in source-level API breaks for users of these hooks. It's possible that this is even a small win for the core code, since most CPU architectures can't pass more than half a dozen parameters efficiently anyway. I kept root, joinrel, outerrel, innerrel, and jointype as separate parameters to reduce code churn in joinpath.c --- in particular, putting jointype into the struct would have been problematic because of the subroutines' habit of changing their local copies of that variable. * Avoid ad-hocery in ExecAssignScanProjectionInfo. It was probably all right for it to know about IndexOnlyScan, but if the list is to grow we should refactor the knowledge out to the callers. * Restore nodeForeignscan.c's previous use of the relcache to avoid extra GetFdwRoutine lookups for base-relation scans. * Lots of cleanup of documentation and missed comments. Re-order some code additions into more logical places.	2015-05-10 14:36:36 -04:00
Andrew Dunstan	cb9fa802b3	Add new OID alias type regnamespace Catalog version bumped Kyotaro HORIGUCHI	2015-05-09 13:36:52 -04:00
Andrew Dunstan	0c90f6769d	Add new OID alias type regrole The new type has the scope of whole the database cluster so it doesn't behave the same as the existing OID alias types which have database scope, concerning object dependency. To avoid confusion constants of the new type are prohibited from appearing where dependencies are made involving it. Also, add a note to the docs about possible MVCC violation and optimization issues, which are general over the all reg* types. Kyotaro Horiguchi	2015-05-09 13:06:49 -04:00
Stephen Frost	4b342fb591	Bump catversion for pg_file_settings Pointed out by Andres (thanks!) Apologies for not including it in the initial patch.	2015-05-08 19:14:32 -04:00
Stephen Frost	a97e0c3354	Add pg_file_settings view and function The function and view added here provide a way to look at all settings in postgresql.conf, any #include'd files, and postgresql.auto.conf (which is what backs the ALTER SYSTEM command). The information returned includes the configuration file name, line number in that file, sequence number indicating when the parameter is loaded (useful to see if it is later masked by another definition of the same parameter), parameter name, and what it is set to at that point. This information is updated on reload of the server. This is unfiltered, privileged, information and therefore access is restricted to superusers through the GRANT system. Author: Sawada Masahiko, various improvements by me. Reviewers: David Steele	2015-05-08 19:09:26 -04:00
Heikki Linnakangas	de7688442f	At promotion, archive last segment from old timeline with .partial suffix. Previously, we would archive the possible-incomplete WAL segment with its normal filename, but that causes trouble if the server owning that timeline is still running, and tries to archive the same segment later. It's not nice for the standby to trip up the master's archival like that. And it's pretty confusing, anyway, to have an incomplete segment in the archive that's indistinguishable from a normal, complete segment. To avoid such confusion, add a .partial suffix to the file. Or to be more precise, make a copy of the old segment under the .partial suffix, and archive that instead of the original file. pg_receivexlog also uses the .partial suffix for the same purpose, to tell apart incompletely streamed files from complete ones. There is no automatic mechanism to use the .partial files at recovery, so they will go unused, unless the administrator manually copies to them to the pg_xlog directory (and removes the .partial suffix). Recovery won't normally need the WAL - when recovering to the new timeline, it will find the same WAL on the first segment on the new timeline instead - but it nevertheless feels better to archive the file with the .partial suffix, for debugging purposes if nothing else.	2015-05-08 21:59:01 +03:00
Heikki Linnakangas	179cdd0981	Add macros to check if a filename is a WAL segment or other such file. We had many instances of the strlen + strspn combination to check for that. This makes the code a bit easier to read.	2015-05-08 21:58:57 +03:00
Andres Freund	e8898e9169	Minor ON CONFLICT related comments and doc fixes. Geoff Winkless, Stephen Frost, Peter Geoghegan and me.	2015-05-08 19:24:14 +02:00
Robert Haas	53bb309d2d	Teach autovacuum about multixact member wraparound. The logic introduced in commit `b69bf30b9b` and repaired in commits `669c7d20e6` and `7be47c56af` helps to ensure that we don't overwrite old multixact member information while it is still needed, but a user who creates many large multixacts can still exhaust the member space (and thus start getting errors) while autovacuum stands idly by. To fix this, progressively ramp down the effective value (but not the actual contents) of autovacuum_multixact_freeze_max_age as member space utilization increases. This makes autovacuum more aggressive and also reduces the threshold for a manual VACUUM to perform a full-table scan. This patch leaves unsolved the problem of ensuring that emergency autovacuums are triggered even when autovacuum=off. We'll need to fix that via a separate patch. Thomas Munro and Robert Haas	2015-05-08 12:53:00 -04:00
Andres Freund	168d5805e4	Add support for INSERT ... ON CONFLICT DO NOTHING/UPDATE. The newly added ON CONFLICT clause allows to specify an alternative to raising a unique or exclusion constraint violation error when inserting. ON CONFLICT refers to constraints that can either be specified using a inference clause (by specifying the columns of a unique constraint) or by naming a unique or exclusion constraint. DO NOTHING avoids the constraint violation, without touching the pre-existing row. DO UPDATE SET ... [WHERE ...] updates the pre-existing tuple, and has access to both the tuple proposed for insertion and the existing tuple; the optional WHERE clause can be used to prevent an update from being executed. The UPDATE SET and WHERE clauses have access to the tuple proposed for insertion using the "magic" EXCLUDED alias, and to the pre-existing tuple using the table name or its alias. This feature is often referred to as upsert. This is implemented using a new infrastructure called "speculative insertion". It is an optimistic variant of regular insertion that first does a pre-check for existing tuples and then attempts an insert. If a violating tuple was inserted concurrently, the speculatively inserted tuple is deleted and a new attempt is made. If the pre-check finds a matching tuple the alternative DO NOTHING or DO UPDATE action is taken. If the insertion succeeds without detecting a conflict, the tuple is deemed inserted. To handle the possible ambiguity between the excluded alias and a table named excluded, and for convenience with long relation names, INSERT INTO now can alias its target table. Bumps catversion as stored rules change. Author: Peter Geoghegan, with significant contributions from Heikki Linnakangas and Andres Freund. Testing infrastructure by Jeff Janes. Reviewed-By: Heikki Linnakangas, Andres Freund, Robert Haas, Simon Riggs, Dean Rasheed, Stephen Frost and many others.	2015-05-08 05:43:10 +02:00
Andres Freund	2c8f4836db	Represent columns requiring insert and update privileges indentently. Previously, relation range table entries used a single Bitmapset field representing which columns required either UPDATE or INSERT privileges, despite the fact that INSERT and UPDATE privileges are separately cataloged, and may be independently held. As statements so far required either insert or update privileges but never both, that was sufficient. The required permission could be inferred from the top level statement run. The upcoming INSERT ... ON CONFLICT UPDATE feature needs to independently check for both privileges in one statement though, so that is not sufficient anymore. Bumps catversion as stored rules change. Author: Peter Geoghegan Reviewed-By: Andres Freund	2015-05-08 00:20:46 +02:00
Alvaro Herrera	db5f98ab4f	Improve BRIN infra, minmax opclass and regression test The minmax opclass was using the wrong support functions when cross-datatypes queries were run. Instead of trying to fix the pg_amproc definitions (which apparently is not possible), use the already correct pg_amop entries instead. This requires jumping through more hoops (read: extra syscache lookups) to obtain the underlying functions to execute, but it is necessary for correctness. Author: Emre Hasegeli, tweaked by Álvaro Review: Andreas Karlsson Also change BrinOpcInfo to record each stored type's typecache entry instead of just the OID. Turns out that the full type cache is necessary in brin_deform_tuple: the original code used the indexed type's byval and typlen properties to extract the stored tuple, which is correct in Minmax; but in other implementations that want to store something different, that's wrong. The realization that this is a bug comes from Emre also, but I did not use his patch. I also adopted Emre's regression test code (with smallish changes), which is more complete.	2015-05-07 13:02:22 -03:00
Robert Haas	1998261034	Avoid using a C++ keyword as a structure member name. Per request from Peter Eisentraut.	2015-05-05 22:41:03 -04:00
Alvaro Herrera	3b6db1f445	Add geometry/range functions to support BRIN inclusion This commit adds the following functions: box(point) -> box bound_box(box, box) -> box inet_same_family(inet, inet) -> bool inet_merge(inet, inet) -> cidr range_merge(anyrange, anyrange) -> anyrange The first of these is also used to implement a new assignment cast from point to box. These functions are the first part of a base to implement an "inclusion" operator class for BRIN, for multidimensional data types. Author: Emre Hasegeli Reviewed by: Andreas Karlsson	2015-05-05 15:22:24 -03:00
Tom Lane	2503982be4	Improve procost estimates for some text search functions. The text search functions that involve parsing raw text into lexemes are remarkably CPU-intensive, so estimating them at the same cost as most other built-in functions seems like a mistake; moreover, doing so turns out to discourage the optimizer from using functional indexes on these functions. After some debate, we've agreed to raise procost from 1 to 100 for to_tsvector(), plainto_tsvector(), to_tsquery(), ts_headline(), ts_match_tt(), and ts_match_tq(), which are all the text search functions that parse raw text. Also increase procost for the 2-argument form of ts_rewrite() (tsquery_rewrite_query); while this function doesn't do text parsing, it does execute a user-supplied SQL query, so its previous procost of 1 is clearly a drastic underestimate. It seems reasonable to assign it the same cost we assign to PL functions by default, so 100 is the number here too. I did not bother bumping catversion for this change, since it does not break catalog compatibility with the server executable nor result in any regression test changes. Per complaint from Andrew Gierth and subsequent discussion.	2015-05-04 15:38:57 -04:00
Robert Haas	2ce439f337	Recursively fsync() the data directory after a crash. Otherwise, if there's another crash, some writes from after the first crash might make it to disk while writes from before the crash fail to make it to disk. This could lead to data corruption. Back-patch to all supported versions. Abhijit Menon-Sen, reviewed by Andres Freund and slightly revised by me.	2015-05-04 14:13:53 -04:00
Robert Haas	e7cb7ee145	Allow FDWs and custom scan providers to replace joins with scans. Foreign data wrappers can use this capability for so-called "join pushdown"; that is, instead of executing two separate foreign scans and then joining the results locally, they can generate a path which performs the join on the remote server and then is scanned locally. This commit does not extend postgres_fdw to take advantage of this capability; it just provides the infrastructure. Custom scan providers can use this in a similar way. Previously, it was only possible for a custom scan provider to scan a single relation. Now, it can scan an entire join tree, provided of course that it knows how to produce the same results that the join would have produced if executed normally. KaiGai Kohei, reviewed by Shigeru Hanada, Ashutosh Bapat, and me.	2015-05-01 08:50:35 -04:00
Andres Freund	2b22795b32	Copy editing of the replication origins patch. Michael Paquier and myself.	2015-05-01 12:22:13 +02:00
Andres Freund	1db12da85b	Fix unaligned memory access in xlog parsing due to replication origin patch. ParseCommitRecord() accessed xl_xact_origin directly. But the chunks in the commit record's data only have 4 byte alignment, whereas xl_xact_origin's members require 8 byte alignment on some platforms. Update comments to make not of that and copy the record to stack local storage before reading. With help from Stefan Kaltenbrunner in pinning down the buildfarm and verifying the fix.	2015-05-01 11:36:14 +02:00
Robert Haas	924bcf4f16	Create an infrastructure for parallel computation in PostgreSQL. This does four basic things. First, it provides convenience routines to coordinate the startup and shutdown of parallel workers. Second, it synchronizes various pieces of state (e.g. GUCs, combo CID mappings, transaction snapshot) from the parallel group leader to the worker processes. Third, it prohibits various operations that would result in unsafe changes to that state while parallelism is active. Finally, it propagates events that would result in an ErrorResponse, NoticeResponse, or NotifyResponse message being sent to the client from the parallel workers back to the master, from which they can then be sent on to the client. Robert Haas, Amit Kapila, Noah Misch, Rushabh Lathia, Jeevan Chalke. Suggestions and review from Andres Freund, Heikki Linnakangas, Noah Misch, Simon Riggs, Euler Taveira, and Jim Nasby.	2015-04-30 15:02:14 -04:00
Andres Freund	e0f26fc765	Correct replication origin's use of UINT16_MAX to PG_UINT16_MAX. We can't rely on UINT16_MAX being present, which is why we introduced PG_UINT16_MAX... Buildfarm animal bowerbird via Andrew Gierth.	2015-04-30 00:19:36 +02:00
Andres Freund	5aa2350426	Introduce replication progress tracking infrastructure. When implementing a replication solution ontop of logical decoding, two related problems exist: * How to safely keep track of replication progress * How to change replication behavior, based on the origin of a row; e.g. to avoid loops in bi-directional replication setups The solution to these problems, as implemented here, consist out of three parts: 1) 'replication origins', which identify nodes in a replication setup. 2) 'replication progress tracking', which remembers, for each replication origin, how far replay has progressed in a efficient and crash safe manner. 3) The ability to filter out changes performed on the behest of a replication origin during logical decoding; this allows complex replication topologies. E.g. by filtering all replayed changes out. Most of this could also be implemented in "userspace", e.g. by inserting additional rows contain origin information, but that ends up being much less efficient and more complicated. We don't want to require various replication solutions to reimplement logic for this independently. The infrastructure is intended to be generic enough to be reusable. This infrastructure also replaces the 'nodeid' infrastructure of commit timestamps. It is intended to provide all the former capabilities, except that there's only 2^16 different origins; but now they integrate with logical decoding. Additionally more functionality is accessible via SQL. Since the commit timestamp infrastructure has also been introduced in 9.5 (commit `73c986add`) changing the API is not a problem. For now the number of origins for which the replication progress can be tracked simultaneously is determined by the max_replication_slots GUC. That GUC is not a perfect match to configure this, but there doesn't seem to be sufficient reason to introduce a separate new one. Bumps both catversion and wal page magic. Author: Andres Freund, with contributions from Petr Jelinek and Craig Ringer Reviewed-By: Heikki Linnakangas, Petr Jelinek, Robert Haas, Steve Singer Discussion: 20150216002155.GI15326@awork2.anarazel.de, 20140923182422.GA15776@alap3.anarazel.de, 20131114172632.GE7522@alap2.anarazel.de	2015-04-29 19:30:53 +02:00
Stephen Frost	dcbf5948e1	Improve qual pushdown for RLS and SB views The original security barrier view implementation, on which RLS is built, prevented all non-leakproof functions from being pushed down to below the view, even when the function was not receiving any data from the view. This optimization improves on that situation by, instead of checking strictly for non-leakproof functions, it checks for Vars being passed to non-leakproof functions and allows functions which do not accept arguments or whose arguments are not from the current query level (eg: constants can be particularly useful) to be pushed down. As discussed, this does mean that a function which is pushed down might gain some idea that there are rows meeting a certain criteria based on the number of times the function is called, but this isn't a particularly new issue and the documentation in rules.sgml already addressed similar covert-channel risks. That documentation is updated to reflect that non-leakproof functions may be pushed down now, if they meet the above-described criteria. Author: Dean Rasheed, with a bit of rework to make things clearer, along with comment and documentation updates from me.	2015-04-27 12:29:42 -04:00
Andres Freund	6aab1f45ac	Fix various typos and grammar errors in comments. Author: Dmitriy Olshevskiy Discussion: 553D00A6.4090205@bk.ru	2015-04-26 18:42:31 +02:00
Peter Eisentraut	cac7658205	Add transforms feature This provides a mechanism for specifying conversions between SQL data types and procedural languages. As examples, there are transforms for hstore and ltree for PL/Perl and PL/Python. reviews by Pavel Stěhule and Andres Freund	2015-04-26 10:33:14 -04:00
Stephen Frost	e89bd02f58	Perform RLS WITH CHECK before constraints, etc The RLS capability is built on top of the WITH CHECK OPTION system which was added for auto-updatable views, however, unlike WCOs on views (which are mandated by the SQL spec to not fire until after all other constraints and checks are done), it makes much more sense for RLS checks to happen earlier than constraint and uniqueness checks. This patch reworks the structure which holds the WCOs a bit to be explicitly either VIEW or RLS checks and the RLS-related checks are done prior to the constraint and uniqueness checks. This also allows better error reporting as we are now reporting when a violation is due to a WITH CHECK OPTION and when it's due to an RLS policy violation, which was independently noted by Craig Ringer as being confusing. The documentation is also updated to include a paragraph about when RLS WITH CHECK handling is performed, as there have been a number of questions regarding that and the documentation was previously silent on the matter. Author: Dean Rasheed, with some kabitzing and comment changes by me.	2015-04-24 20:34:26 -04:00
Heikki Linnakangas	62420ae7d6	Move functions related to index maintenance to separate source file. There is enough code here to deserve a file of their own, not be buried in the middle of execUtils.c.	2015-04-24 09:33:23 +03:00
Stephen Frost	0bf22e0c8b	RLS fixes, new hooks, and new test module In prepend_row_security_policies(), defaultDeny was always true, so if there were any hook policies, the RLS policies on the table would just get discarded. Fixed to start off with defaultDeny as false and then properly set later if we detect that only the default deny policy exists for the internal policies. The infinite recursion detection in fireRIRrules() didn't properly manage the activeRIRs list in the case of WCOs, so it would incorrectly report infinite recusion if the same relation with RLS appeared more than once in the rtable, for example "UPDATE t ... FROM t ...". Further, the RLS expansion code in fireRIRrules() was handling RLS in the main loop through the rtable, which lead to RTEs being visited twice if they contained sublink subqueries, which prepend_row_security_policies() attempted to handle by exiting early if the RTE already had securityQuals. That doesn't work, however, since if the query involved a security barrier view on top of a table with RLS, the RTE would already have securityQuals (from the view) by the time fireRIRrules() was invoked, and so the table's RLS policies would be ignored. This is fixed in fireRIRrules() by handling RLS in a separate loop at the end, after dealing with any other sublink subqueries, thus ensuring that each RTE is only visited once for RLS expansion. The inheritance planner code didn't correctly handle non-target relations with RLS, which would get turned into subqueries during planning. Thus an update of the form "UPDATE t1 ... FROM t2 ..." where t1 has inheritance and t2 has RLS quals would fail. Fix by making sure to copy in and update the securityQuals when they exist for non-target relations. process_policies() was adding WCOs to non-target relations, which is unnecessary, and could lead to a lot of wasted time in the rewriter and the planner. Fix by only adding WCO policies when working on the result relation. Also in process_policies, we should be copying the USING policies to the WITH CHECK policies on a per-policy basis, fix by moving the copying up into the per-policy loop. Lastly, as noted by Dean, we were simply adding policies returned by the hook provided to the list of quals being AND'd, meaning that they would actually restrict records returned and there was no option to have internal policies and hook-based policies work together permissively (as all internal policies currently work). Instead, explicitly add support for both permissive and restrictive policies by having a hook for each and combining the results appropriately. To ensure this is all done correctly, add a new test module (test_rls_hooks) to test the various combinations of internal, permissive, and restrictive hook policies. Largely from Dean Rasheed (thanks!): CAEZATCVmFUfUOwwhnBTcgi6AquyjQ0-1fyKd0T3xBWJvn+xsFA@mail.gmail.com Author: Dean Rasheed, though I added the new hooks and test module.	2015-04-22 12:01:06 -04:00
Andres Freund	cef939c347	Rename pg_replication_slot's new active_in to active_pid. In `d811c037ce` active_in was added but discussion since showed that active_pid is preferred as a name. Discussion: CAMsr+YFKgZca5_7_ouaMWxA5PneJC9LNViPzpDHusaPhU9pA7g@mail.gmail.com	2015-04-22 09:43:40 +02:00
Andres Freund	d811c037ce	Add 'active_in' column to pg_replication_slots. Right now it is visible whether a replication slot is active in any session, but not in which. Adding the active_in column, containing the pid of the backend having acquired the slot, makes it much easier to associate pg_replication_slots entries with the corresponding pg_stat_replication/pg_stat_activity row. This should have been done from the start, but I (Andres) dropped the ball there somehow. Author: Craig Ringer, revised by me Discussion: CAMsr+YFKgZca5_7_ouaMWxA5PneJC9LNViPzpDHusaPhU9pA7g@mail.gmail.com	2015-04-21 11:51:06 +02:00
Bruce Momjian	f92fc4c95d	pg_upgrade: binary_upgrade_create_empty_extension() is strict Was broken by commit `30982be4e5`. Patch by Jeff Janes	2015-04-17 20:08:42 -04:00
Peter Eisentraut	30982be4e5	Integrate pg_upgrade_support module into backend Previously, these functions were created in a schema "binary_upgrade", which was deleted after pg_upgrade was finished. Because we don't want to keep that schema around permanently, move them to pg_catalog but rename them with a binary_upgrade_... prefix. The provided functions are only small wrappers around global variables that were added specifically for pg_upgrade use, so keeping the module separate does not create any modularity. The functions still check that they are only called in binary upgrade mode, so it is not possible to call these during normal operation. Reviewed-by: Michael Paquier <michael.paquier@gmail.com>	2015-04-14 19:26:37 -04:00
Heikki Linnakangas	b73e7a0716	Oops, fix misspelled #endif I hope this fixes the Windows builfarm failures.	2015-04-14 22:00:52 +03:00
Heikki Linnakangas	3dc2d62d04	Use Intel SSE 4.2 CRC instructions where available. Modern x86 and x86-64 processors with SSE 4.2 support have special instructions, crc32b and crc32q, for calculating CRC-32C. They greatly speed up CRC calculation. Whether the instructions can be used or not depends on the compiler and the target architecture. If generation of SSE 4.2 instructions is allowed for the target (-msse4.2 flag on gcc and clang), use them. If they are not allowed by default, but the compiler supports the -msse4.2 flag to enable them, compile just the CRC-32C function with -msse4.2 flag, and check at runtime whether the processor we're running on supports it. If it doesn't, fall back to the slicing-by-8 algorithm. (With the common defaults on current operating systems, the runtime-check variant is what you get in practice.) Abhijit Menon-Sen, heavily modified by me, reviewed by Andres Freund.	2015-04-14 17:05:03 +03:00
Heikki Linnakangas	4f700bcd20	Reorganize our CRC source files again. Now that we use CRC-32C in WAL and the control file, the "traditional" and "legacy" CRC-32 variants are not used in any frontend programs anymore. Move the code for those back from src/common to src/backend/utils/hash. Also move the slicing-by-8 implementation (back) to src/port. This is in preparation for next patch that will add another implementation that uses Intel SSE 4.2 instructions to calculate CRC-32C, where available.	2015-04-14 17:03:42 +03:00
Heikki Linnakangas	b2a5545bd6	Don't archive bogus recycled or preallocated files after timeline switch. After a timeline switch, we would leave behind recycled WAL segments that are in the future, but on the old timeline. After promotion, and after they become old enough to be recycled again, we would notice that they don't have a .ready or .done file, create a .ready file for them, and archive them. That's bogus, because the files contain garbage, recycled from an older timeline (or prealloced as zeros). We shouldn't archive such files. This could happen when we're following a timeline switch during replay, or when we switch to new timeline at end-of-recovery. To fix, whenever we switch to a new timeline, scan the data directory for WAL segments on the old timeline, but with a higher segment number, and remove them. Those don't belong to our timeline history, and are most likely bogus recycled or preallocated files. They could also be valid files that we streamed from the primary ahead of time, but in any case, they're not needed to recover to the new timeline.	2015-04-13 16:53:49 +03:00
Magnus Hagander	9029f4b374	Add system view pg_stat_ssl This view shows information about all connections, such as if the connection is using SSL, which cipher is used, and which client certificate (if any) is used. Reviews by Alex Shulgin, Heikki Linnakangas, Andres Freund & Michael Paquier	2015-04-12 19:07:46 +02:00
Alvaro Herrera	27846f02c1	Optimize locking a tuple already locked by another subxact Locking and updating the same tuple repeatedly led to some strange multixacts being created which had several subtransactions of the same parent transaction holding locks of the same strength. However, once a subxact of the current transaction holds a lock of a given strength, it's not necessary to acquire the same lock again. This made some coding patterns much slower than required. The fix is twofold. First we change HeapTupleSatisfiesUpdate to return HeapTupleBeingUpdated for the case where the current transaction is already a single-xid locker for the given tuple; it used to return HeapTupleMayBeUpdated for that case. The new logic is simpler, and the change to pgrowlocks is a testament to that: previously we needed to check for the single-xid locker separately in a very ugly way. That test is simpler now. As fallout from the HTSU change, some of its callers need to be amended so that tuple-locked-by-own-transaction is taken into account in the BeingUpdated case rather than the MayBeUpdated case. For many of them there is no difference; but heap_delete() and heap_update now check explicitely and do not grab tuple lock in that case. The HTSU change also means that routine MultiXactHasRunningRemoteMembers introduced in commit `11ac4c73cb` is no longer necessary and can be removed; the case that used to require it is now handled naturally as result of the changes to heap_delete and heap_update. The second part of the fix to the performance issue is to adjust heap_lock_tuple to avoid the slowness: 1. Previously we checked for the case that our own transaction already held a strong enough lock and returned MayBeUpdated, but only in the multixact case. Now we do it for the plain Xid case as well, which saves having to LockTuple. 2. If the current transaction is the only locker of the tuple (but with a lock not as strong as what we need; otherwise it would have been caught in the check mentioned above), we can skip sleeping on the multixact, and instead go straight to create an updated multixact with the additional lock strength. 3. Most importantly, make sure that both the single-xid-locker case and the multixact-locker case optimization are applied always. We do this by checking both in a single place, rather than them appearing in two separate portions of the routine -- something that is made possible by the HeapTupleSatisfiesUpdate API change. Previously we would only check for the single-xid case when HTSU returned MayBeUpdated, and only checked for the multixact case when HTSU returned BeingUpdated. This was at odds with what HTSU actually returned in one case: if our own transaction was locker in a multixact, it returned MayBeUpdated, so the optimization never applied. This is what led to the large multixacts in the first place. Per bug report #8470 by Oskari Saarenmaa.	2015-04-10 13:47:15 -03:00
Alvaro Herrera	e9a077cad3	pg_event_trigger_dropped_objects: add is_temp column It now also reports temporary objects dropped that are local to the backend. Previously we weren't reporting any temp objects because it was deemed unnecessary; but as it turns out, it is necessary if we want to keep close track of DDL command execution inside one session. Temp objects are reported as living in schema pg_temp, which works because such a schema-qualification always refers to the temp objects of the current session.	2015-04-06 11:40:55 -03:00
Alvaro Herrera	4ff695b17d	Add log_min_autovacuum_duration per-table option This is useful to control autovacuum log volume, for situations where monitoring only a set of tables is necessary. Author: Michael Paquier Reviewed by: A team led by Naoya Anzai (also including Akira Kurosawa, Taiki Kondo, Huong Dangminh), Fujii Masao.	2015-04-03 11:55:50 -03:00
Fujii Masao	8c8a886268	Add palloc_extended for frontend and backend. This commit also adds pg_malloc_extended for frontend. These interfaces can be used to control at a lower level memory allocation using an interface similar to MemoryContextAllocExtended. For example, the callers can specify MCXT_ALLOC_NO_OOM if they want to suppress the "out of memory" error while allocating the memory and handle a NULL return value. Michael Paquier, reviewed by me.	2015-04-03 17:36:12 +09:00
Robert Haas	abd94bcac4	Use abbreviated keys for faster sorting of numeric datums. Andrew Gierth, reviewed by Peter Geoghegan, with further tweaks by me.	2015-04-02 14:04:26 -04:00
Andres Freund	62e2a8dc2c	Define integer limits independently from the system definitions. In `83ff1618` we defined integer limits iff they're not provided by the system. That turns out not to be the greatest idea because there's different ways some datatypes can be represented. E.g. on OSX PG's 64bit datatype will be a 'long int', but OSX unconditionally uses 'long long'. That disparity then can lead to warnings, e.g. around printf formats. One way to fix that would be to back int64 using stdint.h's int64_t. While a good idea it's not that easy to implement. We would e.g. need to include stdint.h in our external headers, which we don't today. Also computing the correct int64 printf formats in that case is nontrivial. Instead simply prefix the integer limits with PG_ and define them unconditionally. I've adjusted all the references to them in code, but not the ones in comments; the latter seems unnecessary to me. Discussion: 20150331141423.GK4878@alap3.anarazel.de	2015-04-02 17:43:35 +02:00
Robert Haas	4cd639baf4	Revert "psql: fix \connect with URIs and conninfo strings" This reverts commit `fcef161729`, about which both the buildfarm and my local machine are very unhappy.	2015-04-02 10:10:22 -04:00
Alvaro Herrera	fcef161729	psql: fix \connect with URIs and conninfo strings psql was already accepting conninfo strings as the first parameter in \connect, but the way it worked wasn't sane; some of the other parameters would get the previous connection's values, causing it to connect to a completely unexpected server or, more likely, not finding any server at all because of completely wrong combinations of parameters. Fix by explicitely checking for a conninfo-looking parameter in the dbname position; if one is found, use its complete specification rather than mix with the other arguments. Also, change tab-completion to not try to complete conninfo/URI-looking "dbnames" and document that conninfos are accepted as first argument. There was a weak consensus to backpatch this, because while the behavior of using the dbname as a conninfo is nowhere documented for \connect, it is reasonable to expect that it works because it does work in many other contexts. Therefore this is backpatched all the way back to 9.0. To implement this, routines previously private to libpq have been duplicated so that psql can decide what looks like a conninfo/URI string. In back branches, just duplicate the same code all the way back to 9.2, where URIs where introduced; 9.0 and 9.1 have a simpler version. In master, the routines are moved to src/common and renamed. Author: David Fetter, Andrew Dunstan. Some editorialization by me (probably earning a Gierth's "Sloppy" badge in the process.) Reviewers: Andrew Gierth, Erik Rijkers, Pavel Stěhule, Stephen Frost, Robert Haas, Andrew Dunstan.	2015-04-01 20:00:07 -03:00
Heikki Linnakangas	f770870d9e	Move inet/cidr GiST opclass functions to correct place in header file. They were accidentally placed under the GIN heading. Andreas Karlsson	2015-04-01 19:20:45 +03:00
Andrew Dunstan	fa1e5afa8a	Run pg_upgrade and pg_resetxlog with restricted token on Windows As with initdb these programs need to run with a restricted token, and if they don't pg_upgrade will fail when run as a user with Adminstrator privileges. Backpatch to all live branches. On the development branch the code is reorganized so that the restricted token code is now in a single location. On the stable bramches a less invasive change is made by simply copying the relevant code to pg_upgrade.c and pg_resetxlog.c. Patches and bug report from Muhammad Asif Naeem, reviewed by Michael Paquier, slightly edited by me.	2015-03-30 17:07:52 -04:00
Alvaro Herrera	97690ea6e8	Change array_offset to return subscripts, not offsets ... and rename it and its sibling array_offsets to array_position and array_positions, to account for the changed behavior. Having the functions return subscripts better matches existing practice, and is better suited to using the result value as a subscript into the array directly. For one-based arrays, the new definition is identical to what was originally committed. (We use the term "subscript" in the documentation, which is what we use whenever we talk about arrays; but the functions themselves are named using the word "position" to match the standard-defined POSITION() functions.) Author: Pavel Stěhule Behavioral problem noted by Dean Rasheed.	2015-03-30 16:13:21 -03:00
Heikki Linnakangas	0633a60f4d	Add index-only scan support to range type GiST opclass. Andreas Karlsson	2015-03-30 13:22:38 +03:00
Heikki Linnakangas	3a20b0e7b6	Add index-only scan support to inet GiST opclass. Andreas Karlsson	2015-03-28 15:11:53 +02:00
Heikki Linnakangas	55b59eda13	Fix GiST index-only scans for opclasses with different storage type. We cannot use the index's tuple descriptor directly to describe the index tuples returned in an index-only scan. That's because the index might use a different datatype for the values stored on disk than the type originally indexed. As long as they were both pass-by-ref, it worked, but will not work for pass-by-value types of different sizes. I noticed this as a crash when I started hacking a patch to add fetch methods to btree_gist.	2015-03-26 23:07:52 +02:00
Tom Lane	785941cdc3	Tweak __attribute__-wrapping macros for better pgindent results. This improves on commit `bbfd7edae5` by making two simple changes: * pg_attribute_noreturn now takes parentheses, ie pg_attribute_noreturn(). Likewise pg_attribute_unused(), pg_attribute_packed(). This reduces pgindent's tendency to misformat declarations involving them. * attributes are now always attached to function declarations, not definitions. Previously some places were taking creative shortcuts, which were not merely candidates for bad misformatting by pgindent but often were outright wrong anyway. (It does little good to put a noreturn annotation where callers can't see it.) In any case, if we would like to believe that these macros can be used with non-gcc compilers, we should avoid gratuitous variance in usage patterns. I also went through and manually improved the formatting of a lot of declarations, and got rid of excessively repetitive (and now obsolete anyway) comments informing the reader what pg_attribute_printf is for.	2015-03-26 14:03:25 -04:00
Heikki Linnakangas	d04c8ed904	Add support for index-only scans in GiST. This adds a new GiST opclass method, 'fetch', which is used to reconstruct the original Datum from the value stored in the index. Also, the 'canreturn' index AM interface function gains a new 'attno' argument. That makes it possible to use index-only scans on a multi-column index where some of the opclasses support index-only scans but some do not. This patch adds support in the box and point opclasses. Other opclasses can added later as follow-on patches (btree_gist would be particularly interesting). Anastasia Lubennikova, with additional fixes and modifications by me.	2015-03-26 19:12:00 +02:00
Heikki Linnakangas	8fa393a6d7	Minor cleanup of GiST code, for readability. Remove the gistcentryinit function, inlining the relevant part of it into the only caller.	2015-03-26 19:11:54 +02:00
Tatsuo Ishii	656ea810e5	Make SyncRepWakeQueue to a static function It is only used in src/backend/replication/syncrep.c. Back-patch to all supported branches except 9.1 which declares the function as static.	2015-03-26 10:34:08 +09:00
Andres Freund	83ff1618bc	Centralize definition of integer limits. Several submitted and even committed patches have run into the problem that C89, our baseline, does not provide minimum/maximum values for various integer datatypes. C99's stdint.h does, but we can't rely on it. Several parts of the code defined limits locally, so instead centralize the definitions to c.h. This patch also changes the more obvious usages of literal limit values; there's more places that could be changed, but it's less clear whether it's beneficial to change those. Author: Andrew Gierth Discussion: 87619tc5wc.fsf@news-spur.riddles.org.uk	2015-03-25 22:39:42 +01:00
Alvaro Herrera	bdc3d7fa23	Return ObjectAddress in many ALTER TABLE sub-routines Since commit `a2e35b53c3`, most CREATE and ALTER commands return the ObjectAddress of the affected object. This is useful for event triggers to try to figure out exactly what happened. This patch extends this idea a bit further to cover ALTER TABLE as well: an auxiliary ObjectAddress is returned for each of several subcommands of ALTER TABLE. This makes it possible to decode with precision what happened during execution of any ALTER TABLE command; for instance, which constraint was added by ALTER TABLE ADD CONSTRAINT, or which parent got dropped from the parents list by ALTER TABLE NO INHERIT. As with the previous patch, there is no immediate user-visible change here. This is all really just continuing what `c504513f83` started. Reviewed by Stephen Frost.	2015-03-25 17:17:56 -03:00
Kevin Grittner	2ed5b87f96	Reduce pinning and buffer content locking for btree scans. Even though the main benefit of the Lehman and Yao algorithm for btrees is that no locks need be held between page reads in an index search, we were holding a buffer pin on each leaf page after it was read until we were ready to read the next one. The reason was so that we could treat this as a weak lock to create an "interlock" with vacuum's deletion of heap line pointers, even though our README file pointed out that this was not necessary for a scan using an MVCC snapshot. The main goal of this patch is to reduce the blocking of vacuum processes by in-progress btree index scans (including a cursor which is idle), but the code rearrangement also allows for one less buffer content lock to be taken when a forward scan steps from one page to the next, which results in a small but consistent performance improvement in many workloads. This patch leaves behavior unchanged for some cases, which can be addressed separately so that each case can be evaluated on its own merits. These unchanged cases are when a scan uses a non-MVCC snapshot, an index-only scan, and a scan of a btree index for which modifications are not WAL-logged. If later patches allow all of these cases to drop the buffer pin after reading a leaf page, then the btree vacuum process can be simplified; it will no longer need the "super-exclusive" lock to delete tuples from a page. Reviewed by Heikki Linnakangas and Kyotaro Horiguchi	2015-03-25 14:24:43 -05:00
Alvaro Herrera	8217fb1441	Add OID output argument to DefineTSConfiguration ... which is set to the OID of a copied text search config, whenever the COPY clause is used. This is in the spirit of commit `a2e35b53c3`.	2015-03-25 15:57:08 -03:00
Tom Lane	cb1ca4d800	Allow foreign tables to participate in inheritance. Foreign tables can now be inheritance children, or parents. Much of the system was already ready for this, but we had to fix a few things of course, mostly in the area of planner and executor handling of row locks. As side effects of this, allow foreign tables to have NOT VALID CHECK constraints (and hence to accept ALTER ... VALIDATE CONSTRAINT), and to accept ALTER SET STORAGE and ALTER SET WITH/WITHOUT OIDS. Continuing to disallow these things would've required bizarre and inconsistent special cases in inheritance behavior. Since foreign tables don't enforce CHECK constraints anyway, a NOT VALID one is a complete no-op, but that doesn't mean we shouldn't allow it. And it's possible that some FDWs might have use for SET STORAGE or SET WITH OIDS, though doubtless they will be no-ops for most. An additional change in support of this is that when a ModifyTable node has multiple target tables, they will all now be explicitly identified in EXPLAIN output, for example: Update on pt1 (cost=0.00..321.05 rows=3541 width=46) Update on pt1 Foreign Update on ft1 Foreign Update on ft2 Update on child3 -> Seq Scan on pt1 (cost=0.00..0.00 rows=1 width=46) -> Foreign Scan on ft1 (cost=100.00..148.03 rows=1170 width=46) -> Foreign Scan on ft2 (cost=100.00..148.03 rows=1170 width=46) -> Seq Scan on child3 (cost=0.00..25.00 rows=1200 width=46) This was done mainly to provide an unambiguous place to attach "Remote SQL" fields, but it is useful for inherited updates even when no foreign tables are involved. Shigeru Hanada and Etsuro Fujita, reviewed by Ashutosh Bapat and Kyotaro Horiguchi, some additional hacking by me	2015-03-22 13:53:21 -04:00
Bruce Momjian	1c7087af42	Add TOAST table to pg_shseclabel for long label use Report by Andres Freund	2015-03-21 22:14:49 -04:00
Bruce Momjian	34afbba84e	Use mmap MAP_NOSYNC option to limit shared memory writes mmap() is rarely used for shared memory, but when it is, this option is useful, particularly on the BSDs. Patch by Sean Chittenden	2015-03-21 22:06:19 -04:00
Andres Freund	959277a4f5	Use 128-bit math to accelerate some aggregation functions. On platforms where we support 128bit integers, use them to implement faster transition functions for sum(int8), avg(int8), var_(int2/int4),stdev_(int2/int4). Where not supported continue to use numeric as a transition type. In some synthetic benchmarks this has been shown to provide significant speedups. Bumps catversion. Discussion: 544BB5F1.50709@proxel.se Author: Andreas Karlsson Reviewed-By: Peter Geoghegan, Petr Jelinek, Andres Freund, Oskari Saarenmaa, David Rowley	2015-03-20 10:29:32 +01:00
Andres Freund	8122e1437e	Add, optional, support for 128bit integers. We will, for the foreseeable future, not expose 128 bit datatypes to SQL. But being able to use 128bit math will allow us, in a later patch, to use 128bit accumulators for some aggregates; leading to noticeable speedups over using numeric. So far we only detect a gcc/clang extension that supports 128bit math, but no 128bit literals, and no *printf support. We might want to expand this in the future to further compilers; if there are any that that provide similar support. Discussion: 544BB5F1.50709@proxel.se Author: Andreas Karlsson, with significant editorializing by me Reviewed-By: Peter Geoghegan, Oskari Saarenmaa	2015-03-20 10:26:17 +01:00
Robert Haas	12968cf408	Add flags argument to dsm_create. Right now, there's only one flag, DSM_CREATE_NULL_IF_MAXSEGMENTS, which suppresses the error that would normally be thrown when the maximum number of segments already exists, instead returning NULL. It might be useful to add more flags in the future, such as one to ignore allocation errors, but I haven't done that here.	2015-03-19 13:03:03 -04:00
Alvaro Herrera	13dbc7a824	array_offset() and array_offsets() These functions return the offset position or positions of a value in an array. Author: Pavel Stěhule Reviewed by: Jim Nasby	2015-03-18 16:01:34 -03:00
Alvaro Herrera	0d83138974	Rationalize vacuuming options and parameters We were involving the parser too much in setting up initial vacuuming parameters. This patch moves that responsibility elsewhere to simplify code, and also to make future additions easier. To do this, create a new struct VacuumParams which is filled just prior to vacuum execution, instead of at parse time; for user-invoked vacuuming this is set up in a new function ExecVacuum, while autovacuum sets it up by itself. While at it, add a new member VACOPT_SKIPTOAST to enum VacuumOption, only set by autovacuum, which is used to disable vacuuming of the toast table instead of the old do_toast parameter; this relieves the argument list of vacuum() and some callees a bit. This partially makes up for having added more arguments in an effort to avoid having autovacuum from constructing a VacuumStmt parse node. Author: Michael Paquier. Some tweaks by Álvaro Reviewed by: Robert Haas, Stephen Frost, Álvaro Herrera	2015-03-18 11:52:33 -03:00
Alvaro Herrera	a61fd5334e	Support opfamily members in get_object_address In the spirit of `890192e99a` and `4464303405`: have get_object_address understand individual pg_amop and pg_amproc objects. There is no way to refer to such objects directly in the grammar -- rather, they are almost always considered an integral part of the opfamily that contains them. (The only case that deals with them individually is ALTER OPERATOR FAMILY ADD/DROP, which carries the opfamily address separately and thus does not need it to be part of each added/dropped element's address.) In event triggers it becomes possible to become involved with individual amop/amproc elements, and this commit enables pg_get_object_address to do so as well. To make the overall coding simpler, this commit also slightly changes the get_object_address representation for opclasses and opfamilies: instead of having the AM name in the objargs array, I moved it as the first element of the objnames array. This enables the new code to use objargs for the type names used by pg_amop and pg_amproc. Reviewed by: Stephen Frost	2015-03-16 12:06:34 -03:00
Tom Lane	7b8b8a4331	Improve representation of PlanRowMark. This patch fixes two inadequacies of the PlanRowMark representation. First, that the original LockingClauseStrength isn't stored (and cannot be inferred for foreign tables, which always get ROW_MARK_COPY). Since some PlanRowMarks are created out of whole cloth and don't actually have an ancestral RowMarkClause, this requires adding a dummy LCS_NONE value to enum LockingClauseStrength, which is fairly annoying but the alternatives seem worse. This fix allows getting rid of the use of get_parse_rowmark() in FDWs (as per the discussion around commits `462bd95705` and `8ec8760fc8`), and it simplifies some things elsewhere. Second, that the representation assumed that all child tables in an inheritance hierarchy would use the same RowMarkType. That's true today but will soon not be true. We add an "allMarkTypes" field that identifies the union of mark types used in all a parent table's children, and use that where appropriate (currently, only in preprocess_targetlist()). In passing fix a couple of minor infelicities left over from the SKIP LOCKED patch, notably that _outPlanRowMark still thought waitPolicy is a bool. Catversion bump is required because the numeric values of enum LockingClauseStrength can appear in on-disk rules. Extracted from a much larger patch to support foreign table inheritance; it seemed worth breaking this out, since it's a separable concern. Shigeru Hanada and Etsuro Fujita, somewhat modified by me	2015-03-15 18:41:47 -04:00
Tom Lane	9fac5fd741	Move LockClauseStrength, LockWaitPolicy into new file nodes/lockoptions.h. Commit `df630b0dd5` moved enum LockWaitPolicy into its very own header file utils/lockwaitpolicy.h, which does not seem like a great idea from here. First, it's still a node-related declaration, and second, a file named like that can never sensibly be used for anything else. I do not think we want to encourage a one-typedef-per-header-file approach. The upcoming foreign table inheritance patch was doubling down on this bad idea by moving enum LockClauseStrength into its own can-never-be-used-for-anything-else file. Instead, let's put them both in a file named nodes/lockoptions.h. (They do seem to need a separate header file because we need them in both parsenodes.h and plannodes.h, and we don't want either of those including the other. Past practice might suggest adding them to nodes/nodes.h, but they don't seem sufficiently globally useful to justify that.) Committed separately since there's no functional change here, just some header-file refactoring.	2015-03-15 15:19:04 -04:00
Andres Freund	4f1b890b13	Merge the various forms of transaction commit & abort records. Since `465883b0a` two versions of commit records have existed. A compact version that was used when no cache invalidations, smgr unlinks and similar were needed, and a full version that could deal with all that. Additionally the full version was embedded into twophase commit records. That resulted in a measurable reduction in the size of the logged WAL in some workloads. But more recently additions like logical decoding, which e.g. needs information about the database something was executed on, made it applicable in fewer situations. The static split generally made it hard to expand the commit record, because concerns over the size made it hard to add anything to the compact version. Additionally it's not particularly pretty to have twophase.c insert RM_XACT records. Rejigger things so that the commit and abort records only have one form each, including the twophase equivalents. The presence of the various optional (in the sense of not being in every record) pieces is indicated by a bits in the 'xinfo' flag. That flag previously was not included in compact commit records. To prevent an increase in size due to its presence, it's only included if necessary; signalled by a bit in the xl_info bits available for xact.c, similar to heapam.c's XLOG_HEAP_OPMASK/XLOG_HEAP_INIT_PAGE. Twophase commit/aborts are now the same as their normal counterparts. The original transaction's xid is included in an optional data field. This means that commit records generally are smaller, except in the case of a transaction with subtransactions, but no other special cases; the increase there is four bytes, which seems acceptable given that the more common case of not having subtransactions shrank. The savings are especially measurable for twophase commits, which previously always used the full version; but will in practice only infrequently have required that. The motivation for this work are not the space savings and and deduplication though; it's that it makes it easier to extend commit records with additional information. That's just a few lines of code now; without impacting the common case where that information is not needed. Discussion: 20150220152150.GD4149@awork2.anarazel.de, 235610.92468.qm%40web29004.mail.ird.yahoo.com Reviewed-By: Heikki Linnakangas, Simon Riggs	2015-03-15 17:37:07 +01:00
Tom Lane	91f4a5a976	Build src/port/dirmod.c only on Windows. Since commit `ba7c5975ad`, port/dirmod.c has contained only Windows-specific functions. Most platforms don't seem to mind uselessly building an empty file, but OS X for one issues warnings. Hence, treat dirmod.c as a Windows-specific file selected by configure rather than one that's always built. We can revert this change if dirmod.c ever gains any non-Windows functionality again. Back-patch to 9.4 where the mentioned commit appeared.	2015-03-14 14:08:45 -04:00
Tom Lane	f4abd0241d	Support flattening of empty-FROM subqueries and one-row VALUES tables. We can't handle this in the general case due to limitations of the planner's data representations; but we can allow it in many useful cases, by being careful to flatten only when we are pulling a single-row subquery up into a FROM (or, equivalently, inner JOIN) node that will still have at least one remaining relation child. Per discussion of an example from Kyotaro Horiguchi.	2015-03-11 23:18:03 -04:00
Tom Lane	b55722692b	Improve planner's cost estimation in the presence of semijoins. If we have a semijoin, say SELECT * FROM x WHERE x1 IN (SELECT y1 FROM y) and we're estimating the cost of a parameterized indexscan on x, the number of repetitions of the indexscan should not be taken as the size of y; it'll really only be the number of distinct values of y1, because the only valid plan with y on the outside of a nestloop would require y to be unique-ified before joining it to x. Most of the time this doesn't make that much difference, but sometimes it can lead to drastically underestimating the cost of the indexscan and hence choosing a bad plan, as pointed out by David Kubečka. Fixing this is a bit difficult because parameterized indexscans are costed out quite early in the planning process, before we have the information that would be needed to call estimate_num_groups() and thereby estimate the number of distinct values of the join column(s). However we can move the code that extracts a semijoin RHS's unique-ification columns, so that it's done in initsplan.c rather than on-the-fly in create_unique_path(). That shouldn't make any difference speed-wise and it's really a bit cleaner too. The other bit of information we need is the size of the semijoin RHS, which is easy if it's a single relation (we make those estimates before considering indexscan costs) but problematic if it's a join relation. The solution adopted here is just to use the product of the sizes of the join component rels. That will generally be an overestimate, but since estimate_num_groups() only uses this input as a clamp, an overestimate shouldn't hurt us too badly. In any case we don't allow this new logic to produce a value larger than we would have chosen before, so that at worst an overestimate leaves us no wiser than we were before.	2015-03-11 21:21:00 -04:00
Alvaro Herrera	4464303405	Support default ACLs in get_object_address In the spirit of `890192e99a`, this time add support for the things living in the pg_default_acl catalog. These are not really "objects", but they show up as such in event triggers. There is no "DROP DEFAULT PRIVILEGES" or similar command, so it doesn't look like the new representation given would be useful anywhere else, so I didn't try to use it outside objectaddress.c. (That might be a bug in itself, but that would be material for another commit.) Reviewed by Stephen Frost.	2015-03-11 19:23:47 -03:00
Alvaro Herrera	890192e99a	Support user mappings in get_object_address Since commit `72dd233d3e` we were trying to obtain object addressing information in sql_drop event triggers, but that caused failures when the drops involved user mappings. This addition enables that to work again. Naturally, pg_get_object_address can work with these objects now, too. I toyed with the idea of removing DropUserMappingStmt as a node and using DropStmt instead in the DropUserMappingStmt grammar production, but that didn't go very well: for one thing the messages thrown by the specific code are specialized (you get "server not found" if you specify the wrong server, instead of a generic "user mapping for ... not found" which you'd get it we were to merge this with RemoveObjects --- unless we added even more special cases). For another thing, it would require to pass RoleSpec nodes through the objname/objargs representation used by RemoveObjects, which works in isolation, but gets messy when pg_get_object_address is involved. So I dropped this part for now. Reviewed by Stephen Frost.	2015-03-11 17:04:27 -03:00
Tom Lane	c6b3c939b7	Make operator precedence follow the SQL standard more closely. While the SQL standard is pretty vague on the overall topic of operator precedence (because it never presents a unified BNF for all expressions), it does seem reasonable to conclude from the spec for <boolean value expression> that OR has the lowest precedence, then AND, then NOT, then IS tests, then the six standard comparison operators, then everything else (since any non-boolean operator in a WHERE clause would need to be an argument of one of these). We were only sort of on board with that: most notably, while "<" ">" and "=" had properly low precedence, "<=" ">=" and "<>" were treated as generic operators and so had significantly higher precedence. And "IS" tests were even higher precedence than those, which is very clearly wrong per spec. Another problem was that "foo NOT SOMETHING bar" constructs, such as "x NOT LIKE y", were treated inconsistently because of a bison implementation artifact: they had the documented precedence with respect to operators to their right, but behaved like NOT (i.e., very low priority) with respect to operators to their left. Fixing the precedence issues is just a small matter of rearranging the precedence declarations in gram.y, except for the NOT problem, which requires adding an additional lookahead case in base_yylex() so that we can attach a different token precedence to NOT LIKE and allied two-word operators. The bulk of this patch is not the bug fix per se, but adding logic to parse_expr.c to allow giving warnings if an expression has changed meaning because of these precedence changes. These warnings are off by default and are enabled by the new GUC operator_precedence_warning. It's believed that very few applications will be affected by these changes, but it was agreed that a warning mechanism is essential to help debug any that are.	2015-03-11 13:22:52 -04:00
Robert Haas	e529cd4ffa	Suggest to the user the column they may have meant to reference. Error messages informing the user that no such column exists can sometimes provoke a perplexed response. This often happens due to a subtle typo in the column name or, perhaps less likely, in the alias name. To speed discovery of what the real issue is in such cases, we'll now search the range table for approximate matches. If there are one or two such matches that are good enough to think that they might be what the user intended to type, and better than all other approximate matches, we'll issue a hint suggesting that the user might have intended to reference those columns. Peter Geoghegan and Robert Haas	2015-03-11 10:44:04 -04:00
Andres Freund	bbfd7edae5	Add macros wrapping all usage of gcc's __attribute__. Until now __attribute__() was defined to be empty for all compilers but gcc. That's problematic because it prevents using it in other compilers; which is necessary e.g. for atomics portability. It's also just generally dubious to do so in a header as widely included as c.h. Instead add pg_attribute_format_arg, pg_attribute_printf, pg_attribute_noreturn macros which are implemented in the compilers that understand them. Also add pg_attribute_noreturn and pg_attribute_packed, but don't provide fallbacks, since they can affect functionality. This means that external code that, possibly unwittingly, relied on __attribute__ defined to be empty on !gcc compilers may now run into warnings or errors on those compilers. But there shouldn't be many occurances of that and it's hard to work around... Discussion: 54B58BA3.8040302@ohmu.fi Author: Oskari Saarenmaa, with some minor changes by me.	2015-03-11 14:30:01 +01:00
Fujii Masao	57aa5b2bb1	Add GUC to enable compression of full page images stored in WAL. When newly-added GUC parameter, wal_compression, is on, the PostgreSQL server compresses a full page image written to WAL when full_page_writes is on or during a base backup. A compressed page image will be decompressed during WAL replay. Turning this parameter on can reduce the WAL volume without increasing the risk of unrecoverable data corruption, but at the cost of some extra CPU spent on the compression during WAL logging and on the decompression during WAL replay. This commit changes the WAL format (so bumping WAL version number) so that the one-byte flag indicating whether a full page image is compressed or not is included in its header information. This means that the commit increases the WAL volume one-byte per a full page image even if WAL compression is not used at all. We can save that one-byte by borrowing one-bit from the existing field like hole_offset in the header and using it as the flag, for example. But which would reduce the code readability and the extensibility of the feature. Per discussion, it's not worth paying those prices to save only one-byte, so we decided to add the one-byte flag to the header. This commit doesn't introduce any new compression algorithm like lz4. Currently a full page image is compressed using the existing PGLZ algorithm. Per discussion, we decided to use it at least in the first version of the feature because there were no performance reports showing that its compression ratio is unacceptably lower than that of other algorithm. Of course, in the future, it's worth considering the support of other compression algorithm for the better compression. Rahila Syed and Michael Paquier, reviewed in various versions by myself, Andres Freund, Robert Haas, Abhijit Menon-Sen and many others.	2015-03-11 15:52:24 +09:00
Tom Lane	2fbb286647	Clean up the mess from => patch. Commit `865f14a2d3` was quite a few bricks shy of a load: psql, ecpg, and plpgsql were all left out-of-step with the core lexer. Of these only the last was likely to be a fatal problem; but still, a minimal amount of grepping, or even just reading the comments adjacent to the places that were changed, would have found the other places that needed to be changed.	2015-03-10 11:48:38 -04:00
Alvaro Herrera	e491bd2ee3	Move BRIN page type to page's last two bytes ... which is the usual convention among AMs, so that pg_filedump and similar utilities can tell apart pages of different AMs. It was also the intent of the original code, but I failed to realize that alignment considerations would move the whole thing to the previous-to-last word in the page. The new definition of the associated macro makes surrounding code a bit leaner, too. Per note from Heikki at http://www.postgresql.org/message-id/546A16EF.9070005@vmware.com	2015-03-10 12:27:15 -03:00
Alvaro Herrera	4f3924d9cd	Keep CommitTs module in sync in standby and master We allow this module to be turned off on restarts, so a restart time check is enough to activate or deactivate the module; however, if there is a standby replaying WAL emitted from a master which is restarted, but the standby isn't, the state in the standby becomes inconsistent and can easily be crashed. Fix by activating and deactivating the module during WAL replay on parameter change as well as on system start. Problem reported by Fujii Masao in http://www.postgresql.org/message-id/CAHGQGwFhJ3CnHo1CELEfay18yg_RA-XZT-7D8NuWUoYSZ90r4Q@mail.gmail.com Author: Petr Jelínek	2015-03-09 17:44:00 -03:00
Alvaro Herrera	31eae6028e	Allow CURRENT/SESSION_USER to be used in certain commands Commands such as ALTER USER, ALTER GROUP, ALTER ROLE, GRANT, and the various ALTER OBJECT / OWNER TO, as well as ad-hoc clauses related to roles such as the AUTHORIZATION clause of CREATE SCHEMA, the FOR clause of CREATE USER MAPPING, and the FOR ROLE clause of ALTER DEFAULT PRIVILEGES can now take the keywords CURRENT_USER and SESSION_USER as user specifiers in place of an explicit user name. This commit also fixes some quite ugly handling of special standards- mandated syntax in CREATE USER MAPPING, which in particular would fail to work in presence of a role named "current_user". The special role specifiers PUBLIC and NONE also have more consistent handling now. Also take the opportunity to add location tracking to user specifiers. Authors: Kyotaro Horiguchi. Heavily reworked by Álvaro Herrera. Reviewed by: Rushabh Lathia, Adam Brightwell, Marti Raudsepp.	2015-03-09 15:41:54 -03:00
Heikki Linnakangas	f1fd515b39	Move WAL-related definitions from dbcommands.h to separate header file. This makes it easier to write frontend programs that needs to understand the WAL record format of CREATE/DROP DATABASE. dbcommands.h cannot easily be #included in a frontend program, because it pulls in other header files that need backend stuff, but the new dbcommands_xlog.h header file has fewer dependencies.	2015-03-09 15:50:49 +02:00
Fujii Masao	828599acec	Fix typo in comment.	2015-03-09 14:39:46 +09:00
Tom Lane	01cca2c1b1	Remove struct PQArgBlock from server-side header libpq/libpq.h. This struct is purely a client-side artifact. Perhaps there was once reason for the server to know it, but any such reason is lost in the mists of time. We certainly don't need two independent declarations of it.	2015-03-08 13:42:59 -04:00

1 2 3 4 5 ...

6940 Commits