postgresql

Commit Graph

Author	SHA1	Message	Date
Tom Lane	851a26e266	While vacuuming a large table, update upper-level FSM data every so often. VACUUM updates leaf-level FSM entries immediately after cleaning the corresponding heap blocks. fsmpage.c updates the intra-page search trees on the leaf-level FSM pages when this happens, but it does not touch the upper-level FSM pages, so that the released space might not actually be findable by searchers. Previously, updating the upper-level pages happened only at the conclusion of the VACUUM run, in a single FreeSpaceMapVacuum() call. This is bad because the VACUUM might get canceled before ever reaching that point, so that from the point of view of searchers no space has been freed at all, leading to table bloat. We can improve matters by updating the upper pages immediately after each cycle of index-cleaning and heap-cleaning, processing just the FSM pages corresponding to the range of heap blocks we have now fully cleaned. This adds a small amount of extra work, since the FSM pages leading down to each range boundary will be touched twice, but it's pretty negligible compared to everything else going on in a large VACUUM. If there are no indexes, VACUUM doesn't work in cycles but just cleans each heap page on first visit. In that case we just arbitrarily update upper FSM pages after each 8GB of heap. That maintains the goal of not letting all this work slide until the very end, and it doesn't seem worth expending extra complexity on a case that so seldom occurs in practice. In either case, the FSM is fully up to date before any attempt is made to truncate the relation, so that the most likely scenario for VACUUM cancellation no longer results in out-of-date upper FSM pages. When we do successfully truncate, adjusting the FSM to reflect that is now fully handled within FreeSpaceMapTruncateRel. 
Claudio Freire, reviewed by Masahiko Sawada and Jing Wang, some additional tweaks by me Discussion: https://postgr.es/m/CAGTBQpYR0uJCNTt3M5GOzBRHo+-GccNO1nCaQ8yEJmZKSW5q1A@mail.gmail.com	2018-03-29 11:29:54 -04:00
Teodor Sigaev	c0cbe00fee	Add casts from jsonb Add explicit cast from scalar jsonb to all numeric and bool types. It would be better to have cast from scalar jsonb to text too but there is already a cast from jsonb to text as just text representation of json. There is no way to have two different casts for the same type's pair. Bump catalog version Author: Anastasia Lubennikova with editorization by Nikita Glukhov and me Review by: Aleksander Alekseev, Nikita Glukhov, Darafei Praliaskouski Discussion: https://www.postgresql.org/message-id/flat/0154d35a-24ae-f063-5273-9ffcdf1c7f2e@postgrespro.ru	2018-03-29 16:33:56 +03:00
Magnus Hagander	669820a3d9	Fix typo in comment Arthur Zakirov, confirmed by Thomas Munro	2018-03-29 11:42:32 +02:00
Peter Eisentraut	056a5a3f63	Allow committing inside cursor loop Previously, committing or aborting inside a cursor loop was prohibited because that would close and remove the cursor. To allow that, automatically convert such cursors to holdable cursors so they survive commits or rollbacks. Portals now have a new state "auto-held", which means they have been converted automatically from pinned. An auto-held portal is kept on transaction commit or rollback, but is still removed when returning to the main loop on error. This supports all languages that have cursor loop constructs: PL/pgSQL, PL/Python, PL/Perl. Reviewed-by: Ildus Kurbangaliev <i.kurbangaliev@postgrespro.ru>	2018-03-28 19:03:26 -04:00
Bruce Momjian	a2894cce54	C comment: fix typo, log -> lag Reported-by: atorikoshi Discussion: https://postgr.es/m/b61f2ab9-c0e0-d33d-ce3f-42a228025681@lab.ntt.co.jp Author: atorikoshi	2018-03-28 18:23:47 -04:00
Andres Freund	a0a08c1d85	Fix mistakes in the just added JIT docs. Reported-By: Lukas Fittl Author: Andres Freund	2018-03-28 15:07:08 -07:00
Andres Freund	e6c039d13e	Add documentation for the JIT feature. As promised in earlier commits, this adds documentation about the new build options, the new GUCs, about the planner logic when JIT is used, and the benefits of JIT in general. Also adds a more implementation oriented README. I'm sure we're going to want to expand this further, but I think this is a reasonable start. Author: Andres Freund, with contributions by Thomas Munro Reviewed-By: Thomas Munro Discussion: https://postgr.es/m/20170901064131.tazjxwus3k2w3ybh@alap3.anarazel.de	2018-03-28 14:22:42 -07:00
Andres Freund	1f0c6a9e7d	Add EXPLAIN support for JIT. This just shows a few details about JITing, e.g. how many functions have been JITed, and how long that took. To avoid noise in regression tests with functions sometimes being JITed in --with-llvm builds, disable display when COSTS OFF is specified. Author: Andres Freund Discussion: https://postgr.es/m/20170901064131.tazjxwus3k2w3ybh@alap3.anarazel.de	2018-03-28 13:26:51 -07:00
Andres Freund	9370462e9a	Add inlining support to LLVM JIT provider. This provides infrastructure to allow JITed code to inline code implemented in C. This e.g. can be postgres internal functions or extension code. This already speeds up long running queries, by allowing the LLVM optimizer to optimize across function boundaries. The optimization potential currently doesn't reach its full potential because LLVM cannot optimize the FunctionCallInfoData argument fully away, because it's allocated on the heap rather than the stack. Fixing that is beyond what's realistic for v11. To be able to do that, use CLANG to convert C code to LLVM bitcode, and have LLVM build a summary for it. That bitcode can then be used to inline functions at runtime. For that the bitcode needs to be installed. Postgres bitcode goes into $pkglibdir/bitcode/postgres, extensions go into equivalent directories. PGXS has been modified so that happens automatically if postgres has been compiled with LLVM support. Currently this isn't the fastest inline implementation, modules are reloaded from disk during inlining. That's to work around an apparent LLVM bug, triggering an apparently spurious error in LLVM assertion enabled builds. Once that is resolved we can remove the superfluous read from disk. Docs will follow in a later commit containing docs for the whole JIT feature. Author: Andres Freund Discussion: https://postgr.es/m/20170901064131.tazjxwus3k2w3ybh@alap3.anarazel.de	2018-03-28 13:19:08 -07:00
Fujii Masao	266b6acb31	Make pg_rewind skip files and directories that are removed during server start. The target cluster that was rewound needs to perform recovery from the checkpoint created at failover, which leads it to remove or recreate some files and directories that may have been copied from the source cluster. So pg_rewind can skip synchronizing such files and directories, which reduces the amount of data transferred during a rewind without changing the usefulness of the operation. Author: Michael Paquier Reviewed-by: Anastasia Lubennikova, Stephen Frost and me Discussion: https://postgr.es/m/20180205071022.GA17337@paquier.xyz	2018-03-29 04:56:52 +09:00
Peter Eisentraut	d92bc83c48	PL/pgSQL: Nested CALL with transactions So far, a nested CALL or DO in PL/pgSQL would not establish a context where transaction control statements were allowed. This fixes that by handling CALL and DO specially in PL/pgSQL, passing the atomic/nonatomic execution context through and doing the required management around transaction boundaries. Reviewed-by: Tomas Vondra <tomas.vondra@2ndquadrant.com>	2018-03-28 13:31:27 -04:00
Tom Lane	c2d4eb1b1f	Fix actual and potential double-frees around tuplesort usage. tuplesort_gettupleslot() passed back tuples allocated in the tuplesort's own memory context, even when the caller was responsible to free them. This created a double-free hazard, because some callers might destroy the tuplesort object (via tuplesort_end) before trying to clean up the last returned tuple. To avoid this, change the API to specify that the tuple is allocated in the caller's memory context. v10 and HEAD already did things that way, but in 9.5 and 9.6 this is a live bug that can demonstrably cause crashes with some grouping-set usages. In 9.5 and 9.6, this requires doing an extra tuple copy in some cases, which is unfortunate. But the amount of refactoring needed to avoid it seems excessive for a back-patched change, especially since the cases where an extra copy happens are less performance-critical. Likewise change tuplesort_getdatum() to return pass-by-reference Datums in the caller's context not the tuplesort's context. There seem to be no live bugs among its callers, but clearly the same sort of situation could happen in future. For other tuplesort fetch routines, continue to allocate the memory in the tuplesort's context. This is a little inconsistent with what we now do for tuplesort_gettupleslot() and tuplesort_getdatum(), but that's preferable to adding new copy overhead in the back branches where it's clearly unnecessary. These other fetch routines provide the weakest possible guarantees about tuple memory lifespan from v10 on, anyway, so this actually seems more consistent overall. Adjust relevant comments to reflect these API redefinitions. Arguably, we should change the pre-9.5 branches as well, but since there are no known failure cases there, it seems not worth the risk. Peter Geoghegan, per report from Bernd Helmle. Reviewed by Kyotaro Horiguchi; thanks also to Andreas Seltenreich for extracting a self-contained test case. 
Discussion: https://postgr.es/m/1512661638.9720.34.camel@oopsware.de	2018-03-28 13:26:57 -04:00
Simon Riggs	1eb6d6527a	Store 2PC GID in commit/abort WAL recs for logical decoding Store GID of 2PC in commit/abort WAL records when wal_level = logical. This allows logical decoding to send the SAME gid to subscribers across restarts of logical replication. Track replica origin replay progress for 2PC. (Edited from patch 0003 in the logical decoding 2PC series.) Authors: Nikhil Sontakke, Stas Kelvich Reviewed-by: Simon Riggs, Andres Freund	2018-03-28 17:42:50 +01:00
Andres Freund	f4f5845b31	Quick adaption of JIT tuple deforming to the fast default patch. Instead of using memset to set tts_isnull, call the new slot_getmissingattrs(). Also fix a bug (= instead of >=) in the code generation. Normally = is correct, but when repeatedly deforming fields not in a tuple (e.g. deform up to natts + 1 and then natts + 2) >= is needed. Discussion: https://postgr.es/m/20180328010053.i2qvsuuusst4lgmc@alap3.anarazel.de	2018-03-27 21:03:10 -07:00
Andrew Dunstan	16828d5c02	Fast ALTER TABLE ADD COLUMN with a non-NULL default Currently adding a column to a table with a non-NULL default results in a rewrite of the table. For large tables this can be both expensive and disruptive. This patch removes the need for the rewrite as long as the default value is not volatile. The default expression is evaluated at the time of the ALTER TABLE and the result stored in a new column (attmissingval) in pg_attribute, and a new column (atthasmissing) is set to true. Any existing row when fetched will be supplied with the attmissingval. New rows will have the supplied value or the default and so will never need the attmissingval. Any time the table is rewritten all the atthasmissing and attmissingval settings for the attributes are cleared, as they are no longer needed. The most visible code change from this is in heap_attisnull, which acquires a third TupleDesc argument, allowing it to detect a missing value if there is one. In many cases where it is known that there will not be any (e.g. catalog relations) NULL can be passed for this argument. Andrew Dunstan, heavily modified from an original patch from Serge Rielau. Reviewed by Tom Lane, Andres Freund, Tomas Vondra and David Rowley. Discussion: https://postgr.es/m/31e2e921-7002-4c27-59f5-51f08404c858@2ndQuadrant.com	2018-03-28 10:43:52 +10:30
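The missing-value mechanism described in commit 16828d5c02 can be illustrated with a toy model. This is only a sketch in Python, not PostgreSQL's implementation: the `Table` and `Attribute` classes and their methods are hypothetical, and only the field names `has_missing` and `missing_val` are meant to echo the `atthasmissing` and `attmissingval` columns named in the message.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class Attribute:
    name: str
    has_missing: bool = False   # stands in for pg_attribute.atthasmissing
    missing_val: Any = None     # stands in for pg_attribute.attmissingval


@dataclass
class Table:
    """Hypothetical toy table; stored rows may be shorter than attrs."""
    attrs: List[Attribute] = field(default_factory=list)
    rows: List[list] = field(default_factory=list)

    def add_column(self, name, default=None):
        # The default is evaluated once, here, and existing rows are not touched.
        self.attrs.append(Attribute(name, default is not None, default))

    def insert(self, *values):
        # New rows carry a value for every current column.
        self.rows.append(list(values))

    def fetch(self, i):
        # Old rows are completed on the fly with the recorded missing values.
        stored = self.rows[i]
        out = list(stored)
        for attr in self.attrs[len(stored):]:
            out.append(attr.missing_val if attr.has_missing else None)
        return out

    def rewrite(self):
        # A rewrite materializes every value, so the markers can be cleared.
        self.rows = [self.fetch(i) for i in range(len(self.rows))]
        for attr in self.attrs:
            attr.has_missing = False
            attr.missing_val = None


t = Table()
t.add_column("id")
t.insert(1)
t.add_column("status", default="new")   # no rewrite of the existing row
t.insert(2, "done")
print(t.rows[0], t.fetch(0), t.fetch(1))
```

In this model the row stored before the `add_column` call stays one element long until `rewrite()` runs, which mirrors the message's point that the stored missing value is only needed for rows that predate the new column.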
Tom Lane	442accc3fe	Allow memory contexts to have both fixed and variable ident strings. Originally, we treated memory context names as potentially variable in all cases, and therefore always copied them into the context header. Commit `9fa6f00b1` rethought this a little bit and invented a distinction between fixed and variable names, skipping the copy step for the former. But we can make things both simpler and more useful by instead allowing there to be two parts to a context's identification, a fixed "name" and an optional, variable "ident". The name supplied in the context create call is now required to be a compile-time-constant string in all cases, as it is never copied but just pointed to. The "ident" string, if wanted, is supplied later. This is needed because typically we want the ident to be stored inside the context so that it's cleaned up automatically on context deletion; that means it has to be copied into the context before we can set the pointer. The cost of this approach is basically just an additional pointer field in struct MemoryContextData, which isn't much overhead, and is bought back entirely in the AllocSet case by not needing a headerSize field anymore, since we no longer have to cope with variable header length. In addition, we can simplify the internal interfaces for memory context creation still further, saving a few cycles there. And it's no longer true that a custom identifier disqualifies a context from participating in aset.c's freelist scheme, so possibly there's some win on that end. All the places that were using non-compile-time-constant context names are adjusted to put the variable info into the "ident" instead. This allows more effective identification of those contexts in many cases; for example, subsidiary contexts of relcache entries are now identified by both type (e.g. "index info") and relname, where before you got only one or the other. 
Contexts associated with PL function cache entries are now identified more fully and uniformly, too. I also arranged for plancache contexts to use the query source string as their identifier. This is basically free for CachedPlanSources, as they contained a copy of that string already. We pay an extra pstrdup to do it for CachedPlans. That could perhaps be avoided, but it would make things more fragile (since the CachedPlanSource is sometimes destroyed first). I suspect future improvements in error reporting will require CachedPlans to have a copy of that string anyway, so it's not clear that it's worth moving mountains to avoid it now. This also changes the APIs for context statistics routines so that the context-specific routines no longer assume that output goes straight to stderr, nor do they know all details of the output format. This is useful immediately to reduce code duplication, and it also allows for external code to do something with stats output that's different from printing to stderr. The reason for pushing this now rather than waiting for v12 is that it rethinks some of the API changes made by commit `9fa6f00b1`. Seems better for extension authors to endure just one round of API changes not two. Discussion: https://postgr.es/m/CAB=Je-FdtmFZ9y9REHD7VsSrnCkiBhsA4mdsLKSPauwXtQBeNA@mail.gmail.com	2018-03-27 16:46:51 -04:00
Simon Riggs	c203d6cf81	Allow HOT updates for some expression indexes If the value of an index expression is unchanged after UPDATE, allow HOT updates where previously we disallowed them, giving a significant performance boost in those cases. Particularly useful for indexes such as JSON->>field where the JSON value changes but the indexed value does not. Submitted as "surjective indexes" patch, now enabled by use of new "recheck_on_update" parameter. Author: Konstantin Knizhnik Reviewer: Simon Riggs, with much wordsmithing and some cleanup	2018-03-27 19:57:02 +01:00
Teodor Sigaev	920a5e500a	Skip temp tables from basebackup. Do not store temp tables in basebackup; they will not be visible anyway, so there is no reason to store them. Author: David Steele Reviewed by: me Discussion: https://www.postgresql.org/message-id/flat/5ea4d26a-a453-c1b7-eff9-5a3ef8f8aceb@pgmasters.net	2018-03-27 16:14:40 +03:00
Teodor Sigaev	3ad55863e9	Add predicate locking for GiST Add page-level predicate locking, due to gist's code organization, patch seems close to trivial: add check before page changing, add predicate lock before page scanning. Although choosing right place to check is not simple: it should not be called during index build, it should support insertion of new downlink and so on. Author: Shubham Barai with editorization by me and Alexander Korotkov Reviewed by: Alexander Korotkov, Andrey Borodin, me Discussion: https://www.postgresql.org/message-id/flat/CALxAEPtdcANpw5ePU3LvnTP8HCENFw6wygupQAyNBgD-sG3h0g@mail.gmail.com	2018-03-27 15:43:19 +03:00
Andres Freund	4b9094eb6e	Adapt to LLVM 7+ Orc API changes. This is mostly done to be able to validate features and fixes submitted to LLVM. Given the size of these changes that seems acceptable. Author: Andres Freund	2018-03-26 16:04:53 -07:00
Andres Freund	071371bc43	LLVMJIT: Free created module in LLVM < 5. Due to the differing APIs between versions, I forgot to deallocate the generated module in older LLVM versions, leading to a memory leak. Author: Andres Freund	2018-03-26 16:04:39 -07:00
Andres Freund	96b5eac918	Correct some typos in the new JIT code. Author: Thomas Munro	2018-03-26 12:58:17 -07:00
Andres Freund	32af96b2b1	JIT tuple deforming in LLVM JIT provider. Performing JIT compilation for deforming gains performance benefits over unJITed deforming from compile-time knowledge of the tuple descriptor. Fixed column widths, NOT NULLness, etc can be taken advantage of. Right now the JITed deforming is only used when deforming tuples as part of expression evaluation (and obviously only if the descriptor is known). It's likely to be beneficial in other cases, too. By default tuple deforming is JITed whenever an expression is JIT compiled. There's a separate boolean GUC controlling it, but that's expected to be primarily useful for development and benchmarking. Docs will follow in a later commit containing docs for the whole JIT feature. Author: Andres Freund Discussion: https://postgr.es/m/20170901064131.tazjxwus3k2w3ybh@alap3.anarazel.de	2018-03-26 12:57:19 -07:00
Alvaro Herrera	530bcf7581	Fix thinko in comment The listed numbers disagreed with the ones being used in the symbols; but instead of just fixing the numbers in the comment, use the symbolic name instead, which seems clearer. This has been wrong all along, so apply back to 9.5 where BRIN was introduced. Reported-by: Tomas Vondra Discussion: https://postgr.es/m/5ff514f2-8b1e-6366-b11c-8e2ed442562d@2ndquadrant.com	2018-03-26 12:03:42 -03:00
Alvaro Herrera	555ee77a96	Handle INSERT .. ON CONFLICT with partitioned tables Commit `eb7ed3f306` enabled unique constraints on partitioned tables, but one thing that was not working properly is INSERT/ON CONFLICT. This commit introduces a new node that keeps state related to the ON CONFLICT clause per partition, and fills it when that partition is about to be used for tuple routing. Author: Amit Langote, Álvaro Herrera Reviewed-by: Etsuro Fujita, Pavan Deolasee Discussion: https://postgr.es/m/20180228004602.cwdyralmg5ejdqkq@alvherre.pgsql	2018-03-26 10:43:54 -03:00
Alvaro Herrera	1b89c2188b	Fix typo	2018-03-26 09:56:41 -03:00
Andrew Dunstan	2b27273435	Optimize btree insertions for common case of increasing values Remember the last page of an index insert if it's the rightmost leaf page. If the next entry belongs on and can fit in the remembered page, insert the new entry there as long as we can get a lock on the page. Otherwise, fall back on the more expensive method of searching for the right place to insert the entry. This provides a performance improvement for the common case where an index entry is for monotonically increasing or nearly monotonically increasing value such as an identity field or a current timestamp. Pavan Deolasee Reviewed by Claudio Freire, Simon Riggs and Peter Geoghegan Discussion: https://postgr.es/m/CABOikdM9DrupjyKZZFM5k8-0RCDs1wk6JzEkg7UgSW6QzOwMZw@mail.gmail.com	2018-03-26 22:39:24 +10:30
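The rightmost-page fast path from commit 2b27273435 can be sketched with a toy index made of sorted leaf pages. This is an illustration under simplifying assumptions, not the btree code: the `ToyIndex` class, its page capacity and its split rule are hypothetical, and the page lock mentioned in the message is left out.

```python
import bisect


class ToyIndex:
    """Hypothetical leaf-only index: a left-to-right list of sorted pages."""

    def __init__(self, capacity=4):
        self.capacity = capacity
        self.pages = [[]]
        self.cached = None        # remembered page, kept only if it is rightmost
        self.fastpath_hits = 0

    def insert(self, key):
        page = self.cached
        # Fast path: the remembered page is still the rightmost leaf,
        # the key belongs on it, and there is room for one more entry.
        if (page is not None and page is self.pages[-1]
                and page and key >= page[0]
                and len(page) < self.capacity):
            bisect.insort(page, key)
            self.fastpath_hits += 1
            return
        self._insert_by_search(key)

    def _insert_by_search(self, key):
        # Slow path: locate the page by its first key, splitting on overflow.
        firsts = [p[0] for p in self.pages[1:]]
        idx = bisect.bisect_right(firsts, key)
        page = self.pages[idx]
        bisect.insort(page, key)
        if len(page) > self.capacity:
            mid = len(page) // 2
            right = page[mid:]
            del page[mid:]
            self.pages.insert(idx + 1, right)
            if key >= right[0]:
                idx += 1
        landed = self.pages[idx]
        self.cached = landed if landed is self.pages[-1] else None


index = ToyIndex()
for value in range(1, 21):       # monotonically increasing keys
    index.insert(value)
print(index.fastpath_hits, len(index.pages))
```

With increasing keys, a share of the inserts skip the search entirely. A key that belongs further left fails the `key >= page[0]` check and falls back to the search, as the message describes.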
Tom Lane	d0c0c89453	Fix unsafe extraction of the OID part of a relation filename. Commit `8694cc96b` did this randomly differently from other callers of parse_filename_for_nontemp_relation(). Perhaps unsurprisingly, the randomly different way is wrong; it fails to ensure the extracted string is null-terminated. Per buildfarm member skink. Discussion: https://postgr.es/m/14453.1522001792@sss.pgh.pa.us	2018-03-25 15:15:40 -04:00
Tom Lane	3a2cb59887	Remove useless if-test. Coverity complained that this check is pointless, and it's right. There is no case where we'd call ExecutorStart with a null plannedstmt, and if we did, it'd have crashed before here. Thinko in commit `cc415a56d`.	2018-03-25 14:54:16 -04:00
Peter Eisentraut	52f3a9d6a3	Small refactoring Put the "atomic" argument of ExecuteDoStmt() and ExecuteCallStmt() into a variable instead of repeating the formula.	2018-03-23 17:18:22 -04:00
Tom Lane	4b538727e2	Fix make rules that generate multiple output files. For years, our makefiles have correctly observed that "there is no correct way to write a rule that generates two files". However, what we did is to provide empty rules that "generate" the secondary output files from the primary one, and that's not right either. Depending on the details of the creating process, the primary file might end up timestamped later than one or more secondary files, causing subsequent make runs to consider the secondary file(s) out of date. That's harmless in a plain build, since make will just re-execute the empty rule and nothing happens. But it's fatal in a VPATH build, since make will expect the secondary file to be rebuilt in the build directory. This would manifest as "file not found" failures during VPATH builds from tarballs, if we were ever unlucky enough to ship a tarball with apparently out-of-date secondary files. (It's not clear whether that has ever actually happened, but it definitely could.) To ensure that secondary output files have timestamps >= their primary's, change our makefile convention to be that we provide a "touch $@" action not an empty rule. Also, make sure that this rule actually gets invoked during a distprep run, else the hazard remains. It's been like this a long time, so back-patch to all supported branches. In HEAD, I skipped the changes in src/backend/catalog/Makefile, because those rules are due to get replaced soon in the bootstrap data format patch, and there seems no need to create a merge issue for that patch. If for some reason we fail to land that patch in v11, we'll need to back-fill the changes in that one makefile from v10. Discussion: https://postgr.es/m/18556.1521668179@sss.pgh.pa.us	2018-03-23 13:46:00 -04:00
Teodor Sigaev	8694cc96b5	Exclude unlogged tables from base backups Exclude unlogged tables from base backup entirely except init fork which marks created unlogged table. The next question is do not backup temp table but it's a story for separate patch. Author: David Steele Review by: Adam Brightwell, Masahiko Sawada Discussion: https://www.postgresql.org/message-id/flat/04791bab-cb04-ba43-e9c0-664a4c1ffb2c@pgmasters.net	2018-03-23 19:14:12 +03:00
Alvaro Herrera	86f575948c	Allow FOR EACH ROW triggers on partitioned tables Previously, FOR EACH ROW triggers were not allowed in partitioned tables. Now we allow AFTER triggers on them, and on trigger creation we cascade to create an identical trigger in each partition. We also clone the triggers to each partition that is created or attached later. This means that deferred unique keys are allowed on partitioned tables, too. Author: Álvaro Herrera Reviewed-by: Peter Eisentraut, Simon Riggs, Amit Langote, Robert Haas, Thomas Munro Discussion: https://postgr.es/m/20171229225319.ajltgss2ojkfd3kp@alvherre.pgsql	2018-03-23 10:48:22 -03:00
Andres Freund	2111a48a0c	Adapt expression JIT to stdbool.h introduction. The LLVM JIT provider uses clang to synchronize types between normal C code and runtime generated code. Clang represents stdbool.h style booleans in return values & parameters differently from booleans stored in variables. Thus the expression compilation code from `2a0faed9d` needs to be adapted to `9a95a77d9`. Instead of hardcoding i8 as the type for booleans (which already was wrong on some edge case platforms!), use postgres' notion of a boolean as used for storage and for parameters. Per buildfarm animal xenodermus. Author: Andres Freund	2018-03-22 22:15:51 -07:00
Peter Eisentraut	9a95a77d9d	Use stdbool.h if suitable Using the standard bool type provided by C allows some recent compilers and debuggers to give better diagnostics. Also, some extension code and third-party headers are increasingly pulling in stdbool.h, so it's probably saner if everyone uses the same definition. But PostgreSQL code is not prepared to handle bool of a size other than 1, so we keep our own old definition if we encounter a stdbool.h with a bool of a different size. (Among current build farm members, this only applies to old macOS versions on PowerPC.) To check that the used bool is of the right size, add a static assertion about the size of GinTernaryValue vs bool. This is currently the only place that assumes that bool and char are of the same size. Discussion: https://www.postgresql.org/message-id/flat/3a0fe7e1-5ed1-414b-9230-53bbc0ed1f49@2ndquadrant.com	2018-03-22 20:42:25 -04:00
Andres Freund	2a0faed9d7	Add expression compilation support to LLVM JIT provider. In addition to the interpretation of expressions (which back evaluation of WHERE clauses, target list projection, aggregates transition values etc) support compiling expressions to native code, using the infrastructure added in earlier commits. To avoid duplicating a lot of code, only support emitting code for cases that are likely to be performance critical. For expression steps that aren't deemed that, use the existing interpreter. The generated code isn't great - some architectural changes are required to address that. But this already yields a significant speedup for some analytics queries, particularly with WHERE clauses filtering a lot, or computing multiple aggregates. Author: Andres Freund Tested-By: Thomas Munro Discussion: https://postgr.es/m/20170901064131.tazjxwus3k2w3ybh@alap3.anarazel.de Disable JITing for VALUES() nodes. VALUES() nodes are only ever executed once. This is primarily helpful for debugging, when forcing JITing even for cheap queries. Author: Andres Freund Discussion: https://postgr.es/m/20170901064131.tazjxwus3k2w3ybh@alap3.anarazel.de	2018-03-22 14:45:59 -07:00
Andres Freund	fb46ac26fe	Expand list of synchronized types and functions in LLVM JIT provider. Author: Andres Freund Discussion: https://postgr.es/m/20170901064131.tazjxwus3k2w3ybh@alap3.anarazel.de	2018-03-22 14:45:59 -07:00
Tom Lane	feb8254518	Improve style guideline compliance of assorted error-report messages. Per the project style guide, details and hints should have leading capitalization and end with a period. On the other hand, errcontext should not be capitalized and should not end with a period. To support well formatted error contexts in dblink, extend dblink_res_error() to take a format+arguments rather than a hardcoded string. Daniel Gustafsson Discussion: https://postgr.es/m/B3C002C8-21A0-4F53-A06E-8CAB29FCF295@yesql.se	2018-03-22 17:33:10 -04:00
Robert Haas	88ba0ae2aa	Consider Parallel Append of partial paths for UNION [ALL]. Without this patch, we can implement a UNION or UNION ALL as an Append where Gather appears beneath one or more of the Append branches, but this lets us put the Gather node on top, with a partial path for each relation underneath. There is considerably more work that could be done to improve planning in this area, but that will probably need to wait for a future release. Patch by me, reviewed and tested by Ashutosh Bapat and Rajkumar Raghuwanshi. Discussion: http://postgr.es/m/CA+TgmoaLRAOqHmMZx=ESM3VDEPceg+-XXZsRXQ8GtFJO_zbMSw@mail.gmail.com	2018-03-22 16:09:28 -04:00
Tom Lane	7c91a0364f	Sync up our various ways of estimating pg_class.reltuples. VACUUM thought that reltuples represents the total number of tuples in the relation, while ANALYZE counted only live tuples. This can cause "flapping" in the value when background vacuums and analyzes happen separately. The planner's use of reltuples essentially assumes that it's the count of live (visible) tuples, so let's standardize on having it mean live tuples. Another issue is that the definition of "live tuple" isn't totally clear; what should be done with INSERT_IN_PROGRESS or DELETE_IN_PROGRESS tuples? ANALYZE's choices in this regard are made on the assumption that if the originating transaction commits at all, it will happen after ANALYZE finishes, so we should ignore the effects of the in-progress transaction --- unless it is our own transaction, and then we should count it. Let's propagate this definition into VACUUM, too. Likewise propagate this definition into CREATE INDEX, and into contrib/pgstattuple's pgstattuple_approx() function. Tomas Vondra, reviewed by Haribabu Kommi, some corrections by me Discussion: https://postgr.es/m/16db4468-edfa-830a-f921-39a50498e77e@2ndquadrant.com	2018-03-22 15:47:41 -04:00
Andres Freund	cc415a56d0	Basic planner and executor integration for JIT. This adds simple cost based plan time decision about whether JIT should be performed. jit_above_cost, jit_optimize_above_cost are compared with the total cost of a plan, and if the cost is above them JIT is performed / optimization is performed respectively. For that PlannedStmt and EState have a jitFlags (es_jit_flags) field that stores information about what JIT operations should be performed. EState now also has a new es_jit field, which can store a JitContext. When there are no errors the context is released in standard_ExecutorEnd(). It is likely that the default values for jit_[optimize_]above_cost will need to be adapted further, but in my test these values seem to work reasonably. Author: Andres Freund, with feedback by Peter Eisentraut Discussion: https://postgr.es/m/20170901064131.tazjxwus3k2w3ybh@alap3.anarazel.de	2018-03-22 11:51:58 -07:00
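The plan-time decision described in commit cc415a56d0 amounts to two threshold comparisons against the plan's total cost. The following is a minimal sketch, not the planner code: the function name, the flag constants and the threshold values are hypothetical, and nesting the optimization check under the JIT check is an assumption made here for illustration.

```python
# Hypothetical flag bits; the real jitFlags encoding is not given in the message.
JIT_PERFORM = 0x1
JIT_OPTIMIZE = 0x2


def choose_jit_flags(total_cost, jit_above_cost, jit_optimize_above_cost):
    """Return the JIT flags for a plan with the given total cost."""
    flags = 0
    if total_cost > jit_above_cost:
        flags |= JIT_PERFORM
        # Assumption: optimization is only considered once JIT itself is on.
        if total_cost > jit_optimize_above_cost:
            flags |= JIT_OPTIMIZE
    return flags


# Illustrative thresholds only, not claimed to be the shipped defaults.
for cost in (50.0, 200.0, 1000.0):
    print(cost, choose_jit_flags(cost, jit_above_cost=100.0,
                                 jit_optimize_above_cost=500.0))
```

A cheap plan gets no flags, a mid-cost plan gets JIT without optimization, and an expensive plan gets both.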
Andres Freund	250bca7fc1	Debugging and profiling support for LLVM JIT provider. This currently requires patches to the LLVM codebase to be effective (submitted upstream), the GUCs are available without those patches however. Author: Andres Freund Discussion: https://postgr.es/m/20170901064131.tazjxwus3k2w3ybh@alap3.anarazel.de	2018-03-22 11:07:55 -07:00
Andres Freund	b96d550eb0	Support for optimizing and emitting code in LLVM JIT provider. This commit introduces the ability to actually generate code using LLVM. In particular, this adds: - Ability to emit code both in heavily optimized and largely unoptimized fashion - Batching facility to allow functions to be defined in small increments, but optimized and emitted in executable form in larger batches (for performance and memory efficiency) - Type and function declaration synchronization between runtime generated code and normal postgres code. This is critical to be able to access struct fields etc. - Developer oriented jit_dump_bitcode GUC, for inspecting / debugging the generated code. - per JitContext statistics of number of functions, time spent generating code, optimizing, and emitting it. This will later be employed for EXPLAIN support. This commit doesn't yet contain any code actually generating functions. That'll follow in later commits. Documentation for GUCs added, and for JIT in general, will be added in later commits. Author: Andres Freund, with contributions by Pierre Ducroquet Testing-By: Thomas Munro, Peter Eisentraut Discussion: https://postgr.es/m/20170901064131.tazjxwus3k2w3ybh@alap3.anarazel.de	2018-03-22 11:05:22 -07:00
Robert Haas	2fe6336e2d	Avoid creating a TOAST table for a partitioned table. It's useless. Amit Langote Discussion: http://postgr.es/m/b4c9dee6-d134-49b8-79c4-07fbd7c3b898@lab.ntt.co.jp	2018-03-22 13:49:38 -04:00
Robert Haas	8a8c4f3b32	Fix typo in comment. Michael Paquier Discussion: http://postgr.es/m/20180205071404.GB17337@paquier.xyz	2018-03-22 13:36:14 -04:00
Tom Lane	649f179250	Fix tuple counting in SP-GiST index build. Count the number of tuples in the index honestly, instead of assuming that it's the same as the number of tuples in the heap. (It might be different if the index is partial.) Back-patch to all supported versions. Tomas Vondra Discussion: https://postgr.es/m/3b3d8eac-c709-0d25-088e-b98339a1b28a@2ndquadrant.com	2018-03-22 13:24:05 -04:00
Robert Haas	7de4a1bcc5	Call pgstat_report_activity() in parallel CREATE INDEX workers. Also set debug_query_string. Oversight in commit `9da0cc3528` Peter Geoghegan, per a report by Phil Florent. Discussion: https://postgr.es/m/CAH2-Wzmf-34hD4n40uTuE-ZY9P5c%2BmvhFbCdQfN%3DKrKiVm3j3A%40mail.gmail.com	2018-03-22 13:15:03 -04:00
Robert Haas	e2f1eb0ee3	Implement partition-wise grouping/aggregation. If the partition keys of the input relation are part of the GROUP BY clause, all the rows belonging to a given group come from a single partition. This allows aggregation/grouping over a partitioned relation to be broken down into aggregation/grouping on each partition. This should be no worse, and often better, than the normal approach. If the GROUP BY clause does not contain all the partition keys, we can still perform partial aggregation for each partition and then finalize aggregation after appending the partial results. This is less certain to be a win, but it's still useful. Jeevan Chalke, Ashutosh Bapat, Robert Haas. The larger patch series of which this patch is a part was also reviewed and tested by Antonin Houska, Rajkumar Raghuwanshi, David Rowley, Dilip Kumar, Konstantin Knizhnik, Pascal Legrand, and Rafia Sabih. Discussion: http://postgr.es/m/CAM2+6=V64_xhstVHie0Rz=KPEQnLJMZt_e314P0jaT_oJ9MR8A@mail.gmail.com	2018-03-22 12:49:48 -04:00
Dean Rasheed	b5db1d93d2	Improve ANALYZE's strategy for finding MCVs. Previously, a value was included in the MCV list if its frequency was 25% larger than the estimated average frequency of all nonnull values in the table. For uniform distributions, that can lead to values being included in the MCV list and significantly overestimated on the basis of relatively few (sometimes just 2) instances being seen in the sample. For non-uniform distributions, it can lead to too few values being included in the MCV list, since the overall average frequency may be dominated by a small number of very common values, while the remaining values may still have a large spread of frequencies, causing both substantial overestimation and underestimation of the remaining values. Furthermore, increasing the statistics target may have little effect because the overall average frequency will remain relatively unchanged. Instead, populate the MCV list with the largest set of common values that are statistically significantly more common than the average frequency of the remaining values. This takes into account the variance of the sample counts, which depends on the counts themselves and on the proportion of the table that was sampled. As a result, it constrains the relative standard error of estimates based on the frequencies of values in the list, reducing the chances of too many values being included. At the same time, it allows more values to be included, since the MCVs need only be more common than the remaining non-MCVs, rather than the overall average. Thus it tends to produce fewer MCVs than the previous code for uniform distributions, and more for non-uniform distributions, reducing estimation errors in both cases. In addition, the algorithm responds better to increasing the statistics target, allowing more values to be included in the MCV list when more of the table is sampled. Jeff Janes, substantially modified by me. Reviewed by John Naylor and Tomas Vondra. 
Discussion: https://postgr.es/m/CAMkU=1yvdGvW9TmiLAhz2erFnvnPFYHbOZuO+a=4DVkzpuQ2tw@mail.gmail.com	2018-03-22 09:37:36 +00:00
Andres Freund	31bc604e0b	Add file containing extensions of the LLVM C API. Author: Andres Freund Discussion: https://postgr.es/m/20170901064131.tazjxwus3k2w3ybh@alap3.anarazel.de	2018-03-21 19:44:17 -07:00
Andres Freund	432bb9e04d	Basic JIT provider and error handling infrastructure. This commit introduces: 1) JIT provider abstraction, which allows JIT functionality to be implemented in separate shared libraries. That's desirable because it allows to install JIT support as a separate package, and because it allows experimentation with different forms of JITing. 2) JITContexts which can be, using functions introduced in follow up commits, used to emit JITed functions, and have them be cleaned up on error. 3) The outline of a LLVM JIT provider, which will be fleshed out in subsequent commits. Documentation for GUCs added, and for JIT in general, will be added in later commits. Author: Andres Freund, with architectural input from Jeff Davis Discussion: https://postgr.es/m/20170901064131.tazjxwus3k2w3ybh@alap3.anarazel.de	2018-03-21 19:28:28 -07:00
Tom Lane	846b5a5257	Prevent extensions from creating custom GUCs that are GUC_LIST_QUOTE. Pending some solution for the problems noted in commit `742869946`, disallow dynamic creation of GUC_LIST_QUOTE variables. If there are any extensions out there using this feature, they'd not be happy for us to start enforcing this rule in minor releases, so this is a HEAD-only change. The previous commit didn't make things any worse than they already were for such cases. Discussion: https://postgr.es/m/20180111064900.GA51030@paquier.xyz	2018-03-21 20:11:07 -04:00
Tom Lane	742869946f	Fix mishandling of quoted-list GUC values in pg_dump and ruleutils.c. Code that prints out the contents of setconfig or proconfig arrays in SQL format needs to handle GUC_LIST_QUOTE variables differently from other ones, because for those variables, flatten_set_variable_args() already applied a layer of quoting. The value can therefore safely be printed as-is, and indeed must be, or flatten_set_variable_args() will muck it up completely on reload. For all other GUC variables, it's necessary and sufficient to quote the value as a SQL literal. We'd recognized the need for this long ago, but mis-analyzed the need slightly, thinking that all GUC_LIST_INPUT variables needed the special treatment. That's actually wrong, since a valid value of a LIST variable might include characters that need quoting, although no existing variables accept such values. More to the point, we hadn't made any particular effort to keep the various places that deal with this up-to-date with the set of variables that actually need special treatment, meaning that we'd do the wrong thing with, for example, temp_tablespaces values. This affects dumping of SET clauses attached to functions, as well as ALTER DATABASE/ROLE SET commands. In ruleutils.c we can fix it reasonably honestly by exporting a guc.c function that allows discovering the flags for a given GUC variable. But pg_dump doesn't have easy access to that, so continue the old method of having a hard-wired list of affected variable names. At least we can fix it to have just one list not two, and update the list to match current reality. A remaining problem with this is that it only works for built-in GUC variables. pg_dump's list obviously knows nothing of third-party extensions, and even the "ask guc.c" method isn't bulletproof since the relevant extension might not be loaded.
There's no obvious solution to that, so for now, we'll just have to discourage extension authors from inventing custom GUCs that need GUC_LIST_QUOTE. This has been busted for a long time, so back-patch to all supported branches. Michael Paquier and Tom Lane, reviewed by Kyotaro Horiguchi and Pavel Stehule Discussion: https://postgr.es/m/20180111064900.GA51030@paquier.xyz	2018-03-21 20:03:28 -04:00
Tom Lane	0f0deb7194	Improve predtest.c's handling of cases with NULL-constant inputs. Currently, if operator_predicate_proof() is given an operator clause like "something op NULL", it just throws up its hands and reports it can't prove anything. But we can often do better than that, if the operator is strict, because then we know that the clause returns NULL overall. Depending on whether we're trying to prove or refute something, and whether we need weak or strong semantics for NULL, this may be enough to prove the implication, especially when we rely on the standard rule that "false implies anything". In particular, this lets us do something useful with questions like "does X IN (1,3,5,NULL) imply X <= 5?" The null entry in the IN list can effectively be ignored for this purpose, but the proof rules were not previously smart enough to deduce that. This patch is by me, but it owes something to previous work by Amit Langote to try to solve problems of the form mentioned. Thanks also to Emre Hasegeli and Ashutosh Bapat for review. Discussion: https://postgr.es/m/3bad48fc-f257-c445-feeb-8a2b2fb622ba@lab.ntt.co.jp	2018-03-21 18:30:46 -04:00
Alvaro Herrera	56163004b8	Fix relcache handling of the 'default' partition My commit `4dba331cb3` that moved around CommandCounterIncrement calls in partitioning DDL code unearthed a problem with the relcache handling for the 'default' partition: the construction of a correct relcache entry for the partitioned table was at the mercy of lack of CCI calls in non-trivial amounts of code. This was prone to creating problems later on, as the code develops. This was visible as a test failure in a compile with RELCACHE_FORCE_RELEASE (buildfarm member prion). The problem is that after the mentioned commit it was possible to create a relcache entry that had incomplete information regarding the default partition because I introduced a CCI between adding the catalog entries for the default partition (StorePartitionBound) and the update of pg_partitioned_table entry for its parent partitioned table (update_default_partition_oid). It seems the best fix is to move the latter so that it occurs inside the former; the purposeful lack of intervening CCI should be more obvious, and harder to break. I also remove a check in RelationBuildPartitionDesc that returns NULL if the key is not set. I couldn't find any place that needs this hack anymore; probably it was required because of bugs that have since been fixed. Fix a few typos I noticed while reviewing the code involved. Discussion: https://postgr.es/m/20180320182659.nyzn3vqtjbbtfgwq@alvherre.pgsql	2018-03-21 12:03:35 -03:00
Peter Eisentraut	325f2ec555	Handle heap rewrites even better in logical decoding Logical decoding should not publish anything about tables created as part of a heap rewrite during DDL. Those tables don't exist externally, so consumers of logical decoding cannot do anything sensible with that information. In `ab28feae2b`, we worked around this for built-in logical replication, but that was a hack. This is a more proper fix: We mark such transient heaps using the new field pg_class.relwrite, linking to the original relation OID. By default, we ignore them in logical decoding before they get to the output plugin. Optionally, a plugin can register their interest in getting such changes, if they handle DDL specially, in which case the new field will help them get information about the actual table. Reviewed-by: Craig Ringer <craig@2ndquadrant.com>	2018-03-21 09:15:04 -04:00
Andrew Gierth	d2d79887ea	Repair crash with unsortable grouping sets. If there were multiple grouping sets, none of them empty, all of which were unsortable, then an oversight in consider_groupingsets_paths led to a null pointer dereference. Fix, and add a regression test for this case. Per report from Dang Minh Huong, though I didn't use their patch. Backpatch to 10.x where hashed grouping sets were added.	2018-03-21 11:39:28 +00:00
Andres Freund	4c0000b839	Handle EEOP_FUNCEXPR_[STRICT_]FUSAGE out of line. This isn't a very common op, and it doesn't seem worth duplicating for JIT. Author: Andres Freund	2018-03-20 17:32:21 -07:00
Robert Haas	94150513ec	Don't pass the grouping target around unnecessarily. Since commit `4f15e5d09d` made grouped_rel set reltarget, a variety of other functions can just get it from grouped_rel instead of having to pass it around explicitly. Simplify accordingly. Patch by me, reviewed by Ashutosh Bapat. Discussion: http://postgr.es/m/CA+TgmoZ+ZJTVad-=vEq393N99KTooxv9k7M+z73qnTAqkb49BQ@mail.gmail.com	2018-03-20 11:37:43 -04:00
Robert Haas	b5996c2791	Determine grouping strategies in create_grouping_paths. Partition-wise aggregate will call create_ordinary_grouping_paths multiple times and we don't want to redo this work every time; have the caller do it instead and pass the details down. Patch by me, reviewed by Ashutosh Bapat. Discussion: http://postgr.es/m/CA+TgmoY7VYYn9a7YHj1nJL6zj6BkHmt4K-un9LRmXkyqRZyynA@mail.gmail.com	2018-03-20 11:31:06 -04:00
Robert Haas	4f15e5d09d	Defer creation of partially-grouped relation until it's needed. This avoids unnecessarily creating a RelOptInfo for which we have no actual need. This idea is from Ashutosh Bapat, who wrote a very different patch to accomplish a similar goal. It will be more important if and when we get partition-wise aggregate, since then there could be many partially grouped relations all of which could potentially be unnecessary. In passing, this sets the grouping relation's reltarget, which wasn't done previously but makes things simpler for this refactoring. Along the way, adjust things so that add_paths_to_partial_grouping_rel, now renamed create_partial_grouping_paths, does not perform the Gather or Gather Merge steps to generate non-partial paths from partial paths; have the caller do it instead. This is again for the convenience of partition-wise aggregate, which wants to inject additional partial paths after the existing ones are created and before we decide which ones to Gather/Gather Merge. This might seem like a separate change, but it's actually pretty closely entangled; I couldn't really see much value in separating it and having to change some things twice. Patch by me, reviewed by Ashutosh Bapat. Discussion: http://postgr.es/m/CA+TgmoZ+ZJTVad-=vEq393N99KTooxv9k7M+z73qnTAqkb49BQ@mail.gmail.com	2018-03-20 11:18:04 -04:00
Alvaro Herrera	4dba331cb3	Fix CommandCounterIncrement in partition-related DDL It makes sense to do the CCIs in the places that do catalog updates, rather than before the places that error out because the former ones fail to do it. In particular, it looks like StorePartitionBound() and IndexSetParentIndex() ought to make their own CCIs. Per review comments from Peter Eisentraut for row-level triggers on partitioned tables. Discussion: https://postgr.es/m/20171229225319.ajltgss2ojkfd3kp@alvherre.pgsql	2018-03-20 11:19:41 -03:00
Tom Lane	467963c3e9	Prevent query-lifespan memory leakage of SP-GiST traversal values. The original coding of the SP-GiST scan traversalValue feature (commit `ccd6eb49a`) arranged for traversal values to be stored in the query's main executor context. That's fine if there's only one index scan per query, but if there are many, we have a memory leak as successive scans create new traversal values. Fix it by creating a separate memory context for traversal values, which we can reset during spgrescan(). Back-patch to 9.6 where this code was introduced. In principle, adding the traversalCxt field to SpGistScanOpaqueData creates an ABI break in the back branches. But I (tgl) have little sympathy for extensions including spgist_private.h, so I'm not very worried about that. Alternatively we could stick the new field at the end of the struct in back branches, but that has its own downsides. Anton Dignös, reviewed by Alexander Kuzmenkov Discussion: https://postgr.es/m/CALNdv1jb6y2Te-m8xHLxLX12RsBmZJ1f4hESX7J0HjgyOhA9eA@mail.gmail.com	2018-03-19 23:59:30 -04:00
Peter Eisentraut	13c7c65ec9	Add missing break	2018-03-19 19:45:51 -04:00
Tom Lane	6497a18e6c	Fix some corner-case issues in REFRESH MATERIALIZED VIEW CONCURRENTLY. refresh_by_match_merge() has some issues in the way it builds a SQL query to construct the "diff" table: 1. It doesn't require the selected unique index(es) to be indimmediate. 2. It doesn't pay attention to the particular equality semantics enforced by a given index, but just assumes that they must be those of the column datatype's default btree opclass. 3. It doesn't check that the indexes are btrees. 4. It's insufficiently careful to ensure that the parser will pick the intended operator when parsing the query. (This would have been a security bug before CVE-2018-1058.) 5. It's not careful about indexes on system columns. The way to fix #4 is to make use of the existing code in ri_triggers.c for generating an arbitrary binary operator clause. I chose to move that to ruleutils.c, since that seems a more reasonable place to be exporting such functionality from than ri_triggers.c. While #1, #3, and #5 are just latent given existing feature restrictions, and #2 doesn't arise in the core system for lack of alternate opclasses with different equality behaviors, #4 seems like an issue worth back-patching. That's the bulk of the change anyway, so just back-patch the whole thing to 9.4 where this code was introduced. Discussion: https://postgr.es/m/13836.1521413227@sss.pgh.pa.us	2018-03-19 18:50:05 -04:00
Tom Lane	6fbd5cce22	Fix performance hazard in REFRESH MATERIALIZED VIEW CONCURRENTLY. Jeff Janes discovered that commit `7ca25b7de` made one of the queries run by REFRESH MATERIALIZED VIEW CONCURRENTLY perform badly. The root cause is bad cardinality estimation for correlated quals, but a principled solution to that problem is some way off, especially since the planner lacks any statistics about whole-row variables. Moreover, in non-error cases this query produces no rows, meaning it must be run to completion; but use of LIMIT 1 encourages the planner to pick a fast-start, slow-completion plan, exactly not what we want. Remove the LIMIT clause, and instead rely on the count parameter we pass to SPI_execute() to prevent excess work if the query does return some rows. While we've heard no field reports of planner misbehavior with this query, it could be that people are having performance issues that haven't reached the level of pain needed to cause a bug report. In any case, that LIMIT clause can't possibly do anything helpful with any existing version of the planner, and it demonstrably can cause bad choices in some cases, so back-patch to 9.4 where the code was introduced. Thomas Munro Discussion: https://postgr.es/m/CAMkU=1z-JoGymHneGHar1cru4F1XDfHqJDzxP_CtK5cL3DOfmg@mail.gmail.com	2018-03-19 17:23:21 -04:00
Alvaro Herrera	ee0a1fc84e	Remove unnecessary members from ModifyTableState and ExecInsert These values can be obtained from the ModifyTable node which is already a part of both the ModifyTableState and ExecInsert. Author: Álvaro Herrera, Amit Langote Reviewed-by: Peter Geoghegan Discussion: https://postgr.es/m/20180316151303.rml2p5wffn3o6qy6@alvherre.pgsql	2018-03-19 18:09:43 -03:00
Alvaro Herrera	839a8eb2b3	Expand comment a little bit The previous commit removed a comment that was a bit more verbose than its replacement.	2018-03-19 18:01:27 -03:00
Alvaro Herrera	6666ee49f4	Fix state reversal after partition tuple routing We make some changes to ModifyTableState and the EState it uses whenever we route tuples to partitions; but we weren't restoring properly in all cases, possibly causing crashes when partitions with different tuple descriptors are targeted by tuples inserted in the same command. Refactor some code, creating ExecPrepareTupleRouting, to encapsulate the needed state changing logic, and have it invoked one level above its current place (ie. put it in ExecModifyTable instead of ExecInsert); this makes it all more readable. Add a test case to exercise this. We don't support having views as partitions; and since only views can have INSTEAD OF triggers, there is no point in testing for INSTEAD OF when processing insertions into a partitioned table. Remove code that appears to support this (but which is actually never relevant.) In passing, fix location of some very confusing comments in ModifyTableState. Reported-by: Amit Langote Author: Etsuro Fujita, Amit Langote Discussion: https://postgr.es/m/0473bf5c-57b1-f1f7-3d58-455c2230bc5f@lab.ntt.co.jp	2018-03-19 17:45:53 -03:00
Robert Haas	c596fadbfe	Generate a separate upper relation for each stage of setop planning. Commit `3fc6e2d7f5` made setop planning stages return paths rather than plans, but all such paths were loosely associated with a single RelOptInfo, and only the final path was added to the RelOptInfo. Even at the time, it was foreseen that this should be changed, because there is otherwise no good way for a single stage of setop planning to return multiple paths. With this patch, each stage of set operation planning now creates a separate RelOptInfo; these are distinguished by using appropriate relid sets. Note that this patch does nothing whatsoever about actually returning multiple paths for the same set operation; it just makes it possible for a future patch to do so. Along the way, adjust things so that create_upper_paths_hook is called for each of these new RelOptInfos rather than just once, since that might be useful to extensions using that hook. It might be good to provide an FDW API here as well, but I didn't try to do that for now. Patch by me, reviewed and tested by Ashutosh Bapat and Rajkumar Raghuwanshi. Discussion: http://postgr.es/m/CA+TgmoaLRAOqHmMZx=ESM3VDEPceg+-XXZsRXQ8GtFJO_zbMSw@mail.gmail.com	2018-03-19 11:55:38 -04:00
Robert Haas	49525c4630	Rewrite recurse_union_children to iterate, rather than recurse. Also, rename it to plan_union_children, since the old name wasn't very descriptive. This results in a small net reduction in code, seems at least to me to be easier to understand, and saves space on the process stack. Patch by me, reviewed and tested by Ashutosh Bapat and Rajkumar Raghuwanshi. Discussion: http://postgr.es/m/CA+TgmoaLRAOqHmMZx=ESM3VDEPceg+-XXZsRXQ8GtFJO_zbMSw@mail.gmail.com	2018-03-19 11:54:56 -04:00
Magnus Hagander	71cce90ee9	Fix typo in comment Author: Daniel Gustafsson <daniel@yesql.se>	2018-03-19 10:45:44 +01:00
Tom Lane	8f5ac44043	Fix WHERE CURRENT OF when the referenced cursor uses an index-only scan. "UPDATE/DELETE WHERE CURRENT OF cursor_name" failed, with an error message like "cannot extract system attribute from virtual tuple", if the cursor was using an index-only scan for the target table. Fix it by digging the current TID out of the indexscan state. It seems likely that the same failure could occur for CustomScan plans and perhaps some FDW plan types, so that leaving this to be treated as an internal error with an obscure message isn't as good an idea as it first seemed. Hence, add a bit of heaptuple.c infrastructure to let us deliver a more on-topic message. I chose to make the message match what you get for the case where execCurrentOf can't identify the target scan node at all, "cursor "foo" is not a simply updatable scan of table "bar"". Perhaps it should be different, but we can always adjust that later. In the future, it might be nice to provide hooks that would let custom scan providers and/or FDWs deal with this in other ways; but that's not a suitable topic for a back-patchable bug fix. It's been like this all along, so back-patch to all supported branches. Yugo Nagata and Tom Lane Discussion: https://postgr.es/m/20180201013349.937dfc5f.nagata@sraoss.co.jp	2018-03-17 14:59:49 -04:00
Peter Eisentraut	8a3d942529	Add ssl_passphrase_command setting This allows specifying an external command for prompting for or otherwise obtaining passphrases for SSL key files. This is useful because in many cases there is no TTY easily available during service startup. Also add a setting ssl_passphrase_command_supports_reload, which allows supporting SSL configuration reload even if SSL files need passphrases. Reviewed-by: Daniel Gustafsson <daniel@yesql.se>	2018-03-17 08:28:51 -04:00
Andres Freund	7a50bb690b	Add 'unit' parameter to ExplainProperty{Integer,Float}. This allows to deduplicate some existing code, but mainly avoids some duplication in upcoming commits. In passing, fix variable names indicating wrong unit (seconds instead of ms). Author: Andres Freund Discussion: https://postgr.es/m/20180314002740.cah3mdsonz5mxney@alap3.anarazel.de	2018-03-16 23:16:04 -07:00
Andres Freund	f3e4b95edb	Make ExplainPropertyInteger accept 64bit input, remove *Long variant. 'long' is not a useful type across platforms, as it's 32bit on 32 bit platforms, and even on some 64bit platforms (e.g. windows) it's still only 32bits wide. As ExplainPropertyInteger should never be performance critical, change it to accept a 64bit argument and remove ExplainPropertyLong. Author: Andres Freund Discussion: https://postgr.es/m/20180314164832.n56wt7zcbpzi6zxe@alap3.anarazel.de	2018-03-16 23:13:12 -07:00
Tom Lane	9e17bdb8a5	Fix query-lifespan memory leakage in repeatedly executed hash joins. ExecHashTableCreate allocated some memory that wasn't freed by ExecHashTableDestroy, specifically the per-hash-key function information. That's not a huge amount of data, but if one runs a query that repeats a hash join enough times, it builds up. Fix by arranging for the data in question to be kept in the hashtable's hashCxt instead of leaving it "loose" in the query-lifespan executor context. (This ensures that we'll also clean up anything that the hash functions allocate in fn_mcxt.) Per report from Amit Khandekar. It's been like this forever, so back-patch to all supported branches. Discussion: https://postgr.es/m/CAJ3gD9cFofAWGvcxLOxDHC=B0hjtW8yGmUsF2hdGh97CM38=7g@mail.gmail.com	2018-03-16 16:03:45 -04:00
Peter Eisentraut	4120864b9e	Change transaction state debug strings to match enum symbols In some cases, these were different for no apparent reason, making debugging unnecessarily mysterious. Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org>	2018-03-16 13:18:06 -04:00
Peter Eisentraut	81148856b0	Improve savepoint error messages Include the savepoint name in the error message and rephrase it a bit to match common style. Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org>	2018-03-16 13:18:06 -04:00
Peter Eisentraut	ec87efde8d	Simplify parse representation of savepoint commands Instead of embedding the savepoint name in a list and then requiring complex code to unpack it, just add another struct field to store it directly. Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org>	2018-03-16 13:18:06 -04:00
Peter Eisentraut	04700b685f	Rename TransactionChain functions We call this thing a "transaction block" everywhere except in a few functions, where it is mysteriously called a "transaction chain". In the SQL standard, a transaction chain is something different. So rename these functions to match the common terminology. Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org>	2018-03-16 13:18:06 -04:00
Peter Eisentraut	8d47a90862	Update function comments After `a6542a4b68`, some function comments were misplaced. Fix that. Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org>	2018-03-16 13:18:05 -04:00
Tom Lane	877cdf11ea	Mop-up for letting VOID-returning SQL functions end with a SELECT. Part of the intent in commit `fd1a421fe` was to allow SQL functions that are declared to return VOID to contain anything, including an unrelated final SELECT, the same as SQL-language procedures can. However, the planner's inlining logic didn't get that memo. Fix it, and add some regression tests covering this area, since evidently we had none. In passing, clean up some typos in comments in create_function_3.sql, and get rid of its none-too-safe assumption that DROP CASCADE notice output is immutably ordered. Per report from Prabhat Sahu. Discussion: https://postgr.es/m/CANEvxPqxAj6nNHVcaXxpTeEFPmh24Whu+23emgjiuKrhJSct0A@mail.gmail.com	2018-03-16 12:48:13 -04:00
Robert Haas	1466bcfa4a	Split create_grouping_paths into degenerate and non-degenerate cases. There's no functional change here, or at least I hope there isn't, just code rearrangement. The rearrangement is motivated by partition-wise aggregate, which doesn't need to consider the degenerate case but wants to reuse the logic for the ordinary case. Based loosely on a patch from Ashutosh Bapat and Jeevan Chalke, but I whacked it around pretty heavily. The larger patch series of which this patch is a part was also reviewed and tested by Antonin Houska, Rajkumar Raghuwanshi, David Rowley, Dilip Kumar, Konstantin Knizhnik, Pascal Legrand, Rafia Sabih, and me. Discussion: http://postgr.es/m/CAFjFpRewpqCmVkwvq6qrRjmbMDpN0CZvRRzjd8UvncczA3Oz1Q@mail.gmail.com	2018-03-15 14:43:58 -04:00
Peter Eisentraut	3a4b891964	Fix more format truncation issues Fix the warnings created by the compiler warning options -Wformat-overflow=2 -Wformat-truncation=2, supported since GCC 7. This is a more aggressive variant of the fixes in `6275f5d28a`, which GCC 7 warned about by default. The issues are all harmless, but some dubious coding patterns are cleaned up. One issue that is of external interest is that BGW_MAXLEN is increased from 64 to 96. Apparently, the old value would cause the bgw_name of logical replication workers to be truncated in some circumstances. But this doesn't actually add those warning options. It appears that the warnings depend a bit on compilation and optimization options, so it would be annoying to have to keep up with that. This is more of a once-in-a-while cleanup. Reviewed-by: Michael Paquier <michael@paquier.xyz>	2018-03-15 11:41:42 -04:00
Robert Haas	648a6c7bd8	Pass additional arguments to a couple of grouping-related functions. get_number_of_groups() and make_partial_grouping_target() currently fish information directly out of the PlannerInfo; in the former case, the target list, and in the latter case, the HAVING qual. This works fine if there's only one grouping relation, but if the pending patch for partition-wise aggregate gets committed, we'll have multiple grouping relations and must therefore use appropriately translated versions of these values for each one. To make that simpler, pass the values to be used as arguments. Jeevan Chalke. The larger patch series of which this patch is a part was also reviewed and tested by Antonin Houska, Rajkumar Raghuwanshi, David Rowley, Dilip Kumar, Konstantin Knizhnik, Pascal Legrand, Rafia Sabih, and me. Discussion: http://postgr.es/m/CAM2+6=UqFnFUypOvLdm5TgC+2M=-E0Q7_LOh0VDFFzmk2BBPzQ@mail.gmail.com Discussion: http://postgr.es/m/CAM2+6=W+L=C4yBqMrgrfTfNtbtmr4T53-hZhwbA2kvbZ9VMrrw@mail.gmail.com	2018-03-15 11:33:52 -04:00
Alvaro Herrera	24c0a6c649	logical replication: fix OID type mapping mechanism The logical replication type map seems to have been misused by its only caller -- it would try to use the remote OID as input for local type routines, which unsurprisingly could result in bogus "cache lookup failed for type XYZ" errors, or random other type names being picked up if they happened to use the right OID. Fix that, changing Oid logicalrep_typmap_getid(Oid remoteid) to char *logicalrep_typmap_gettypname(Oid remoteid) which is more useful. If the remote type is not part of the typmap, this simply prints "unrecognized type" instead of choking trying to figure out -- a pointless exercise (because the only input for that comes from replication messages, which are not under the local node's control) and dangerous to boot, when called from within an error context callback. Once that is done, it comes to light that the local OID in the typmap entry was not being used for anything; the type/schema names are what we need, so remove local type OID from that struct. Once you do that, it becomes pointless to attach a callback to regular syscache invalidation. So remove that also. Reported-by: Dang Minh Huong Author: Masahiko Sawada Reviewed-by: Álvaro Herrera, Petr Jelínek, Dang Minh Huong, Atsushi Torikoshi Discussion: https://postgr.es/m/75DB81BEEA95B445AE6D576A0A5C9E936A6BE964@BPXM05GP.gisp.nec.co.jp Discussion: https://postgr.es/m/75DB81BEEA95B445AE6D576A0A5C9E936A6C4B0A@BPXM05GP.gisp.nec.co.jp	2018-03-14 21:34:26 -03:00
Peter Eisentraut	f66e8bf875	Remove pg_class.relhaspkey It is not used for anything internally, and it cannot be relied on for external uses, so it can just be removed. The correct, recommended way to check for a primary key is in pg_index. Discussion: https://www.postgresql.org/message-id/flat/b1a24c6c-6913-f89c-674e-0704f0ed69db@2ndquadrant.com	2018-03-14 15:31:34 -04:00
Stephen Frost	6b960aae90	Fix function-header comments in planner.c In `b5635948ab`, a couple of function header comments weren't changed, or weren't changed correctly, to reflect the arguments being passed into the functions. Specifically, get_number_of_groups() had the wrong argument name in the commit and create_grouping_paths() wasn't updated even though the arguments had been changed. The issue with create_grouping_paths() was noticed by Ashutosh Bapat, while I discovered the issue with get_number_of_groups() by looking to see if there were any similar issues from that commit. Discussion: https://postgr.es/m/CAFjFpRcbp4702jcp387PExt3fNCt62QJN8++DQGwBhsW6wRHWA@mail.gmail.com	2018-03-14 13:51:15 -04:00
Stephen Frost	1f7b8967ef	Fix typo in add_paths_to_append_rel() The comment should have been referring to the number of workers, not the number of paths. Author: Ashutosh Bapat Discussion: https://postgr.es/m/CAFjFpRcbp4702jcp387PExt3fNCt62QJN8++DQGwBhsW6wRHWA@mail.gmail.com	2018-03-14 13:51:14 -04:00
Peter Eisentraut	33803f67f1	Support INOUT arguments in procedures In a top-level CALL, the values of INOUT arguments will be returned as a result row. In PL/pgSQL, the values are assigned back to the input arguments. In other languages, the same convention as for returning a record from a function is used. That does not require any code changes in the PL implementations. Reviewed-by: Pavel Stehule <pavel.stehule@gmail.com>	2018-03-14 12:07:28 -04:00
Alvaro Herrera	484a4a08ab	Log when a BRIN autosummarization request fails Autovacuum's 'workitem' request queue is of limited size, so requests can fail if they arrive more quickly than autovacuum can process them. Emit a log message when this happens, to provide better visibility of this. Backpatch to 10. While this represents an API change for AutoVacuumRequestWork, that function is not yet prepared to deal with external modules calling it, so there doesn't seem to be any risk (other than log spam, that is.) Author: Masahiko Sawada Reviewed-by: Fabrízio Mello, Ildar Musin, Álvaro Herrera Discussion: https://postgr.es/m/CAD21AoB1HrQhp6_4rTyHN5kWEJCEsG8YzsjZNt-ctoXSn5Uisw@mail.gmail.com	2018-03-14 11:59:40 -03:00
Stephen Frost	97d18ce27d	Fix comment for ExecProcessReturning resultRelInfo is the argument for the function, not projectReturning. Author: Etsuro Fujita Discussion: https://postgr.es/m/5AA8E11E.1040609@lab.ntt.co.jp	2018-03-14 09:28:08 -04:00
Robert Haas	0927d2f46d	Let Parallel Append over simple UNION ALL have partial subpaths. A simple UNION ALL gets flattened into an appendrel of subquery RTEs, but up until now it's been impossible for the appendrel to use the partial paths for the subqueries, so we can implement the appendrel as a Parallel Append but only one with non-partial paths as children. There are three separate obstacles to removing that limitation. First, when planning a subquery, propagate any partial paths to the final_rel so that they are potentially visible to outer query levels (but not if they have initPlans attached, because that wouldn't be safe). Second, after planning a subquery, propagate any partial paths for the final_rel to the subquery RTE in the outer query level in the same way we do for non-partial paths. Third, teach finalize_plan() to account for the possibility that the fake parameter we use for rescan signalling when the plan contains a Gather (Merge) node may be propagated from an outer query level. Patch by me, reviewed and tested by Amit Khandekar, Rajkumar Raghuwanshi, and Ashutosh Bapat. Test cases based on examples by Rajkumar Raghuwanshi. Discussion: http://postgr.es/m/CA+Tgmoa6L9A1nNCk3aTDVZLZ4KkHDn1+tm7mFyFvP+uQPS7bAg@mail.gmail.com	2018-03-13 16:34:08 -04:00
Tom Lane	d04900de7d	When updating reltuples after ANALYZE, just extrapolate from our sample. The existing logic for updating pg_class.reltuples trusted the sampling results only for the pages ANALYZE actually visited, preferring to believe the previous tuple density estimate for all the unvisited pages. While there's some rationale for doing that for VACUUM (first that VACUUM is likely to visit a very nonrandom subset of pages, and second that we know for sure that the unvisited pages did not change), there's no such rationale for ANALYZE: by assumption, it's looked at an unbiased random sample of the table's pages. Furthermore, in a very large table ANALYZE will have examined only a tiny fraction of the table's pages, meaning it cannot slew the overall density estimate very far at all. In a table that is physically growing, this causes reltuples to increase nearly proportionally to the change in relpages, regardless of what is actually happening in the table. This has been observed to cause reltuples to become so much larger than reality that it effectively shuts off autovacuum, whose threshold for doing anything is a fraction of reltuples. (Getting to the point where that would happen seems to require some additional, not well understood, conditions. But it's undeniable that if reltuples is seriously off in a large table, ANALYZE alone will not fix it in any reasonable number of iterations, especially not if the table is continuing to grow.) Hence, restrict the use of vac_estimate_reltuples() to VACUUM alone, and in ANALYZE, just extrapolate from the sample pages on the assumption that they provide an accurate model of the whole table. If, by very bad luck, they don't, at least another ANALYZE will fix it; in the old logic a single bad estimate could cause problems indefinitely. In HEAD, let's remove vac_estimate_reltuples' is_analyze argument altogether; it was never used for anything and now it's totally pointless. 
But keep it in the back branches, in case any third-party code is calling this function. Per bug #15005. Back-patch to all supported branches. David Gould, reviewed by Alexander Kuzmenkov, cosmetic changes by me Discussion: https://postgr.es/m/20180117164916.3fdcf2e9@engels	2018-03-13 13:24:27 -04:00
Tom Lane	38f7831d70	Avoid holding AutovacuumScheduleLock while rechecking table statistics. In databases with many tables, re-fetching the statistics takes some time, so that this behavior seriously decreases the available concurrency for multiple autovac workers. There's discussion afoot about more complete fixes, but a simple and back-patchable amelioration is to claim the table and release the lock before rechecking stats. If we find out there's no longer a reason to process the table, re-taking the lock to un-claim the table is cheap enough. (This patch is quite old, but got lost amongst a discussion of more aggressive fixes. It's not clear when or if such a fix will be accepted, but in any case it'd be unlikely to get back-patched. Let's do this now so we have some improvement for the back branches.) In passing, make the normal un-claim step take AutovacuumScheduleLock not AutovacuumLock, since that is what is documented to protect the wi_tableoid field. This wasn't an actual bug in view of the fact that readers of that field hold both locks, but it creates some concurrency penalty against operations that need only AutovacuumLock. Back-patch to all supported versions. Jeff Janes Discussion: https://postgr.es/m/26118.1520865816@sss.pgh.pa.us	2018-03-13 12:28:35 -04:00
Peter Eisentraut	17bb625017	Move strtoint() to common Several places used similar code to convert a string to an int, so take the function that we already had and make it globally available. Reviewed-by: Michael Paquier <michael@paquier.xyz>	2018-03-13 10:21:09 -04:00
Peter Eisentraut	6cf86f4354	Change internal integer representation of Value node A Value node would store an integer as a long. This causes needless portability risks, as long can be of varying sizes. Change it to use int instead. All code using this was already careful to only store 32-bit values anyway. Reviewed-by: Michael Paquier <michael@paquier.xyz>	2018-03-13 09:56:25 -04:00
Peter Eisentraut	377b5ac484	Fix CREATE TABLE / LIKE with bigint identity column CREATE TABLE / LIKE with a bigint identity column would fail on platforms where long is 32 bits. Copying the sequence values used makeInteger(), which would truncate the 64-bit sequence data to 32 bits. To fix, use makeFloat() instead, like the parser. (This does not actually make use of floats, but stores the values as strings.) Bug: #15096 Reviewed-by: Michael Paquier <michael@paquier.xyz>	2018-03-13 09:41:30 -04:00
Alvaro Herrera	1f8a3327a9	Avoid having two PKs in a partition If a table containing a primary key is attached as a partition to a partitioned table which has a primary key with a different definition, we would happily create a second one in the new partition. Oops. It turns out that this is because an error check in DefineIndex is executed only if you tell it that it's being run by ALTER TABLE, and the original code here wasn't. Change it so that it does. Added a couple of test cases for this, also. A previously working test started to fail in a different way than before the patch because the new check is called earlier; change the PK to plain UNIQUE so that the new behavior isn't invoked, so that the test continues to verify what we want it to verify. Reported by: Noriyoshi Shinoda Discussion: https://postgr.es/m/DF4PR8401MB102060EC2615EC9227CC73F7EEDF0@DF4PR8401MB1020.NAMPRD84.PROD.OUTLOOK.COM	2018-03-12 19:42:32 -03:00
Tom Lane	4a4e2442a7	Fix improper uses of canonicalize_qual(). One of the things canonicalize_qual() does is to remove constant-NULL subexpressions of top-level AND/OR clauses. It does that on the assumption that what it's given is a top-level WHERE clause, so that NULL can be treated like FALSE. Although this is documented down inside a subroutine of canonicalize_qual(), it wasn't mentioned in the documentation of that function itself, and some callers hadn't gotten that memo. Notably, commit `d007a9505` caused get_relation_constraints() to apply canonicalize_qual() to CHECK constraints. That allowed constraint exclusion to misoptimize situations in which a CHECK constraint had a provably-NULL subclause, as seen in the regression test case added here, in which a child table that should be scanned is not. (Although this thinko is ancient, the test case doesn't fail before 9.2, for reasons I've not bothered to track down in detail. There may be related cases that do fail before that.) More recently, commit `f0e44751d` added an independent bug by applying canonicalize_qual() to index expressions, which is even sillier since those might not even be boolean. If they are, though, I think this could lead to making incorrect index entries for affected index expressions in v10. I haven't attempted to prove that though. To fix, add an "is_check" parameter to canonicalize_qual() to specify whether it should assume WHERE or CHECK semantics, and make it perform NULL-elimination accordingly. Adjust the callers to apply the right semantics, or remove the call entirely in cases where it's not known that the expression has one or the other semantics. I also removed the call in some cases involving partition expressions, where it should be a no-op because such expressions should be canonical already ... and was a no-op, independently of whether it could in principle have done something, because it was being handed the qual in implicit-AND format which isn't what it expects. 
In HEAD, add an Assert to catch that type of mistake in future. This represents an API break for external callers of canonicalize_qual(). While that's intentional in HEAD to make such callers think about which case applies to them, it seems like something we probably wouldn't be thanked for in released branches. Hence, in released branches, the extra parameter is added to a new function canonicalize_qual_ext(), and canonicalize_qual() is a wrapper that retains its old behavior. Patch by me with suggestions from Dean Rasheed. Back-patch to all supported branches. Discussion: https://postgr.es/m/24475.1520635069@sss.pgh.pa.us	2018-03-11 18:10:42 -04:00
Tom Lane	5748f3a0aa	Improve predtest.c's internal docs, and enhance its functionality a bit. Commit `b08df9cab` left things rather poorly documented as far as the exact semantics of "clause_is_check" mode went. Also, that mode did not really work correctly for predicate_refuted_by; although given the lack of specification as to what it should do, as well as the lack of any actual use-case, that's perhaps not surprising. Rename "clause_is_check" to "weak" proof mode, and provide specifications for what it should do. I defined weak refutation as meaning "truth of A implies non-truth of B", which makes it possible to use the mode in the part of relation_excluded_by_constraints that checks for mutually contradictory WHERE clauses. Fix up several places that did things wrong for that definition. (As far as I can see, these errors would only lead to failure-to-prove, not incorrect claims of proof, making them not serious bugs even aside from the fact that v10 contains no use of this mode. So there seems no need for back-patching.) In addition, teach predicate_refuted_by_recurse that it can use predicate_implied_by_recurse after all when processing a strong NOT-clause, so long as it asks for the correct proof strength. This is an optimization that could have been included in commit `b08df9cab`, but wasn't. Also, simplify and generalize the logic that checks for whether nullness of the argument of IS [NOT] NULL would force overall nullness of the predicate or clause. (This results in a change in the partition_prune test's output, as it is now able to prune an all-nulls partition that it did not recognize before.) In passing, in PartConstraintImpliedByRelConstraint, remove bogus conversion of the constraint list to explicit-AND form and then right back again; that accomplished nothing except forcing a useless extra level of recursion inside predicate_implied_by. Discussion: https://postgr.es/m/5983.1520487191@sss.pgh.pa.us	2018-03-09 16:58:26 -05:00
Robert Haas	960df2a971	Correctly assess parallel-safety of tlists when SRFs are used. Since commit `69f4b9c85f`, the existing code was no longer assessing the parallel-safety of the real tlist for each upper rel, but rather the first of possibly several tlists created by split_pathtarget_at_srfs(). Repair. Even though this is clearly wrong, it's not clear that it has any user-visible consequences at the moment, so no back-patch for now. If we discover later that it does have user-visible consequences, we might need to back-patch this to v10. Patch by me, per a report from Rajkumar Raghuwanshi. Discussion: http://postgr.es/m/CA+Tgmoaob_Strkg4Dcx=VyxnyXtrmkV=ofj=pX7gH9hSre-g0Q@mail.gmail.com	2018-03-08 14:25:31 -05:00
Tom Lane	4e0c743c18	Fix cross-checking of ReservedBackends/max_wal_senders/MaxConnections. We were independently checking ReservedBackends < MaxConnections and max_wal_senders < MaxConnections, but because walsenders aren't allowed to use superuser-reserved connections, that's really the wrong thing. Correct behavior is to insist on ReservedBackends + max_wal_senders being less than MaxConnections. Fix the code and associated documentation. This has been wrong for a long time, but since the situation probably hardly ever arises in the field (especially pre-v10, when the default for max_wal_senders was zero), no back-patch. Discussion: https://postgr.es/m/28271.1520195491@sss.pgh.pa.us	2018-03-08 11:25:26 -05:00
Alvaro Herrera	f9d34ce4e7	Add missing debug lines during bootstrap Noticed while playing with changes that mess with the bootstrap sequence; the operations patched here failed to emit anything, leading the developer to think that the bug was in the previous operation that did emit a message.	2018-03-07 11:47:35 -03:00
Alvaro Herrera	f4a2842ac3	Fix typo Author: Kyotaro HORIGUCHI Discussion: https://postgr.es/m/20180307.163428.209919771.horiguchi.kyotaro@lab.ntt.co.jp	2018-03-07 07:08:38 -03:00
Stephen Frost	06ca148430	Fix typo for RangeVarGetRelidExtended The function is actually RangeVarGetRelidExtended, so the comment should reflect that. Author: Michael Paquier Discussion: https://postgr.es/m/20180307035216.GA3184@paquier.xyz	2018-03-06 23:36:26 -05:00
Peter Eisentraut	5b804cc168	Fix costing of parallel hash joins. Commit `1804284042` established that single-batch parallel-aware hash joins could create one large shared hash table using the combined work_mem budget of all participants. The costing accidentally assumed that parallel-oblivious hash joins could also do that. The documentation for initial_cost_hashjoin() also failed to mention the new argument. Repair. Author: Thomas Munro Reported-By: Antonin Houska Reviewed-By: Antonin Houska Discussion: https://postgr.es/m/12441.1513935950%40localhost	2018-03-06 21:54:37 -05:00
Alvaro Herrera	8aa75e1384	Refrain from duplicating data in reorderbuffers If a walsender exits leaving data in reorderbuffers, the next walsender that tries to decode the same transaction would append its decoded data to the same spill files without truncating them first, which effectively duplicates the data. Avoid that by removing any leftover reorderbuffer spill files when a walsender starts. Backpatch to 9.4; this bug has been there from the very beginning of logical decoding. Author: Craig Ringer, revised by me Reviewed by: Álvaro Herrera, Petr Jelínek, Masahiko Sawada	2018-03-06 18:34:29 -03:00
Alvaro Herrera	1ffb63a2a1	Fix bogus Name assignment in CreateStatistics Apparently, it doesn't work to use a plain cstring as a Name datum: you may end up having random bytes because of failing to zero the bytes after the terminating \0, as indicated by valgrind. I introduced this bug in `5564c11815`, so backpatch this fix to REL_10_STABLE, like that commit. While at it, fix a slightly misleading comment, pointed out by David Rowley.	2018-03-06 13:20:40 -03:00
Andres Freund	d06aba240d	Fix parent node of WCO expressions in partitioned tables. Since `edd44738bc` WCO expressions of partitioned tables are initialized with the first subplan as parent. That's not correct, as the correct context is the ModifyTableState node. That's also what is used for RETURNING processing, initialized nearby. This appears not to cause any visible problems for in-core code, but is problematic for an in-development patch. Discussion: https://postgr.es/m/20180303043818.tnvlo243bgy7una3@alap3.anarazel.de	2018-03-05 17:49:59 -08:00
Andres Freund	854dd8cff5	Add parenthesized options syntax for ANALYZE. This is analogous to the syntax allowed for VACUUM. This allows us to avoid making new options reserved keywords and makes it easier to allow arbitrary argument order. Oh, and it's consistent with the other commands, too. Author: Nathan Bossart Reviewed-By: Michael Paquier, Masahiko Sawada Discussion: https://postgr.es/m/D3FC73E2-9B1A-4DB4-8180-55F57D116B4E@amazon.com	2018-03-05 16:21:05 -08:00
Andres Freund	b2a177bff1	Fix HEAP_INSERT_IS_SPECULATIVE to HEAP_INSERT_SPECULATIVE in comments. This was wrong since `168d5805e4`, which introduced speculative inserts. Author: Andres Freund	2018-03-05 15:28:03 -08:00
Alvaro Herrera	5564c11815	Clone extended stats in CREATE TABLE (LIKE INCLUDING ALL) The LIKE INCLUDING ALL clause to CREATE TABLE intuitively indicates cloning of extended statistics on the source table, but it failed to do so. Patch it up so that it does. Also include an INCLUDING STATISTICS option to the LIKE clause, so that the behavior can be requested individually, or excluded individually. While at it, reorder the INCLUDING options, both in code and in docs, in alphabetical order which makes more sense than feature-implementation order that was previously used. Backpatch this to Postgres 10, where extended statistics were introduced, because this is seen as an oversight in a fresh feature which is better to get consistent from the get-go instead of changing only in pg11. In pg11, comments on statistics objects are cloned too. In pg10 they are not, because I (Álvaro) was too cowardly to change the parse node as required to support it. Also, in pg10 I chose not to renumber the parser symbols for the various INCLUDING options in LIKE, for the same reason. Any corresponding user-visible changes (docs) are backpatched, though. Reported-by: Stephen Froehlich Author: David Rowley Reviewed-by: Álvaro Herrera, Tomas Vondra Discussion: https://postgr.es/m/CY1PR0601MB1927315B45667A1B679D0FD5E5EF0@CY1PR0601MB1927.namprd06.prod.outlook.com	2018-03-05 19:37:19 -03:00
Robert Haas	42d7074ebb	shm_mq: Fix detach race condition. Commit `34db06ef9a` adopted a lock-free design for shm_mq.c, but it introduced a race condition that could lose messages. When shm_mq_receive_bytes() detects that the other end has detached, it must make sure that it has seen the final version of mq_bytes_written, or it might miss a message sent before detaching. Thomas Munro Discussion: https://postgr.es/m/CAEepm%3D2myZ4qxpt1a%3DC%2BwEv3o188K13K3UvD-44FK0SdAzHy%2Bw%40mail.gmail.com	2018-03-05 15:12:49 -05:00
Tom Lane	58d9acc18d	Fix assorted issues in convert_to_scalar(). If convert_to_scalar is passed a pair of datatypes it can't cope with, its former behavior was just to elog(ERROR). While this is OK so far as the core code is concerned, there's extension code that would like to use scalarltsel/scalargtsel/etc as selectivity estimators for operators that work on non-core datatypes, and this behavior is a show-stopper for that use-case. If we simply allow convert_to_scalar to return FALSE instead of outright failing, then the main logic of scalarltsel/scalargtsel will work fine for any operator that behaves like a scalar inequality comparison. The lack of conversion capability will mean that we can't estimate to better than histogram-bin-width precision, since the code will effectively assume that the comparison constant falls at the middle of its bin. But that's still a lot better than nothing. (Someday we should provide a way for extension code to supply a custom version of convert_to_scalar, but today is not that day.) While poking at this issue, we noted that the existing code for handling type bytea in convert_to_scalar is several bricks shy of a load. It assumes without checking that if the comparison value is type bytea, the bounds values are too; in the worst case this could lead to a crash. It also fails to detoast the input values, so that the comparison result is complete garbage if any input is toasted out-of-line, compressed, or even just short-header. I'm not sure how often such cases actually occur --- the bounds values, at least, are probably safe since they are elements of an array and hence can't be toasted. But that doesn't make this code OK. Back-patch to all supported branches, partly because author requested that, but mostly because of the bytea bugs. The change in API for the exposed routine convert_network_to_scalar() is theoretically a back-patch hazard, but it seems pretty unlikely that any third-party code is calling that function directly. 
Tomas Vondra, with some adjustments by me Discussion: https://postgr.es/m/b68441b6-d18f-13ab-b43b-9a72188a4e02@2ndquadrant.com	2018-03-03 20:31:35 -05:00
Tom Lane	8d90b4d01a	Minor cleanup in genbki.pl. Separate out the pg_attribute logic of genbki.pl into its own function. Drop unnecessary "defined $catalog->{data}" check. This both narrows and shortens the data writing loop of the script. There is no functional change (the emitted files are the same as before). John Naylor Discussion: https://postgr.es/m/CAJVSVGXnLH=BSo0x-aA818f=MyQqGS5nM-GDCWAMdnvQJTRC1A@mail.gmail.com	2018-03-03 12:05:28 -05:00
Tom Lane	a351679c80	Trivial adjustments in preparation for bootstrap data conversion. Rationalize a couple of macro names: * In catalog/pg_init_privs.h, rename Anum_pg_init_privs_privs to Anum_pg_init_privs_initprivs to match the column's actual name. * In ecpg, rename ZPBITOID to BITOID to match catalog/pg_type.h. This reduces reader confusion, and will allow us to generate these macros automatically in future. In catalog/pg_tablespace.h, fix the ordering of related DATA and #define lines to agree with how it's done elsewhere. This has no impact today, but simplifies life for the bootstrap data conversion scripts. John Naylor Discussion: https://postgr.es/m/CAJVSVGXnLH=BSo0x-aA818f=MyQqGS5nM-GDCWAMdnvQJTRC1A@mail.gmail.com	2018-03-03 11:23:33 -05:00
Tom Lane	0b1d1a038b	Fix VM buffer pin management in heap_lock_updated_tuple_rec(). Sloppy coding in this function could lead to leaking a VM buffer pin, or to attempting to free the same pin twice. Repair. While at it, reduce the code's tendency to free and reacquire the same page pin. Back-patch to 9.6; before that, this routine did not concern itself with VM pages. Amit Kapila and Tom Lane Discussion: https://postgr.es/m/CAA4eK1KJKwhc=isgTQHjM76CAdVswzNeAuZkh_cx-6QgGkSEgA@mail.gmail.com	2018-03-02 17:40:48 -05:00
Peter Eisentraut	fd1a421fe6	Add prokind column, replacing proisagg and proiswindow The new column distinguishes normal functions, procedures, aggregates, and window functions. This replaces the existing columns proisagg and proiswindow, and replaces the convention that procedures are indicated by prorettype == 0. Also change prorettype to be VOIDOID for procedures. Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Michael Paquier <michael@paquier.xyz>	2018-03-02 13:48:33 -05:00
Robert Haas	497171d3e2	shm_mq: Have the receiver set the sender's latch less frequently. Instead of marking data from the ring buffer consumed and setting the sender's latch for every message, do it only when the amount of data we can consume is at least 1/4 of the size of the ring buffer, or when no data remains in the ring buffer. This is dramatically faster in my testing; apparently, the savings from sending signals less frequently outweigh the benefit of letting the sender know about available buffer space sooner. Patch by me, reviewed by Andres Freund and tested by Rafia Sabih. Discussion: http://postgr.es/m/CA+TgmoYK7RFj6r7KLEfSGtYZCi3zqTRhAz8mcsDbUAjEmLOZ3Q@mail.gmail.com	2018-03-02 12:20:30 -05:00
Robert Haas	34db06ef9a	shm_mq: Reduce spinlock usage. Previously, mq_bytes_read and mq_bytes_written were protected by the spinlock, but that turns out to cause pretty serious spinlock contention on queries which send many tuples through a Gather or Gather Merge node. This patch changes things so that we instead read and write those values using 8-byte atomics. Since mq_bytes_read can only be changed by the receiver and mq_bytes_written can only be changed by the sender, the only purpose of the spinlock is to prevent reads and writes of these values from being torn on platforms where 8-byte memory access is not atomic, making the conversion fairly straightforward. Testing shows that this produces some slowdown if we're using emulated 64-bit atomics, but since they should be available on any platform where performance is a primary concern, that seems OK. It's faster, sometimes a lot faster, on platforms where such atomics are available. Patch by me, reviewed by Andres Freund, who also suggested the design. Also tested by Rafia Sabih. Discussion: http://postgr.es/m/CA+TgmoYuK0XXxmUNTFT9TSNiBtWnRwasBcHHRCOK9iYmDLQVPg@mail.gmail.com	2018-03-02 12:16:59 -05:00
Tom Lane	81b9b5ce49	Make gistvacuumcleanup() count the actual number of index tuples. Previously, it just returned the heap tuple count, which might be only an estimate, and would be completely the wrong thing if the index is partial. Since this function scans every index page anyway to find free pages, it's practically free to count the surviving index tuples. Let's do that and return an accurate count. This is easily visible as a wrong reltuples value for a partial GiST index following VACUUM, so back-patch to all supported branches. Andrey Borodin, reviewed by Michail Nikolaev Discussion: https://postgr.es/m/151956654251.6915.675951950408204404.pgcf@coridan.postgresql.org	2018-03-02 11:22:42 -05:00
Andres Freund	17b340abf8	Minor clean-up in dshash.{c,h}. For consistency with other code that deals in numbers of buckets, the macro BUCKETS_PER_PARTITION should produce a value of type size_t. Also, fix a mention of an obsolete proposed name for dshash.c that appeared in a comment. Author: Thomas Munro, based on an observation from Amit Kapila Discussion: https://postgr.es/m/CAA4eK1%2BBOp5aaW3aHEkg5Bptf8Ga_BkBnmA-%3DXcAXShs0yCiYQ%40mail.gmail.com	2018-03-01 16:25:46 -08:00
Andres Freund	07c6e5163e	Remove volatile qualifiers from shm_mq.c. Since commit `0709b7ee`, spinlock primitives include a compiler barrier so it is no longer necessary to access either spinlocks or the memory they protect through pointer-to-volatile. Like earlier commits `e93b6298`, `d53e3d5f`, `430008b5`, `8f6bb851`, `df4077cd`. Author: Thomas Munro Discussion: https://postgr.es/m/CAEepm=204T37SxcHo4=xw5btho9jQ-=ZYYrVdcKyz82XYzMoqg@mail.gmail.com	2018-03-01 16:21:52 -08:00
Tom Lane	8ecdc2ffe3	Use ereport not elog for some corrupt-HOT-chain reports. These errors have been seen in the field in corrupted-data situations. It seems worthwhile to report them with ERRCODE_DATA_CORRUPTED, rather than the generic ERRCODE_INTERNAL_ERROR, for the benefit of log monitoring and tools like amcheck. However, use errmsg_internal so that the text strings still aren't translated; it seems unlikely to be worth translators' time to do so. Back-patch to 9.3, like the predecessor commit `d70cf811f` that introduced these elog calls originally (replacing Asserts). Peter Geoghegan Discussion: https://postgr.es/m/CAH2-Wzmn4-Pg-UGFwyuyK-wiTih9j32pwg_7T9iwqXpAUZr=Mg@mail.gmail.com	2018-03-01 16:23:45 -05:00
Alvaro Herrera	477ad05e16	Relax overly strict sanity check for upgraded ancient databases Commit `4800f16a7a` added some sanity checks to ensure we don't accidentally corrupt data, but in one of them we failed to consider the effects of a database upgraded from 9.2 or earlier, where a tuple exclusively locked prior to the upgrade has a slightly different bit pattern. Fix that by using the macro that we fixed in commit `74ebba84ae` for similar situations. Reported-by: Alexandre Garcia Reviewed-by: Andres Freund Discussion: https://postgr.es/m/CAPYLKR6yxV4=pfW0Gwij7aPNiiPx+3ib4USVYnbuQdUtmkMaEA@mail.gmail.com Andres suspects that this bug may have wider ranging consequences, but I couldn't find anything.	2018-03-01 18:07:46 -03:00
Tom Lane	b5febc1d12	Fix IOS planning when only some index columns can return an attribute. Since 9.5, it's possible that some but not all columns of an index support returning the indexed value for index-only scans. If the same indexed column appears in index columns that behave both ways, check_index_only() supposed that it'd be OK to do an index-only scan testing that column; but that fails if we have to recheck the indexed condition on one of the columns that doesn't support this. In principle we could make this work by remapping the recheck expressions to pull the value from a column that does support returning the indexed value. But such cases are so weird and rare that, at least for now, it doesn't seem worth the trouble. Instead, just teach check_index_only that a value is returnable only if all the index columns containing it are returnable, rather than any of them. Per report from David Pereiro Lagares. Back-patch to 9.5 where the possibility of this situation appeared. Kyotaro Horiguchi Discussion: https://postgr.es/m/1516210494.1798.16.camel@nlpgo.com	2018-03-01 15:35:03 -05:00
Tom Lane	6452b098c0	Remove out-of-date comment about formrdesc(). formrdesc's comment listed the specific catalogs it is called for, but the list was out of date. Rather than jumping back onto that maintenance treadmill, let's just remove the list. It tells the reader nothing that can't be learned quickly and more reliably by searching relcache.c for callers of formrdesc(). Oversight noted by Kyotaro Horiguchi. Discussion: https://postgr.es/m/20180214.105314.138966434.horiguchi.kyotaro@lab.ntt.co.jp	2018-03-01 12:03:29 -05:00
Tom Lane	8f72a57048	Fix format_type() to restore its old behavior. Commit `a26116c6c` accidentally changed the behavior of the SQL format_type() function while refactoring. For the reasons explained in that function's comment, a NULL typemod argument should behave differently from a -1 argument. Since we've managed to break this, add a regression test memorializing the intended behavior. In passing, be consistent about the type of the "flags" parameter. Noted by Rushabh Lathia, though I revised the patch some more. Discussion: https://postgr.es/m/CAGPqQf3RB2q-d2Awp_-x-Ur6aOxTUwnApt-vm-iTtceZxYnePg@mail.gmail.com	2018-03-01 11:37:46 -05:00
Tom Lane	d79e7e92bf	Remove redundant IndexTupleDSize macro. Use IndexTupleSize everywhere, instead. Also, remove IndexTupleSize's internal typecast, as that's not really needed and might mask coding errors. Change some pointer variable datatypes in the call sites to compensate for that and make it clearer what we're assuming. Ildar Musin, Robert Haas, Stephen Frost Discussion: https://postgr.es/m/0274288e-9e88-13b6-c61c-7b36928bf221@postgrespro.ru	2018-02-28 19:25:54 -05:00
Tom Lane	43e9490866	Rename base64 routines to avoid conflict with Solaris built-in functions. Solaris 11.4 has built-in functions named b64_encode and b64_decode. Rename ours to something else to avoid the conflict (fortunately, ours are static so the impact is limited). One could wish for less duplication of code in this area, but that would be a larger patch and not very suitable for back-patching. Since this is a portability fix, we want to put it into all supported branches. Report and initial patch by Rainer Orth, reviewed and adjusted a bit by Michael Paquier Discussion: https://postgr.es/m/ydd372wk28h.fsf@CeBiTec.Uni-Bielefeld.DE	2018-02-28 18:33:45 -05:00
Robert Haas	2af28e6033	For partitionwise join, match on partcollation, not parttypcoll. The previous code considered two tables to have the same partition scheme if the underlying columns had the same collation, but what we actually need to compare is not the collations associated with the column but the collation used for partitioning. Fix that. Robert Haas and Amit Langote Discussion: http://postgr.es/m/0f95f924-0efa-4cf5-eb5f-9a3d1bc3c33d@lab.ntt.co.jp	2018-02-28 12:16:09 -05:00
Robert Haas	73797b7884	Document LWTRANCHE_PARALLEL_HASH_JOIN. Thomas Munro Discussion: http://postgr.es/m/CAEepm=3g1hhbFzYkR_QT9RmBvsGX4UaeCtX-4Js8OOEMmFeaSQ@mail.gmail.com	2018-02-28 11:46:26 -05:00
Robert Haas	ce1663cdcd	Fix assertion failure when Parallel Append is run serially. Parallel-aware plan nodes must be prepared to run without parallelism if it's not possible at execution time for whatever reason. Commit `ab72716778`, which introduced Parallel Append, overlooked this. Rajkumar Raghuwanshi reported this problem, and I included his test case in this patch. The code changes are by me. Discussion: http://postgr.es/m/CAKcux6=WqkUudLg1GLZZ7fc5ScWC1+Y9qD=pAHeqy32WoeJQvw@mail.gmail.com	2018-02-28 10:58:27 -05:00
Robert Haas	c161ea138f	Update and improve comments. Commits `6f6b99d133` and `f3b0897a12` didn't properly update these comments. Etsuro Fujita, reviewed by Amit Langote Discussion: http://postgr.es/m/5A671FE1.6020305@lab.ntt.co.jp	2018-02-28 10:09:31 -05:00
Tom Lane	e98a4de7d2	Use the correct tuplestore read pointer in a NamedTuplestoreScan. Tom Kazimiers reported that transition tables don't work correctly when they are scanned by more than one executor node. That's because commit `18ce3a4ab` allocated separate read pointers for each executor node, as it must, but failed to make them active at the appropriate times. Repair. Thomas Munro Discussion: https://postgr.es/m/20180224034748.bixarv6632vbxgeb%40dewberry.localdomain	2018-02-27 15:56:51 -05:00
Tom Lane	25b692568f	Prevent dangling-pointer access when update trigger returns old tuple. A before-update row trigger may choose to return the "new" or "old" tuple unmodified. ExecBRUpdateTriggers failed to consider the second possibility, and would proceed to free the "old" tuple even if it was the one returned, leading to subsequent access to already-deallocated memory. In debug builds this reliably leads to an "invalid memory alloc request size" failure; in production builds it might accidentally work, but data corruption is also possible. This is a very old bug. There are probably a couple of reasons it hasn't been noticed up to now. It would be more usual to return NULL if one wanted to suppress the update action; returning "old" is significantly less efficient since the update will occur anyway. Also, none of the standard PLs would ever cause this because they all returned freshly-manufactured tuples even if they were just copying "old". But commit `4b93f5799` changed that for plpgsql, making it possible to see the bug with a plpgsql trigger. Still, this is certainly legal behavior for a trigger function, so it's ExecBRUpdateTriggers's fault not plpgsql's. It seems worth creating a test case that exercises returning "old" directly with a C-language trigger; testing this through plpgsql seems unreliable because its behavior might change again. Report and fix by Rushabh Lathia; regression test case by me. Back-patch to all supported branches. Discussion: https://postgr.es/m/CAGPqQf1P4pjiNPrMof=P_16E-DFjt457j+nH2ex3=nBTew7tXw@mail.gmail.com	2018-02-27 13:28:02 -05:00
Robert Haas	5e6a63c0d1	Minor cleanup of code related to partially_grouped_rel. Jeevan Chalke Discussion: http://postgr.es/m/CAM2+6=X9kxQoL2ZqZ00E6asBt9z+rfyWbOmhXJ0+8fPAyMZ9Jg@mail.gmail.com	2018-02-27 13:23:50 -05:00
Robert Haas	3bfe957761	Fix logic error in add_paths_to_partial_grouping_rel. Commit `3bf05e096b` sometimes uses the cheapest_partial_path variable in this function to mean the cheapest one from the input rel and at other times the cheapest one from the partially grouped rel, but it never resets it, so we can end up with bad plans, leading to "ERROR: Aggref found in non-Agg plan node". Jeevan Chalke, per a report from Andreas Joseph Krogh and a separate off-list report from Rajkumar Raghuwanshi Discussion: http://postgr.es/m/CAM2+6=X9kxQoL2ZqZ00E6asBt9z+rfyWbOmhXJ0+8fPAyMZ9Jg@mail.gmail.com	2018-02-27 13:23:50 -05:00
Noah Misch	582edc369c	Empty search_path in Autovacuum and non-psql/pgbench clients. This makes the client programs behave as documented regardless of the connect-time search_path and regardless of user-created objects. Today, a malicious user with CREATE permission on a search_path schema can take control of certain of these clients' queries and invoke arbitrary SQL functions under the client identity, often a superuser. This is exploitable in the default configuration, where all users have CREATE privilege on schema "public". This changes behavior of user-defined code stored in the database, like pg_index.indexprs and pg_extension_config_dump(). If they reach code bearing unqualified names, "does not exist" or "no schema has been selected to create in" errors might appear. Users may fix such errors by schema-qualifying affected names. After upgrading, consider watching server logs for these errors. The --table arguments of src/bin/scripts clients have been lax; for example, "vacuumdb -Zt pg_am\;CHECKPOINT" performed a checkpoint. That now fails, but for now, "vacuumdb -Zt 'pg_am(amname);CHECKPOINT'" still performs a checkpoint. Back-patch to 9.3 (all supported versions). Reviewed by Tom Lane, though this fix strategy was not his first choice. Reported by Arseniy Sharoglazov. Security: CVE-2018-1058	2018-02-26 07:39:44 -08:00
Tom Lane	3d2aed664e	Avoid using unsafe search_path settings during dump and restore. Historically, pg_dump has "set search_path = foo, pg_catalog" when dumping an object in schema "foo", and has also caused that setting to be used while restoring the object. This is problematic because functions and operators in schema "foo" could capture references meant to refer to pg_catalog entries, both in the queries issued by pg_dump and those issued during the subsequent restore run. That could result in dump/restore misbehavior, or in privilege escalation if a nefarious user installs trojan-horse functions or operators. This patch changes pg_dump so that it does not change the search_path dynamically. The emitted restore script sets the search_path to what was used at dump time, and then leaves it alone thereafter. Created objects are placed in the correct schema, regardless of the active search_path, by dint of schema-qualifying their names in the CREATE commands, as well as in subsequent ALTER and ALTER-like commands. Since this change requires a change in the behavior of pg_restore when processing an archive file made according to this new convention, bump the archive file version number; old versions of pg_restore will therefore refuse to process files made with new versions of pg_dump. Security: CVE-2018-1058	2018-02-26 10:18:21 -05:00
Robert Haas	3bf05e096b	Add a new upper planner relation for partially-aggregated results. Up until now, we've abused grouped_rel->partial_pathlist as a place to store partial paths that have been partially aggregated, but that's really not correct, because a partial path for a relation is supposed to be one which produces the correct results with the addition of only a Gather or Gather Merge node, and these paths also require a Finalize Aggregate step. Instead, add a new partially_grouped_rel which can hold either partial paths (which need to be gathered and then have aggregation finalized) or non-partial paths (which only need to have aggregation finalized). This allows us to reuse generate_gather_paths for partially_grouped_rel instead of writing new code, so that this patch actually adds basically no net new code while making things cleaner, simplifying things for pending patches for partition-wise aggregate. Robert Haas and Jeevan Chalke. The larger patch series of which this patch is a part was also reviewed and tested by Antonin Houska, Rajkumar Raghuwanshi, David Rowley, Dilip Kumar, Konstantin Knizhnik, Pascal Legrand, Rafia Sabih, and me. Discussion: http://postgr.es/m/CA+TgmobrzFYS3+U8a_BCy3-hOvh5UyJbC18rEcYehxhpw5=ETA@mail.gmail.com Discussion: http://postgr.es/m/CA+TgmoZyQEjdBNuoG9-wC5GQ5GrO4544Myo13dVptvx+uLg9uQ@mail.gmail.com	2018-02-26 09:32:32 -05:00
Peter Eisentraut	c4ba1bee68	Update headers of generated files The scripts were changed in `c98c35cd08`, but the output files were not updated to reflect the script changes.	2018-02-24 14:54:17 -05:00
Peter Eisentraut	9ee0573ef1	Add current directory to Perl include path Recent Perl versions don't have the current directory in the module include path anymore, so we need to add it here explicitly to make these scripts continue to work.	2018-02-24 14:54:16 -05:00
Peter Eisentraut	fde03e8b55	Use croak instead of die in Perl code when appropriate	2018-02-24 14:54:16 -05:00
Tom Lane	32291aed49	Fix thinko in in_range_float4_float8. I forgot the coding rule for correct use of Float8GetDatumFast. Per buildfarm.	2018-02-24 14:46:37 -05:00
Tom Lane	8b29e88cdc	Add window RANGE support for float4, float8, numeric. Commit `0a459cec9` left this for later, but since time's running out, I went ahead and took care of it. There are more data types that somebody might someday want RANGE support for, but this is enough to satisfy all expectations of the SQL standard, which just says that "numeric, datetime, and interval" types should have RANGE support.	2018-02-24 13:23:38 -05:00
Peter Eisentraut	bc1adc651b	Fix filtering of unsupported relations in logical replication In the pgoutput plugin, skip changes for relations that are not publishable, per is_publishable_class(). This concerns in particular materialized views and information_schema tables. While those relations cannot be part of a publication, per existing checks, they will be considered by a FOR ALL TABLES publication. A subscription would not actually apply changes for those relations, again per existing checks, but trying to match incoming changes to local tables on the subscriber would lead to errors if no matching local table exists. Skipping those changes on the publisher avoids sending useless changes and eliminates the error. Bug: #15044 Reported-by: Chad Trabant <chad@iris.washington.edu> Reviewed-by: Petr Jelinek <petr.jelinek@2ndquadrant.com>	2018-02-23 22:13:21 -05:00
Tom Lane	9fe802c818	Fix brown-paper-bag bug in commit `0a459cec96`. RANGE_OFFSET comparisons need to examine the first ORDER BY column, which isn't necessarily the first column in the incoming tuples. No idea how this slipped through initial testing. Per bug #15082 from Zhou Digoal. Discussion: https://postgr.es/m/151939899974.1461.9411971793110285476@wrigleys.postgresql.org	2018-02-23 15:11:40 -05:00
Tom Lane	9afd513df0	Fix planner failures with overlapping mergejoin clauses in an outer join. Given overlapping or partially redundant join clauses, for example t1 JOIN t2 ON t1.a = t2.x AND t1.b = t2.x the planner's EquivalenceClass machinery will ordinarily refactor the clauses as "t1.a = t1.b AND t1.a = t2.x", so that join processing doesn't see multiple references to the same EquivalenceClass in a list of join equality clauses. However, if the join is outer, it's incorrect to derive a restriction clause on the outer side from the join conditions, so the clause refactoring does not happen and we end up with overlapping join conditions. The code that attempted to deal with such cases had several subtle bugs, which could result in "left and right pathkeys do not match in mergejoin" or "outer pathkeys do not match mergeclauses" planner errors, if the selected join plan type was a mergejoin. (It does not appear that any actually incorrect plan could have been emitted.) The core of the problem really was failure to recognize that the outer and inner relations' pathkeys have different relationships to the mergeclause list. A join's mergeclause list is constructed by reference to the outer pathkeys, so it will always be ordered the same as the outer pathkeys, but this cannot be presumed true for the inner pathkeys. If the inner sides of the mergeclauses contain multiple references to the same EquivalenceClass ({t2.x} in the above example) then a simplistic rendering of the required inner sort order is like "ORDER BY t2.x, t2.x", but the pathkey machinery recognizes that the second sort column is redundant and throws it away. The mergejoin planning code failed to account for that behavior properly. 
One error was to try to generate cut-down versions of the mergeclause list from cut-down versions of the inner pathkeys in the same way as the initial construction of the mergeclause list from the outer pathkeys was done; this could lead to choosing a mergeclause list that fails to match the outer pathkeys. The other problem was that the pathkey cross-checking code in create_mergejoin_plan treated the inner and outer pathkey lists identically, whereas actually the expectations for them must be different. That led to false "pathkeys do not match" failures in some cases, and in principle could have led to failure to detect bogus plans in other cases, though there is no indication that such bogus plans could be generated. Reported by Alexander Kuzmenkov, who also reviewed this patch. This has been broken for years (back to around 8.3 according to my testing), so back-patch to all supported branches. Discussion: https://postgr.es/m/5dad9160-4632-0e47-e120-8e2082000c01@postgrespro.ru	2018-02-23 13:47:33 -05:00
Robert Haas	f724022d0a	Revise API for partition bound search functions. Similar to what commit `b022923556` did for a different set of functions, pass the required bits of the PartitionKey instead of the whole thing. This allows these functions to be used without needing the PartitionKey to be available. Amit Langote. The larger patch series of which this patch is a part has been reviewed and tested by Ashutosh Bapat, David Rowley, Dilip Kumar, Jesper Pedersen, Rajkumar Raghuwanshi, Beena Emerson, Kyotaro Horiguchi, Álvaro Herrera, and me, but especially and in great detail by David Rowley. Discussion: http://postgr.es/m/098b9c71-1915-1a2a-8d52-1a7a50ce79e8@lab.ntt.co.jp Discussion: http://postgr.es/m/1f6498e8-377f-d077-e791-5dc84dba2c00@lab.ntt.co.jp	2018-02-23 09:08:43 -05:00
Robert Haas	b022923556	Revise API for partition_rbound_cmp/partition_rbound_datum_cmp. Instead of passing the PartitionKey, pass just the required bits of it. This allows these functions to be used without needing the PartitionKey to be available, which is important for several pending patches. Ashutosh Bapat, reviewed by Amit Langote, with a comment tweak by me. Discussion: http://postgr.es/m/3d835ed1-36ab-f06d-0ce8-a76a2bbf7677@lab.ntt.co.jp Discussion: http://postgr.es/m/b4d88995-094b-320c-b614-2282fae0bf6c@lab.ntt.co.jp	2018-02-23 08:43:52 -05:00
Peter Eisentraut	76b6aa41f4	Support parameters in CALL To support parameters in CALL, move the parse analysis of the procedure and arguments into the global transformation phase, so that the parser hooks can be applied. And then at execution time pass the parameters from ProcessUtility on to ExecuteCallStmt.	2018-02-22 21:36:48 -05:00
Robert Haas	a6a80134e3	Remove extra words. Thomas Munro Discussion: http://postgr.es/m/CAEepm=2x3NUSPed6=-wDYs39KtUU5Dw3mK_NAMWps+18FmkApQ@mail.gmail.com	2018-02-22 18:06:30 -05:00
Peter Eisentraut	10cfce34c0	Add user-callable SHA-2 functions Add the user-callable functions sha224, sha256, sha384, sha512. We already had these in the C code to support SCRAM, but there was no test coverage outside of the SCRAM tests. Adding these as user-callable functions allows writing some tests. Also, we have a user-callable md5 function but no more modern alternative, which led to wide use of md5 as a general-purpose hash function, which leads to occasional complaints about using md5. Also mark the existing md5 functions as leak-proof. Reviewed-by: Michael Paquier <michael@paquier.xyz>	2018-02-22 11:34:53 -05:00
Robert Haas	edd44738bc	Be lazier about partition tuple routing. It's not necessary to fully initialize the executor data structures for partitions to which no tuples are ever routed. Consider, for example, an INSERT statement that inserts only one row: it only cares about the partition to which that one row is routed. The new function ExecInitPartitionInfo performs the initialization in question only when a particular partition is about to receive a tuple. This includes creating, validating, and saving a pointer to the ResultRelInfo, setting up for speculative insertions, translating WCOs and initializing the resulting expressions, translating returning lists and building the appropriate projection information, and setting up a tuple conversion map. One thing that's not deferred is locking the child partitions; that seems desirable but would need more thought. Still, testing shows that this makes single-row inserts significantly faster on a table with many partitions without harming the bulk-insert case. Amit Langote, reviewed by Etsuro Fujita, with a few changes by me Discussion: http://postgr.es/m/8975331d-d961-cbdd-f862-fdd3d97dc2d0@lab.ntt.co.jp	2018-02-22 10:55:54 -05:00
Robert Haas	810e7e264a	Remove extra word from comment. Etsuro Fujita Discussion: http://postgr.es/m/5A8EAF74.5010905@lab.ntt.co.jp	2018-02-22 10:08:03 -05:00
Robert Haas	de6428afe1	Avoid another valgrind complaint about write() of uninitialized bytes. Peter Geoghegan, per buildfarm member skink and Andres Freund Discussion: http://postgr.es/m/20180221053426.gp72lw67yfpzkw7a@alap3.anarazel.de	2018-02-22 09:28:12 -05:00
Robert Haas	7d8ac9814b	Charge cpu_tuple_cost * 0.5 for Append and MergeAppend nodes. Previously, Append didn't charge anything at all, and MergeAppend charged only cpu_operator_cost, about half the value used here. This change might make MergeAppend plans slightly more likely to be chosen than before, since this commit increases the assumed cost for Append -- with default values -- by 0.005 per tuple but MergeAppend by only 0.0025 per tuple. Since the comparisons required by MergeAppend are costed separately, it's not clear why MergeAppend needs to be otherwise more expensive than Append, so hopefully this is OK. Prior to partition-wise join, it didn't really matter whether or not an Append node had any cost of its own, because every plan had to use the same number of Append or MergeAppend nodes and in the same places. Only the relative cost of Append vs. MergeAppend made a difference. Now, however, it is possible to avoid some of the Append nodes using a partition-wise join, so it's worth making an effort. Pending patches for partition-wise aggregate care too, because an Append of Aggregate nodes will incur the Append overhead fewer times than an Aggregate over an Append. Although in most cases this change will favor the use of partition-wise techniques, it does the opposite when the join cardinality is greater than the sum of the input cardinalities. Since this situation arises in an existing regression test, I [rhaas] adjusted it to keep the overall plan shape approximately the same. Jeevan Chalke, per a suggestion from David Rowley. Reviewed by Ashutosh Bapat. Some changes by me. The larger patch series of which this patch is a part was also reviewed and tested by Antonin Houska, Rajkumar Raghuwanshi, David Rowley, Dilip Kumar, Konstantin Knizhnik, Pascal Legrand, Rafia Sabih, and me. Discussion: http://postgr.es/m/CAKJS1f9UXdk6ZYyqbJnjFO9a9hyHKGW7B=ZRh-rxy9qxfPA5Gw@mail.gmail.com	2018-02-21 23:09:27 -05:00
Andres Freund	4c0ec9ee28	Use platform independent type for TupleTableSlot->tts_off. Previously tts_off was, for unknown reasons, of type long. For one that's unnecessary as tuples are restricted in length, for another long would be a bad choice of type even if that weren't the case, as it's not reliably wider than an int. Also HeapTupleHeader->t_len is a uint32. This is split off from a larger patch implementing JITed tuple deforming. Seems like an independent improvement, as tiny as it is. Author: Andres Freund	2018-02-20 15:12:52 -08:00
Peter Eisentraut	c2ff42c6c1	Error message improvement	2018-02-20 17:58:27 -05:00
Magnus Hagander	9a44a26b65	Fix typo Author: Masahiko Sawada	2018-02-20 12:03:18 +01:00
Alvaro Herrera	6f1d723b63	Fix crash in pg_replication_slot_advance We were trying to use a LSN variable after releasing its containing slot structure. Reported by: tushar Author: amul sul Reviewed-by: Petr Jelinek, Masahiko Sawada Discussion: https://postgr.es/m/94ba999c-f76a-0423-6523-b8d531dfe4c7@enterprisedb.com	2018-02-19 22:25:27 -03:00
Tom Lane	159efe4af4	Fix misbehavior of CTE-used-in-a-subplan during EPQ rechecks. An updating query that reads a CTE within an InitPlan or SubPlan could get incorrect results if it updates rows that are concurrently being modified. This is caused by CteScanNext supposing that nothing inside its recursive ExecProcNode call could change which read pointer is selected in the CTE's shared tuplestore. While that's normally true because of scoping considerations, it can break down if an EPQ plan tree gets built during the call, because EvalPlanQualStart builds execution trees for all subplans whether they're going to be used during the recheck or not. And it seems like a pretty shaky assumption anyway, so let's just reselect our own read pointer here. Per bug #14870 from Andrei Gorita. This has been broken since CTEs were implemented, so back-patch to all supported branches. Discussion: https://postgr.es/m/20171024155358.1471.82377@wrigleys.postgresql.org	2018-02-19 16:00:31 -05:00
Alvaro Herrera	eb7ed3f306	Allow UNIQUE indexes on partitioned tables If we restrict unique constraints on partitioned tables so that they must always include the partition key, then our standard approach to unique indexes already works --- each unique key is forced to exist within a single partition, so enforcing the unique restriction in each index individually is enough to have it enforced globally. Therefore we can implement unique indexes on partitions by simply removing a few restrictions (and adding others.) Discussion: https://postgr.es/m/20171222212921.hi6hg6pem2w2t36z@alvherre.pgsql Discussion: https://postgr.es/m/20171229230607.3iib6b62fn3uaf47@alvherre.pgsql Reviewed-by: Simon Riggs, Jesper Pedersen, Peter Eisentraut, Jaime Casanova, Amit Langote	2018-02-19 17:40:00 -03:00
Tom Lane	524d64ea8e	Remove bogus "extern" annotations on function definitions. While this is not illegal C, project style is to put "extern" only on declarations not definitions. David Rowley Discussion: https://postgr.es/m/CAKJS1f9RKLWXcMBQhvDYhmsMEo+ALuNgA-NE+AX5Uoke9DJ2Xg@mail.gmail.com	2018-02-19 12:07:44 -05:00
Tom Lane	8c44802b6e	Remove redundant initialization of a local variable. In what was doubtless a typo, commit `bf6c614a2` introduced a duplicate initialization of a local variable. This made Coverity unhappy, as well as pretty much anybody reading the code. We don't even have a real use for the local variable, so just remove it.	2018-02-18 23:32:56 -05:00
Peter Eisentraut	2e1d1ebdff	Remove redundant function declaration	2018-02-18 22:28:11 -05:00
Peter Eisentraut	97a804cb2b	Message style fix	2018-02-18 17:16:11 -05:00
Peter Eisentraut	1a1adb215c	Move function comment to the right place	2018-02-17 20:45:28 -05:00
Peter Eisentraut	7923118c16	Minor comment fix	2018-02-17 20:45:02 -05:00
Alvaro Herrera	a26116c6cb	Refactor format_type APIs to be more modular Introduce a new format_type_extended, with a flags bitmask argument that can modify the default behavior. A few compatibility and readability wrappers remain: format_type_be, format_type_be_qualified, and format_type_with_typemod, while format_type_with_typemod_qualified, which had a single caller, is removed. Author: Michael Paquier, some revisions by me Discussion: 20180213035107.GA2915@paquier.xyz	2018-02-17 19:02:15 -03:00
Andres Freund	ad7dbee368	Allow tupleslots to have a fixed tupledesc, use in executor nodes. The reason for doing so is that it will allow expression evaluation to optimize based on the underlying tupledesc. In particular, it will allow tuple deforming to be JITed together with the expression itself. For that, expression initialization needs to be moved after the relevant slots are initialized - mostly unproblematic, except in the case of nodeWorktablescan.c. After doing so there's no need for ExecAssignResultType() and ExecAssignResultTypeFromTL() anymore, as all former callers have been converted to create a slot with a fixed descriptor. When creating a slot with a fixed descriptor, tts_values/isnull can be allocated together with the main slot, reducing allocation overhead and increasing cache density a bit. Author: Andres Freund Discussion: https://postgr.es/m/20171206093717.vqdxe5icqttpxs3p@alap3.anarazel.de	2018-02-16 21:17:38 -08:00
Andres Freund	bf6c614a2f	Do execGrouping.c via expression eval machinery, take two. This has a performance benefit on its own, although not hugely so. The primary benefit is that it will allow tuple deforming and comparator invocations to be JITed. Large parts of this were previously committed (`773aec7aa`), but the commit contained an omission around cross-type comparisons and was thus reverted. Author: Andres Freund Discussion: https://postgr.es/m/20171129080934.amqqkke2zjtekd4t@alap3.anarazel.de	2018-02-16 14:38:13 -08:00
Peter Eisentraut	ad9a274778	Fix crash when canceling parallel query elog(FATAL) would end up calling PortalCleanup(), which would call executor shutdown code, which could fail and crash, especially under parallel query. This was introduced by `8561e4840c`, which did not want to mark an active portal as failed by a normal transaction abort anymore. But we do need to do that for an elog(FATAL) exit. Introduce a variable shmem_exit_inprogress similar to the existing proc_exit_inprogress, so we can tell whether we are in the FATAL exit scenario. Reported-by: Andres Freund <andres@anarazel.de>	2018-02-16 16:21:24 -05:00
Tom Lane	49bff412ed	Remove some inappropriate #includes. Other header files should never #include postgres.h (nor postgres_fe.h, nor c.h), per project policy. Also, there's no need for any backend .c file to explicitly include elog.h or palloc.h, because postgres.h pulls those in already. Extracted from a larger patch by Kyotaro Horiguchi. The rest of the removals he suggests require more study, but these are no-brainers. Discussion: https://postgr.es/m/20180215.200447.209320006.horiguchi.kyotaro@lab.ntt.co.jp	2018-02-16 12:14:08 -05:00
Peter Eisentraut	2fb1abaeb0	Rename enable_partition_wise_join to enable_partitionwise_join Discussion: https://www.postgresql.org/message-id/flat/ad24e4f4-6481-066e-e3fb-6ef4a3121882%402ndquadrant.com	2018-02-16 10:33:59 -05:00
Magnus Hagander	f8437c819a	Fix typo in comment	2018-02-16 12:46:41 +01:00
Andres Freund	2a41507dab	Revert "Do execGrouping.c via expression eval machinery." This reverts commit `773aec7aa9`. There's an unresolved issue in the reverted commit: It only creates one comparator function, but for the nodeSubplan.c case we need more (cf. FindTupleHashEntry vs LookupTupleHashEntry calls in nodeSubplan.c). This isn't too difficult to fix, but it's not entirely trivial either. The fact that the issue only causes breakage on 32bit systems shows that the current test coverage isn't that great. To avoid turning half the buildfarm red till those two issues are addressed, revert.	2018-02-15 22:39:18 -08:00
Andres Freund	773aec7aa9	Do execGrouping.c via expression eval machinery. This has a performance benefit on its own, although not hugely so. The primary benefit is that it will allow tuple deforming and comparator invocations to be JITed. Author: Andres Freund Discussion: https://postgr.es/m/20171129080934.amqqkke2zjtekd4t@alap3.anarazel.de	2018-02-15 21:55:31 -08:00
Tom Lane	51940f9760	Cast to void in StaticAssertExpr, not its callers. Seems a bit silly that many (in fact all, as of today) uses of StaticAssertExpr would need to cast it to void to avoid warnings from pickier compilers. Let's just do the cast right in the macro, instead. In passing, change StaticAssertExpr to StaticAssertStmt in one place where that seems more apropos. Discussion: https://postgr.es/m/16161.1518715186@sss.pgh.pa.us	2018-02-15 13:41:30 -05:00
Tom Lane	9a725f7b5c	Silence assorted "variable may be used uninitialized" warnings. All of these are false positives, but in each case a fair amount of analysis is needed to see that, and it's not too surprising that not all compilers are smart enough. (In particular, in the logtape.c case, a compiler lacking the knowledge provided by the Assert would almost surely complain, so that this warning will be seen in any non-assert build.) Some of these are of long standing while others are pretty recent, but it only seems worth fixing them in HEAD. Jaime Casanova, tweaked a bit by me Discussion: https://postgr.es/m/CAJGNTeMcYAMJdPAom52dppLMtF-UnEZi0dooj==75OEv1EoBZA@mail.gmail.com	2018-02-14 16:06:49 -05:00
Tom Lane	4b93f57999	Make plpgsql use its DTYPE_REC code paths for composite-type variables. Formerly, DTYPE_REC was used only for variables declared as "record"; variables of named composite types used DTYPE_ROW, which is faster for some purposes but much less flexible. In particular, the ROW code paths are entirely incapable of dealing with DDL-caused changes to the number or data types of the columns of a row variable, once a particular plpgsql function has been parsed for the first time in a session. And, since the stored representation of a ROW isn't a tuple, there wasn't any easy way to deal with variables of domain-over-composite types, since the domain constraint checking code would expect the value to be checked to be a tuple. A lesser, but still real, annoyance is that ROW format cannot represent a true NULL composite value, only a row of per-field NULL values, which is not exactly the same thing. Hence, switch to using DTYPE_REC for all composite-typed variables, whether "record", named composite type, or domain over named composite type. DTYPE_ROW remains but is used only for its native purpose, to represent a fixed-at-compile-time list of variables, for instance the targets of an INTO clause. To accomplish this without taking significant performance losses, introduce infrastructure that allows storing composite-type variables as "expanded objects", similar to the "expanded array" infrastructure introduced in commit `1dc5ebc90`. A composite variable's value is thereby kept (most of the time) in the form of separate Datums, so that field accesses and updates are not much more expensive than they were in the ROW format. This holds the line, more or less, on performance of variables of named composite types in field-access-intensive microbenchmarks, and makes variables declared "record" perform much better than before in similar tests. 
In addition, the logic involved with enforcing composite-domain constraints against updates of individual fields is in the expanded record infrastructure not plpgsql proper, so that it might be reusable for other purposes. In further support of this, introduce a typcache feature for assigning a unique-within-process identifier to each distinct tuple descriptor of interest; in particular, DDL alterations on composite types result in a new identifier for that type. This allows very cheap detection of the need to refresh tupdesc-dependent data. This improves on the "tupDescSeqNo" idea I had in commit `687f096ea`: that assigned identifying sequence numbers to successive versions of individual composite types, but the numbers were not unique across different types, nor was there support for assigning numbers to registered record types. In passing, allow plpgsql functions to accept as well as return type "record". There was no good reason for the old restriction, and it was out of step with most of the other PLs. Tom Lane, reviewed by Pavel Stehule Discussion: https://postgr.es/m/8962.1514399547@sss.pgh.pa.us	2018-02-13 18:52:21 -05:00
Peter Eisentraut	7a32ac8a66	Add procedure support to pg_get_functiondef This also makes procedures work in psql's \ef and \sf commands. Reported-by: Pavel Stehule <pavel.stehule@gmail.com>	2018-02-13 15:13:44 -05:00
Peter Eisentraut	ebdb42a0d6	Fix typo Author: Masahiko Sawada <sawada.mshk@gmail.com>	2018-02-12 22:39:52 -05:00
Alvaro Herrera	8237f27b50	get_relid_attribute_name is dead, long live get_attname The modern way is to use a missing_ok argument instead of two separate almost-identical routines, so do that. Author: Michaël Paquier Reviewed-by: Álvaro Herrera Discussion: https://postgr.es/m/20180201063212.GE6398@paquier.xyz	2018-02-12 19:33:15 -03:00
Robert Haas	88ef48c1cc	Fix parallel index builds for dynamic_shared_memory_type=none. The previous code failed to realize that this setting effectively disables parallelism, and would crash if it decided to attempt parallelism anyway. Instead, treat it as a disabling condition. Kyotaro Horiguchi, who also reported the issue. Reviewed by Michael Paquier and Peter Geoghegan. Discussion: http://postgr.es/m/20180209.170635.256350357.horiguchi.kyotaro@lab.ntt.co.jp	2018-02-12 12:55:12 -05:00
Tom Lane	d02d4a6d4f	Avoid premature free of pass-by-reference CALL arguments. Prematurely freeing the EState used to evaluate CALL arguments led, in some cases, to passing dangling pointers to the procedure. This was masked in trivial cases because the argument pointers would point to Const nodes in the original expression tree, and in some other cases because the result value would end up in the standalone ExprContext rather than in memory belonging to the EState --- but that wasn't exactly high quality programming either, because the standalone ExprContext was never explicitly freed, breaking assorted API contracts. In addition, using a separate EState for each argument was just silly. So let's use just one EState, and one ExprContext, and make the latter belong to the former rather than be standalone, and clean up the EState (and hence the ExprContext) post-call. While at it, improve the function's commentary a bit. Discussion: https://postgr.es/m/29173.1518282748@sss.pgh.pa.us	2018-02-10 13:37:12 -05:00
Tom Lane	65b1d76785	Fix oversight in CALL argument handling, and do some minor cleanup. CALL statements cannot support sub-SELECTs in the arguments of the called procedure, since they just use ExecEvalExpr to evaluate such arguments. Teach transformSubLink() to reject the case, as it already does for other contexts in which subqueries are not supported. In passing, s/EXPR_KIND_CALL/EXPR_KIND_CALL_ARGUMENT/ to make that enum symbol line up more closely with the phrasing of the error messages it is associated with. And fix someone's weak grasp of English grammar in the preceding EXPR_KIND_PARTITION_EXPRESSION addition. Also update an incorrect comment in resolve_unique_index_expr (possibly it was correct when written, but nowadays transformExpr definitely does reject SRFs here). Per report from Pavel Stehule --- but this resolves only one of the bugs he mentions. Discussion: https://postgr.es/m/CAFj8pRDxOwPPzpA8i+AQeDQFj7bhVw-dR2==rfWZ3zMGkm568Q@mail.gmail.com	2018-02-10 13:05:14 -05:00
Robert Haas	be42015fcc	Clear stmt_timeout_active if we disable_all_timeouts. Otherwise, we can end up with the flag set when the timeout is actually disabled, leading to misbehavior. Commit `f8e5f156b3` introduced this bug. Reported by Peter Eisentraut. Analysis and fix by Thomas Munro, tweaked by me. Discussion: http://postgr.es/m/6a909374-2602-7136-8c70-397330a418f3@2ndquadrant.com	2018-02-09 15:48:18 -05:00
Robert Haas	b78d0160da	Fix incorrect method name in comment. Atsushi Torikoshi Discussion: http://postgr.es/m/1b056262-4bc0-a982-c899-bb67a0a7fd52@lab.ntt.co.jp	2018-02-08 14:35:54 -05:00
Robert Haas	e44dd84325	Avoid listing the same ResultRelInfo in more than one EState list. Doing so causes EXPLAIN ANALYZE to show trigger statistics multiple times. Commit `2f17844104` seems to be to blame for this. Amit Langote, reviewed by Amit Khandekar, Etsuro Fujita, and me.	2018-02-08 14:29:05 -05:00
Robert Haas	88fdc70060	Fix possible infinite loop with Parallel Append. When the previously-chosen plan was non-partial, all pa_finished flags for partial plans are now set, and pa_next_plan has not yet been set to INVALID_SUBPLAN_INDEX, the previous code could go into an infinite loop. Report by Rajkumar Raghuwanshi. Patch by Amit Khandekar and me. Review by Kyotaro Horiguchi. Discussion: http://postgr.es/m/CAJ3gD9cf43z78qY=U=H0HvOEN341qfRO-vLpnKPSviHeWgJQ5w@mail.gmail.com	2018-02-08 12:31:48 -05:00
Peter Eisentraut	32ff269117	Add more information_schema columns - table_constraints.enforced - triggers.action_order - triggers.action_reference_old_table - triggers.action_reference_new_table Reviewed-by: Michael Paquier <michael.paquier@gmail.com>	2018-02-07 10:08:02 -05:00
Robert Haas	b98a7cd58f	Update out-of-date comment in StartupXLOG. Commit `4b0d28de06` should have updated this comment, but did not. Thomas Munro Discussion: http://postgr.es/m/CAEepm=0iJ8aqQcF9ij2KerAkuHF3SwrVTzjMdm1H4w++nfBf9A@mail.gmail.com	2018-02-07 08:48:04 -05:00
Tom Lane	0a459cec96	Support all SQL:2011 options for window frame clauses. This patch adds the ability to use "RANGE offset PRECEDING/FOLLOWING" frame boundaries in window functions. We'd punted on that back in the original patch to add window functions, because it was not clear how to do it in a reasonably data-type-extensible fashion. That problem is resolved here by adding the ability for btree operator classes to provide an "in_range" support function that defines how to add or subtract the RANGE offset value. Factoring it this way also allows the operator class to avoid overflow problems near the ends of the datatype's range, if it wishes to expend effort on that. (In the committed patch, the integer opclasses handle that issue, but it did not seem worth the trouble to avoid overflow failures for datetime types.) The patch includes in_range support for the integer_ops opfamily (int2/int4/int8) as well as the standard datetime types. Support for other numeric types has been requested, but that seems like suitable material for a follow-on patch. In addition, the patch adds GROUPS mode which counts the offset in ORDER-BY peer groups rather than rows, and it adds the frame_exclusion options specified by SQL:2011. As far as I can see, we are now fully up to spec on window framing options. Existing behaviors remain unchanged, except that I changed the errcode for a couple of existing error reports to meet the SQL spec's expectation that negative "offset" values should be reported as SQLSTATE 22013. Internally and in relevant parts of the documentation, we now consistently use the terminology "offset PRECEDING/FOLLOWING" rather than "value PRECEDING/FOLLOWING", since the term "value" is confusingly vague. Oliver Ford, reviewed and whacked around some by me Discussion: https://postgr.es/m/CAGMVOdu9sivPAxbNN0X+q19Sfv9edEPv=HibOJhB14TJv_RCQg@mail.gmail.com	2018-02-07 00:06:56 -05:00
Robert Haas	2320945731	Fix incorrect grammar. Etsuro Fujita Discussion: http://postgr.es/m/5A7981EA.8020201@lab.ntt.co.jp	2018-02-06 15:50:13 -05:00
Robert Haas	9fafa413ac	Avoid valgrind complaint about write() of uninitialized bytes. LogicalTapeFreeze() may write out its first block when it is dirty but not full, and then immediately read the first block back in from its BufFile as a BLCKSZ-width block. This can only occur in rare cases where very few tuples were written out, which is currently only possible with parallel external tuplesorts. To avoid valgrind complaints, tell it to treat the tail of logtape.c's buffer as defined. Commit `9da0cc3528` exposed this problem but did not create it. LogicalTapeFreeze() has always tended to write out some amount of garbage bytes, but previously never wrote less than one block of data in total, so the problem was masked. Per buildfarm members lousyjack and skink. Peter Geoghegan, based on a suggestion from Tom Lane and me. Some comment revisions by me.	2018-02-06 14:24:57 -05:00
Tom Lane	3785f7eee3	Doc: move info for btree opclass implementors into main documentation. Up to now, useful info for writing a new btree opclass has been buried in the backend's nbtree/README file. Let's move it into the SGML docs, in preparation for extending it with info about "in_range" functions in the upcoming window RANGE patch. To do this, I chose to create a new chapter for btree indexes in Part VII (Internals), parallel to the chapters that exist for the newer index AMs. This is a pretty short chapter as-is. At some point somebody might care to flesh it out with more detail about btree internals, but that is beyond the scope of my ambition for today. Discussion: https://postgr.es/m/23141.1517874668@sss.pgh.pa.us	2018-02-06 13:52:27 -05:00
Robert Haas	f069c91a57	Fix possible crash in partition-wise join. The previous code assumed that we'd always succeed in creating child-joins for a joinrel for which partition-wise join was considered, but that's not guaranteed, at least in the case where dummy rels are involved. Ashutosh Bapat, with some wordsmithing by me. Discussion: http://postgr.es/m/CAFjFpRf8=uyMYYfeTBjWDMs1tR5t--FgOe2vKZPULxxdYQ4RNw@mail.gmail.com	2018-02-05 17:31:57 -05:00
Tom Lane	3492a0af0b	Fix RelationBuildPartitionKey's processing of partition key expressions. Failure to advance the list pointer while reading partition expressions from a list results in invoking an input function with inappropriate data, possibly leading to crashes or, with carefully crafted input, disclosure of arbitrary backend memory. Bug discovered independently by Álvaro Herrera and David Rowley. This patch is by Álvaro but owes something to David's proposed fix. Back-patch to v10 where the issue was introduced. Security: CVE-2018-1052	2018-02-05 10:37:30 -05:00
Tom Lane	05d0f13f07	Skip setting up shared instrumentation for Hash node if not needed. We don't need to set up the shared space for hash join instrumentation data if instrumentation hasn't been requested. Let's follow the example of the similar Sort node code and save a few cycles by skipping that when we can. This reverts commit `d59ff4ab3` and instead allows us to use the safer choice of passing noError = false to shm_toc_lookup in ExecHashInitializeWorker, since if we reach that call there should be a TOC entry to be found. Thomas Munro Discussion: https://postgr.es/m/E1ehkoZ-0005uW-43%40gemulon.postgresql.org	2018-02-04 22:14:07 -05:00
Tom Lane	d59ff4ab31	Fix another instance of unsafe coding for shm_toc_lookup failure. One or another author of commit `5bcf389ec` seems to have thought that computing an offset from a NULL pointer would yield another NULL pointer. There may possibly be architectures where that works, but common machines don't work like that. Per a quick code review of places calling shm_toc_lookup and not using noError = false.	2018-02-02 18:32:05 -05:00
Tom Lane	957ff087c8	Be more wary about shm_toc_lookup failure. Commit `445dbd82a` basically missed the point of commit `d46633506`, which was that we shouldn't allow shm_toc_lookup() failure to lead to a core dump or assertion crash, because the odds of such a failure should never be considered negligible. It's correct that we can't expect the PARALLEL_KEY_ERROR_QUEUE TOC entry to be there if we have no workers. But if we have no workers, we're not going to do anything in this function with the lookup result anyway, so let's just skip it. That lets the code use the easy-to-prove-safe noError=false case, rather than anything requiring effort to review. Back-patch to v10, like the previous commit. Discussion: https://postgr.es/m/3647.1517601675@sss.pgh.pa.us	2018-02-02 18:26:07 -05:00
Peter Eisentraut	533c5d8bdd	Fix application of identity values in some cases Investigation of `2d2d06b7e2` revealed that identity values were not applied in some further cases, including logical replication subscribers, VALUES RTEs, and ALTER TABLE ... ADD COLUMN. To fix all that, apply the identity column expression in build_column_default() instead of repeating the same logic at each call site. For ALTER TABLE ... ADD COLUMN ... IDENTITY, the previous coding completely ignored that existing rows for the new column should have values filled in from the identity sequence. The coding using build_column_default() fails for this because the sequence ownership isn't registered until after ALTER TABLE, and we can't do it before because we don't have the column in the catalog yet. So we specially remember in ColumnDef the sequence name that we decided on and build a custom NextValueExpr using that. Reviewed-by: Michael Paquier <michael.paquier@gmail.com>	2018-02-02 14:39:10 -05:00
Robert Haas	9da0cc3528	Support parallel btree index builds. To make this work, tuplesort.c and logtape.c must also support parallelism, so this patch adds that infrastructure and then applies it to the particular case of parallel btree index builds. Testing to date shows that this can often be 2-3x faster than a serial index build. The model for deciding how many workers to use is fairly primitive at present, but it's better than not having the feature. We can refine it as we get more experience. Peter Geoghegan with some help from Rushabh Lathia. While Heikki Linnakangas is not an author of this patch, he wrote other patches without which this feature would not have been possible, and therefore the release notes should possibly credit him as an author of this feature. Reviewed by Claudio Freire, Heikki Linnakangas, Thomas Munro, Tels, Amit Kapila, me. Discussion: http://postgr.es/m/CAM3SWZQKM=Pzc=CAHzRixKjp2eO5Q0Jg1SoFQqeXFQ647JiwqQ@mail.gmail.com Discussion: http://postgr.es/m/CAH2-Wz=AxWqDoVvGU7dq856S4r6sJAj6DBn7VMtigkB33N5eyg@mail.gmail.com	2018-02-02 13:32:44 -05:00
Robert Haas	9aef173163	Refactor code for partition bound searching Remove partition_bound_cmp() and partition_bound_bsearch(), whose void * argument could be, depending on the situation, of any of three different types: PartitionBoundSpec *, PartitionRangeBound *, Datum *. Instead, introduce separate bound-searching functions for each situation: partition_list_bsearch, partition_range_bsearch, partition_range_datum_bsearch, and partition_hash_bsearch. This requires duplicating the code for binary search, but it makes the code much more type safe, involves fewer branches at runtime, and at least in my opinion, is much easier to understand. Along the way, add an option to partition_range_datum_bsearch allowing the number of keys to be specified, so that we can search for partitions based on a prefix of the full list of partition keys. This is important for pending work to improve partition pruning. Amit Langote, per a suggestion from me. Discussion: http://postgr.es/m/CA+TgmoaVLDLc8=YESRwD32gPhodU_ELmXyKs77gveiYp+JE4vQ@mail.gmail.com	2018-02-02 09:32:44 -05:00
Robert Haas	9222c0d9ed	Add new function WaitForParallelWorkersToAttach. Once this function has been called, we know that all workers have started and attached to their error queues -- so if any of them subsequently exit uncleanly, we'll be sure to throw an ERROR promptly. Otherwise, users of the ParallelContext machinery must be careful not to wait forever for a worker that has failed to start. Parallel query manages to work without needing this for reasons explained in new comments added by this patch, but it's a useful primitive for other parallel operations, such as the pending patch to make creating a btree index run in parallel. Amit Kapila, revised by me. Additional review by Peter Geoghegan. Discussion: http://postgr.es/m/CAA4eK1+e2MzyouF5bg=OtyhDSX+=Ao=3htN=T-r_6s3gCtKFiw@mail.gmail.com	2018-02-02 09:00:59 -05:00
Robert Haas	ad25a6b1f2	Fix possible failure to mark hash metapage dirty. Report and suggested fix by Lixian Zou. Amit Kapila put it in the form of a patch and reviewed. Discussion: http://postgr.es/m/151739848647.1239.12528851873396651946@wrigleys.postgresql.org	2018-02-01 15:23:45 -05:00
Robert Haas	22757960bb	Fix typo: colums -> columns. Along the way, also fix code indentation. Alexander Lakhin, reviewed by Michael Paquier Discussion: http://postgr.es/m/45c44aa7-7cfa-7f3b-83fd-d8300677fdda@gmail.com	2018-01-31 16:45:37 -05:00
Robert Haas	3ccdc6f9a5	Fix list partition constraints for partition keys of array type. The old code always generated a constraint of the form col = ANY(ARRAY[val1, val2, ...]), but that's invalid when col is an array type. Instead, generate col = val when there's only one value, col = val1 OR col = val2 OR ... when there are multiple values and col is of array type, and the old form when there are multiple values and col is not of an array type. As a side benefit, this makes constraint exclusion able to prune a list partition declared to accept a single Boolean value, which didn't work before. Amit Langote, reviewed by Etsuro Fujita Discussion: http://postgr.es/m/97267195-e235-89d1-a41a-c110198dfce9@lab.ntt.co.jp	2018-01-31 15:43:11 -05:00
Peter Eisentraut	38d485fdaa	Fix up references to scram-sha-256 pg_hba_file_rules erroneously reported this as scram-sha256. Fix that. To avoid future errors and confusion, also adjust documentation links and internal symbols to have a separator between "sha" and "256". Reported-by: Christophe Courtois <christophe.courtois@dalibo.com> Author: Michael Paquier <michael.paquier@gmail.com>	2018-01-30 16:50:30 -05:00
Peter Eisentraut	a044378ce2	Add some noreturn attributes to help static analyzers	2018-01-29 20:44:35 -05:00
Peter Eisentraut	07e524d3e9	Silence complaint about dead assignment The preferred place for "placate compiler" assignments is after elog(ERROR), not before it. Otherwise, scan-build complains about a dead assignment.	2018-01-29 20:43:43 -05:00
Andres Freund	c12693d8f3	Introduce ExecQualAndReset() helper. It's a common task to evaluate a qual and reset the corresponding expression context. Currently that requires storing the result of the qual eval, resetting the context, and then reacting on the result. As that's awkward several places only reset the context next time through a node. That's not great, so introduce a helper that evaluates and resets. It's a bit ugly that it currently uses MemoryContextReset() instead of ResetExprContext(), but that seems easier than reordering all of executor.h. Author: Andres Freund Discussion: https://postgr.es/m/20180109222544.f7loxrunqh3xjl5f@alap3.anarazel.de	2018-01-29 12:19:12 -08:00
Tom Lane	97d4445a03	Save a few bytes by removing useless last argument to SearchCatCacheList. There's never any value in giving a fully specified cache key to SearchCatCacheList: you might as well call SearchCatCache instead, since there could be only one match. So the maximum useful number of key arguments is one less than the supported number of key columns. We might as well remove the useless extra argument and save some few bytes per call site, as well as a cycle or so per call. I believe the reason it was coded like this is that originally, callers had to write out all the dummy arguments in each call, and so it seemed less confusing if SearchCatCache and SearchCatCacheList took the same number of key arguments. But since commit `e26c539e9`, callers only write their live arguments explicitly, making that a non-factor; and there's surely been enough time for third-party modules to adapt to that coding style. So this is only an ABI break not an API break for callers. Per discussion with Oliver Ford, this might also make it less confusing how to use SearchCatCacheList correctly. Discussion: https://postgr.es/m/27788.1517069693@sss.pgh.pa.us	2018-01-29 15:13:17 -05:00
Andres Freund	fc96c69425	Initialize unused ExprEvalStep fields. ExecPushExprSlots didn't initialize ExprEvalStep's resvalue/resnull steps as it didn't use them. That caused wrong valgrind warnings for an upcoming patch, so zero-initialize. Also zero-initialize all scratch ExprEvalStep's allocated on the stack, to avoid issues with similar future omissions of non-critical data.	2018-01-29 12:01:07 -08:00
Andres Freund	c068f87723	Improve bit perturbation in TupleHashTableHash. The changes in `b81b5a96f4` did not fully address the issue, because the bit-mixing of the IV into the final hash-key didn't prevent clustering in the input data from surviving in the output data. This didn't cause a lot of problems because of the additional growth conditions added in `d4c62a6b62`. But as we want to rein those in due to explosive growth in some edge cases, this needs to be fixed. Author: Andres Freund Discussion: https://postgr.es/m/20171127185700.1470.20362@wrigleys.postgresql.org Backpatch: 10, where simplehash was introduced	2018-01-29 11:24:57 -08:00
Tom Lane	35a528062c	Add stack-overflow guards in set-operation planning. create_plan_recurse lacked any stack depth check. This is not per our normal coding rules, but I'd supposed it was safe because earlier planner processing is more complex and presumably should eat more stack. But bug #15033 from Andrew Grossman shows this isn't true, at least not for queries having the form of a many-thousand-way INTERSECT stack. Further testing showed that recurse_set_operations is also capable of being crashed in this way, since it likewise will recurse to the bottom of a parsetree before calling any support functions that might themselves contain any stack checks. However, its stack consumption is only perhaps a third of create_plan_recurse's. It's possible that this particular problem with create_plan_recurse can only manifest in 9.6 and later, since before that we didn't build a Path tree for set operations. But having seen this example, I now have no faith in the proposition that create_plan_recurse doesn't need a stack check, so back-patch to all supported branches. Discussion: https://postgr.es/m/20180127050845.28812.58244@wrigleys.postgresql.org	2018-01-28 13:39:07 -05:00
Bruce Momjian	010123e144	C includes: Reorder C includes in partition.c Discussion: https://postgr.es/m/5A69AA50.2060600@lab.ntt.co.jp Author: Etsuro Fujita	2018-01-27 23:05:52 -05:00
Tom Lane	2e668c522e	Avoid crash during EvalPlanQual recheck of an inner indexscan. Commit `09529a70b` changed nodeIndexscan.c and nodeIndexonlyscan.c to postpone initialization of the indexscan proper until the first tuple fetch. It overlooked the question of mark/restore behavior, which means that if some caller attempts to mark the scan before the first tuple fetch, you get a null pointer dereference. The only existing user of mark/restore is nodeMergejoin.c, which (somewhat accidentally) will never attempt to set a mark before the first inner tuple unless the inner child node is a Material node. Hence the case can't arise normally, so it seems sufficient to document the assumption at both ends. However, during an EvalPlanQual recheck, ExecScanFetch doesn't call IndexNext but just returns the jammed-in test tuple. Therefore, if we're doing a recheck in a plan tree with a mergejoin with inner indexscan, it's possible to reach ExecIndexMarkPos with iss_ScanDesc still null, as reported by Guo Xiang Tan in bug #15032. Really, when there's a test tuple supplied during an EPQ recheck, touching the index at all is the wrong thing: rather, the behavior of mark/restore ought to amount to saving and restoring the es_epqScanDone flag. We can avoid finding a place to actually save the flag, for the moment, because given the assumption that no caller will set a mark before fetching a tuple, es_epqScanDone must always be set by the time we try to mark. So the actual behavior change required is just to not reach the index access if a test tuple is supplied. The set of plan node types that need to consider this issue are those that support EPQ test tuples (i.e., call ExecScan()) and also support mark/restore; which is to say, IndexScan, IndexOnlyScan, and perhaps CustomScan. 
It's tempting to try to fix the problem in one place by teaching ExecMarkPos() itself about EPQ; but ExecMarkPos supports some plan types that aren't Scans, and also it seems risky to make assumptions about what a CustomScan wants to do here. Also, the most likely future change here is to decide that we do need to support marks placed before the first tuple, which would require additional work in IndexScan and IndexOnlyScan in any case. Hence, fix the EPQ issue in nodeIndexscan.c and nodeIndexonlyscan.c, accepting the small amount of code duplicated thereby, and leave it to CustomScan providers to fix this bug if they have it. Back-patch to v10 where commit `09529a70b` came in. In earlier branches, the index_markpos() call is a waste of cycles when EPQ is active, but no more than that, so it doesn't seem appropriate to back-patch further. Discussion: https://postgr.es/m/20180126074932.3098.97815@wrigleys.postgresql.org	2018-01-27 13:52:24 -05:00
Tom Lane	fb8697b31a	Avoid unnecessary use of pg_strcasecmp for already-downcased identifiers. We have a lot of code in which option names, which from the user's viewpoint are logically keywords, are passed through the grammar as plain identifiers, and then matched to string literals during command execution. This approach avoids making words into lexer keywords unnecessarily. Some places matched these strings using plain strcmp, some using pg_strcasecmp. But the latter should be unnecessary since identifiers would have been downcased on their way through the parser. Aside from any efficiency concerns (probably not a big factor), the lack of consistency in this area creates a hazard of subtle bugs due to different places coming to different conclusions about whether two option names are the same or different. Hence, standardize on using strcmp() to match any option names that are expected to have been fed through the parser. This does create a user-visible behavioral change, which is that while formerly all of these would work: alter table foo set (fillfactor = 50); alter table foo set (FillFactor = 50); alter table foo set ("fillfactor" = 50); alter table foo set ("FillFactor" = 50); now the last case will fail because that double-quoted identifier is different from the others. However, none of our documentation says that you can use a quoted identifier in such contexts at all, and we should discourage doing so since it would break if we ever decide to parse such constructs as true lexer keywords rather than poor man's substitutes. So this shouldn't create a significant compatibility issue for users. Daniel Gustafsson, reviewed by Michael Paquier, small changes by me Discussion: https://postgr.es/m/29405B24-564E-476B-98C0-677A29805B84@yesql.se	2018-01-26 18:25:14 -05:00
Robert Haas	9fd8b7d632	Factor some code out of create_grouping_paths. This is preparatory refactoring to prepare the way for partition-wise aggregate, which will reuse the new subroutines for child grouping rels. It also does not seem like a bad idea on general principle, as the function was getting pretty long. Jeevan Chalke. The larger patch series of which this patch is a part was reviewed and tested by Antonin Houska, Rajkumar Raghuwanshi, Ashutosh Bapat, David Rowley, Dilip Kumar, Konstantin Knizhnik, Pascal Legrand, and me. Some cosmetic changes by me. Discussion: http://postgr.es/m/CAM2+6=V64_xhstVHie0Rz=KPEQnLJMZt_e314P0jaT_oJ9MR8A@mail.gmail.com	2018-01-26 15:03:12 -05:00
Tom Lane	4971d2a322	Remove the obsolete WITH clause of CREATE FUNCTION. This clause was superseded by SQL-standard syntax back in 7.3. We've kept it around for backwards-compatibility purposes ever since; but 15 years seems like long enough for that, especially seeing that there are undocumented weirdnesses in how it interacts with the SQL-standard syntax for specifying the same options. Michael Paquier, per an observation by Daniel Gustafsson; some small cosmetic adjustments to nearby code by me. Discussion: https://postgr.es/m/20180115022748.GB1724@paquier.xyz	2018-01-26 12:25:44 -05:00
Peter Eisentraut	c1869542b3	Use abstracted SSL API in server connection log messages The existing "connection authorized" server log messages used OpenSSL API calls directly, even though similar abstracted API calls exist. Change to use the latter instead. Change the function prototype for the functions that return the TLS version and the cipher to return const char * directly instead of copying into a buffer. That makes them slightly easier to use. Add bits= to the message. psql shows that, so we might as well show the same information on the client and server. Reviewed-by: Daniel Gustafsson <daniel@yesql.se> Reviewed-by: Michael Paquier <michael.paquier@gmail.com>	2018-01-26 09:50:46 -05:00
Tom Lane	bb415675d8	Add missing "static" markers. Per buildfarm.	2018-01-25 14:32:28 -05:00
Alvaro Herrera	05fb5d6619	Ignore partitioned indexes where appropriate get_relation_info() was too optimistic about opening indexes in partitioned tables, which would raise errors when any queries were planned on such tables. Fix by ignoring any indexes of the partitioned kind. CLUSTER (and ALTER TABLE CLUSTER ON) had a similar problem. Fix by disallowing these commands in partitioned tables. Fallout from `8b08f7d482`.	2018-01-25 16:12:15 -03:00
Peter Eisentraut	0b5e33f667	Remove use of byte-masking macros in record_image_cmp These were introduced in `4cbb646334`, but after further analysis and testing, they should not be necessary and probably weren't the part of that commit that fixed anything. Reviewed-by: Michael Paquier <michael.paquier@gmail.com>	2018-01-25 09:41:19 -05:00
Robert Haas	945f71db84	Avoid referencing off the end of subplan_partition_offsets. Report by buildfarm member skink and Tom Lane. Analysis by me. Patch by Amit Khandekar. Discussion: http://postgr.es/m/CAJ3gD9fVA1iXQYhfqHP5n_TEd4U9=V8TL_cc-oKRnRmxgdvJrQ@mail.gmail.com	2018-01-24 16:34:51 -05:00
Tom Lane	434e6e1484	Improve implementation of pg_attribute_always_inline. Avoid compiler warnings on MSVC (which doesn't want to see both __forceinline and inline) and ancient GCC (which doesn't have __attribute__((always_inline))). Don't force inline-ing when building at -O0, as the programmer is probably hoping for exact source-to-object-line correspondence in that case. (For the moment this only works for GCC; maybe we can extend it later.) Make pg_attribute_always_inline be syntactically a drop-in replacement for inline, rather than an additional wart. And improve the comments. Thomas Munro and Michail Nikolaev, small tweaks by me Discussion: https://postgr.es/m/32278.1514863068@sss.pgh.pa.us Discussion: https://postgr.es/m/CANtu0oiYp74brgntKOxgg1FK5+t8uQ05guSiFU6FYz_5KUhr6Q@mail.gmail.com	2018-01-23 23:07:13 -05:00
Tom Lane	bb94ce4d26	Teach reparameterize_path() to handle AppendPaths. If we're inside a lateral subquery, there may be no unparameterized paths for a particular child relation of an appendrel, in which case we must be able to create similarly-parameterized paths for each other child relation, else the planner will fail with "could not devise a query plan for the given query". This means that there are situations where we'd better be able to reparameterize at least one path for each child. This calls into question the assumption in reparameterize_path() that it can just punt if it feels like it. However, the only case that is known broken right now is where the child is itself an appendrel so that all its paths are AppendPaths. (I think possibly I disregarded that in the original coding on the theory that nested appendrels would get folded together --- but that only happens after reparameterize_path(), so it's not excused from handling a child AppendPath.) Given that this code's been like this since 9.3 when LATERAL was introduced, it seems likely we'd have heard of other cases by now if there were a larger problem. Per report from Elvis Pranskevichus. Back-patch to 9.3. Discussion: https://postgr.es/m/5981018.zdth1YWmNy@hammer.magicstack.net	2018-01-23 16:50:34 -05:00
Alvaro Herrera	95be5ce1bc	Remove unnecessary include autovacuum.c no longer needs dsa.h, since commit `31ae1638ce`. Author: Masahiko Sawada Discussion: https://postgr.es/m/CAD21AoCWvYyXrvdANSHWWWEWJH5TeAWAkJ_2gqrHhukG+OBo1g@mail.gmail.com	2018-01-23 15:22:13 -03:00
Robert Haas	28e04155f1	Update obsolete sentence in README.parallel. Since 9.6, heavyweight locking is not an abstract and unhandled concern of the parallel machinery, but rather something to which we have a specific approach.	2018-01-23 11:22:47 -05:00
Robert Haas	2badb5afb8	Report an ERROR if a parallel worker fails to start properly. Commit `28724fd90d` fixed things so that if a background worker fails to start due to fork() failure or because it is terminated before startup succeeds, BGWH_STOPPED will be reported. However, that only helps if the code that uses the background worker machinery notices the change in status, and the code in parallel.c did not. To fix that, do two things. First, make sure that when a worker exits, it triggers the leader to read from error queues. That way, if a worker which has attached to an error queue exits uncleanly, the leader is sure to throw some error, either the contents of the ErrorResponse sent by the worker, or "lost connection to parallel worker" if it exited without sending one. To cover the case where the worker never starts up in the first place or exits before attaching to the error queue, the ParallelContext now keeps track of which workers have sent at least one message via the error queue. A worker which sends no messages by the time the parallel operation finishes will be checked to see whether it exited before attaching to the error queue; if so, a new error message, "parallel worker failed to initialize", will be reported. If not, we'll continue to wait until it either starts up and exits cleanly, starts up and exits uncleanly, or fails to start, and then take the appropriate action. Patch by me, reviewed by Amit Kapila. Discussion: http://postgr.es/m/CA+TgmoYnBgXgdTu6wk5YPdWhmgabYc9nY_pFLq=tB=FSLYkD8Q@mail.gmail.com	2018-01-23 11:03:03 -05:00
Peter Eisentraut	1c2183403b	Extract common bits from OpenSSL implementation Some things in be-secure-openssl.c and fe-secure-openssl.c were not actually specific to OpenSSL but could also be used by other implementations. In order to avoid copy-and-pasting, move some of that code to common files.	2018-01-23 07:11:39 -05:00
Peter Eisentraut	f966101d19	Move SSL API comments to header files Move the documentation of what the SSL API calls are supposed to do into the header files, instead of keeping them in the files for the OpenSSL implementation. That way, they don't have to be duplicated or be inconsistent when other implementations are added.	2018-01-23 07:11:39 -05:00
Peter Eisentraut	573bd08b99	Move EDH support to common files The EDH support is not really specific to the OpenSSL implementation, so move the support and documentation comments to common files.	2018-01-23 07:11:38 -05:00
Peter Eisentraut	7404e77cc1	Split out documentation of SSL parameters into their own section Split the "Authentication and Security" section into two separate sections "Authentication" and "SSL". The latter part has gotten much longer over time, and doesn't primarily have to do with authentication. Also, the row_security parameter was inconsistently categorized, so clean that up while we're here.	2018-01-23 07:11:38 -05:00
Peter Eisentraut	8561e4840c	Transaction control in PL procedures In each of the supplied procedural languages (PL/pgSQL, PL/Perl, PL/Python, PL/Tcl), add language-specific commit and rollback functions/commands to control transactions in procedures in that language. Add similar underlying functions to SPI. Some additional cleanup so that transaction commit or abort doesn't blow away data structures still used by the procedure call. Add execution context tracking to CALL and DO statements so that transaction control commands can only be issued in top-level procedure and block calls, not function calls or other procedure or block calls. - SPI Add a new function SPI_connect_ext() that is like SPI_connect() but allows passing option flags. The only option flag right now is SPI_OPT_NONATOMIC. A nonatomic SPI connection can execute transaction control commands, otherwise it's not allowed. This is meant to be passed down from CALL and DO statements which themselves know in which context they are called. A nonatomic SPI connection uses different memory management. A normal SPI connection allocates its memory in TopTransactionContext. For nonatomic connections we use PortalContext instead. As the comment in SPI_connect_ext() (previously SPI_connect()) indicates, one could potentially use PortalContext in all cases, but it seems safest to leave the existing uses alone, because this stuff is complicated enough already. SPI also gets new functions SPI_start_transaction(), SPI_commit(), and SPI_rollback(), which can be used by PLs to implement their transaction control logic. - portalmem.c Some adjustments were made in the code that cleans up portals at transaction abort. The portal code could already handle a command committing a transaction and continuing (e.g., VACUUM), but it was not quite prepared for a command aborting a transaction and continuing. In AtAbort_Portals(), remove the code that marks an active portal as failed. 
As the comment there already predicted, this doesn't work if the running command wants to keep running after transaction abort. And it's actually not necessary, because pquery.c is careful to run all portal code in a PG_TRY block and explicitly runs MarkPortalFailed() if there is an exception. So the code in AtAbort_Portals() is never used anyway. In AtAbort_Portals() and AtCleanup_Portals(), we need to be careful not to clean up active portals too much. This mirrors similar code in PreCommit_Portals(). - PL/Perl Gets new functions spi_commit() and spi_rollback() - PL/pgSQL Gets new commands COMMIT and ROLLBACK. Update the PL/SQL porting example in the documentation to reflect that transactions are now possible in procedures. - PL/Python Gets new functions plpy.commit and plpy.rollback. - PL/Tcl Gets new commands commit and rollback. Reviewed-by: Andrew Dunstan <andrew.dunstan@2ndquadrant.com>	2018-01-22 08:43:06 -05:00
Magnus Hagander	1cc4f536ef	Support huge pages on Windows Add support for huge pages (called large pages on Windows) to the Windows build. This (probably) breaks compatibility with Windows versions prior to Windows 2003 or Windows Vista. Authors: Takayuki Tsunakawa and Thomas Munro Reviewed by: Magnus Hagander, Amit Kapila	2018-01-21 15:40:46 +01:00
Tom Lane	96102a32a3	Suppress possibly-uninitialized-variable warnings. Apparently, Peter's compiler has faith that the switch test values here could never not be valid values of their enums. Mine does not, and I tend to agree with it.	2018-01-19 22:16:25 -05:00
Robert Haas	2f17844104	Allow UPDATE to move rows between partitions. When an UPDATE causes a row to no longer match the partition constraint, try to move it to a different partition where it does match the partition constraint. In essence, the UPDATE is split into a DELETE from the old partition and an INSERT into the new one. This can lead to surprising behavior in concurrency scenarios because EvalPlanQual rechecks won't work as they normally did; the known problems are documented. (There is a pending patch to improve the situation further, but it needs more review.) Amit Khandekar, reviewed and tested by Amit Langote, David Rowley, Rajkumar Raghuwanshi, Dilip Kumar, Amul Sul, Thomas Munro, Álvaro Herrera, Amit Kapila, and me. A few final revisions by me. Discussion: http://postgr.es/m/CAJ3gD9do9o2ccQ7j7+tSgiE1REY65XRiMb=yJO3u3QhyP8EEPQ@mail.gmail.com	2018-01-19 15:33:06 -05:00
Alvaro Herrera	7f17fd6fc7	Fix CompareIndexInfo's attnum comparisons When an index column is an expression, it makes no sense to compare its attribute numbers. This seems to account for remaining buildfarm fallout from `8b08f7d482`. At least, it solves the issue in my local 32bit VM -- let's see what the rest thinks.	2018-01-19 16:56:42 -03:00
Peter Eisentraut	8b9e9644dc	Replace AclObjectKind with ObjectType AclObjectKind was basically just another enumeration for object types, and we already have a preferred one for that. It's only used in aclcheck_error. By using ObjectType instead, we can also give some more precise error messages, for example "index" instead of "relation". Reviewed-by: Michael Paquier <michael.paquier@gmail.com>	2018-01-19 14:01:15 -05:00
Peter Eisentraut	2c6f37ed62	Replace GrantObjectType with ObjectType There used to be a lot of different Type and Kind symbol groups to address objects within different commands, most of which have been replaced by ObjectType, starting with `b256f24264`. But this conversion was never done for the ACL commands until now. This change ends up being just a plain replacement of the types and symbols, without any code restructuring needed, except deleting some now redundant code. Reviewed-by: Michael Paquier <michael.paquier@gmail.com> Reviewed-by: Stephen Frost <sfrost@snowman.net>	2018-01-19 14:01:14 -05:00
Alvaro Herrera	8b08f7d482	Local partitioned indexes When CREATE INDEX is run on a partitioned table, create catalog entries for an index on the partitioned table (which is just a placeholder since the table proper has no data of its own), and recurse to create actual indexes on the existing partitions; create them in future partitions also. As a convenience gadget, if the new index definition matches some existing index in partitions, these are picked up and used instead of creating new ones. Whichever way these indexes come about, they become attached to the index on the parent table and are dropped alongside it, and cannot be dropped in isolation unless they are detached first. To support pg_dump'ing these indexes, add commands CREATE INDEX ON ONLY <table> (which creates the index on the parent partitioned table, without recursing) and ALTER INDEX ATTACH PARTITION (which is used after the indexes have been created individually on each partition, to attach them to the parent index). These reconstruct prior database state exactly. Reviewed-by: (in alphabetical order) Peter Eisentraut, Robert Haas, Amit Langote, Jesper Pedersen, Simon Riggs, David Rowley Discussion: https://postgr.es/m/20171113170646.gzweigyrgg6pwsg4@alvherre.pgsql	2018-01-19 11:49:22 -03:00
Alvaro Herrera	1ef61ddce9	Fix StoreCatalogInheritance1 to use 32bit inhseqno For no apparent reason, this function was using a 16bit-wide inhseqno value, rather than the correct 32 bit width which is what is stored in the pg_inherits catalog. This becomes evident if you try to create a table with more than 65535 parents, because this error appears: ERROR: duplicate key value violates unique constraint «pg_inherits_relid_seqno_index» DETAIL: Key (inhrelid, inhseqno)=(329371, 0) already exists. Needless to say, having so many parents is an uncommon situation, which explains why this error has never been reported despite having been introduced with the Postgres95 1.01 sources in commit d31084e9d111: https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/commands/creatinh.c;hb=d31084e9d111#l349 Backpatch all the way back. David Rowley noticed this while reviewing a patch of mine. Discussion: https://postgr.es/m/CAKJS1f8Dn7swSEhOWwzZzssW7747YB=2Hi+T7uGud40dur69-g@mail.gmail.com	2018-01-19 10:17:54 -03:00
Robert Haas	29d58fd3ad	Transfer state pertaining to pending REINDEX operations to workers. This will allow the pending patch for parallel CREATE INDEX to work on system catalogs, and to provide the same level of protection against use of user indexes while they are being rebuilt that we have for non-parallel CREATE INDEX. Patch by me, reviewed by Peter Geoghegan. Discussion: http://postgr.es/m/CA+TgmoYN-YQU9JsGQcqFLovZ-C+Xgp1_xhJQad=cunGG-_p5gg@mail.gmail.com Discussion: http://postgr.es/m/CAH2-Wzkv4UNkXYhqQRqk-u9rS7h5c-4cCW+EqQ8K_WSeS43aZg@mail.gmail.com	2018-01-19 07:48:54 -05:00
Simon Riggs	4e54dd2e0a	Fix typo in recent commit Typo in `9c7d06d606` Reported-by: Masahiko Sawada	2018-01-19 06:36:17 +00:00
Peter Eisentraut	a228e44ce4	Update comment The "callback" that this comment was referring to was removed by commit `c0a15e07cd`, so update to match the current code.	2018-01-18 19:36:34 -05:00
Bruce Momjian	f033462d8f	Reorder C includes Reorder header files in joinrels.c and pathnode.c in alphabetical order, removing unnecessary ones. Author: Etsuro Fujita	2018-01-17 18:10:05 -05:00
Tom Lane	dca48d145e	Remove useless lookup of root partitioned rel in ExecInitModifyTable(). node->partitioned_rels is only set in UPDATE/DELETE cases, but ExecInitModifyTable only uses its "rel" variable in INSERT cases, so the extra logic to find the root rel is just a waste of complexity and cycles. Etsuro Fujita, reviewed by Amit Langote Discussion: https://postgr.es/m/93cf9816-2f7d-0f67-8ed2-4a4e497a6ab8@lab.ntt.co.jp	2018-01-17 14:44:15 -05:00
Simon Riggs	9c7d06d606	Ability to advance replication slots Ability to advance both physical and logical replication slots using a new user function pg_replication_slot_advance(). For logical advance that means records are consumed as fast as possible and changes are not given to output plugin for sending. Makes 2nd phase (after we reached SNAPBUILD_FULL_SNAPSHOT) of replication slot creation faster, especially when there are big transactions as the reorder buffer does not have to deal with data changes and does not have to spill to disk. Author: Petr Jelinek Reviewed-by: Simon Riggs	2018-01-17 11:38:34 +00:00
Andrew Dunstan	cc4feded0a	Centralize json and jsonb handling of datetime types This creates a single function JsonEncodeDateTime which will format these data types in an efficient and consistent manner. This will be all the more important when we come to jsonpath so we don't have to implement yet more code doing the same thing in two more places. This also extends the code to handle time and timetz types which were not previously handled specially. This requires exposing the time2tm and timetz2tm functions. Patch from Nikita Glukhov	2018-01-16 19:07:13 -05:00
Peter Eisentraut	d91da5eced	Remove useless use of bit-masking macros In this case, the macros SET_8_BYTES(), GET_8_BYTES(), SET_4_BYTES(), GET_4_BYTES() are no-ops, so we can just remove them. The plan is to perhaps remove them from the source code altogether, so we'll start here. Discussion: https://www.postgresql.org/message-id/5d51721a-69ef-2053-9172-599b539f0628@2ndquadrant.com	2018-01-16 17:12:16 -05:00
Tom Lane	680d540502	Avoid unnecessary failure in SELECT concurrent with ALTER NO INHERIT. If a query against an inheritance tree runs concurrently with an ALTER TABLE that's disinheriting one of the tree members, it's possible to get a "could not find inherited attribute" error because after obtaining lock on the removed member, make_inh_translation_list sees that its columns have attinhcount=0 and decides they aren't the columns it's looking for. An ideal fix, perhaps, would avoid including such a just-removed member table in the query at all; but there seems no way to accomplish that without adding expensive catalog rechecks or creating a likelihood of deadlocks. Instead, let's just drop the check on attinhcount. In this way, a query that's included a just-disinherited child will still succeed, which is not a completely unreasonable behavior. This problem has existed for a long time, so back-patch to all supported branches. Also add an isolation test verifying related behaviors. Patch by me; the new isolation test is based on Kyotaro Horiguchi's work. Discussion: https://postgr.es/m/20170626.174612.23936762.horiguchi.kyotaro@lab.ntt.co.jp	2018-01-12 15:46:37 -05:00
Tom Lane	90947674fc	Fix incorrect handling of subquery pullup in the presence of grouping sets. If we flatten a subquery whose target list contains constants or expressions, when those output columns are used in GROUPING SET columns, the planner was capable of doing the wrong thing by merging a pulled-up expression into the surrounding expression during const-simplification. Then the late processing that attempts to match subexpressions to grouping sets would fail to match those subexpressions to grouping sets, with the effect that they'd not go to null when expected. To fix, wrap such subquery outputs in PlaceHolderVars, ensuring that they preserve their separate identity throughout the planner's expression processing. This is a bit of a band-aid, because the wrapper defeats const-simplification even in places where it would be safe to allow. But a nicer fix would likely be too invasive to back-patch, and the consequences of the missed optimizations probably aren't large in most cases. Back-patch to 9.5 where grouping sets were introduced. Heikki Linnakangas, with small mods and better test cases by me; additional review by Andrew Gierth Discussion: https://postgr.es/m/7dbdcf5c-b5a6-ef89-4958-da212fe10176@iki.fi	2018-01-12 12:24:50 -05:00
Alvaro Herrera	49c784ece7	Remove hard-coded schema knowledge about pg_attribute from genbki.pl Add the ability to label a column's default value in the catalog header, and implement this for pg_attribute. A new function in Catalog.pm is used to fill in a tuple with defaults. The build process will complain loudly if a catalog entry is incomplete. Commit `8137f2c323` labeled variable length columns for the C preprocessor. Expose that label to genbki.pl so we can exclude those columns from schema macros in a general fashion. Also, format schema macro entries according to their types. This means slightly less code maintenance, but more importantly it's a proving ground for mechanisms intended to be used in later commits. While at it, I (Álvaro) couldn't resist making some changes in genbki.pl: rename some functions to actually indicate their purpose instead of actively misleading onlookers; and don't iterate on the whole of pg_type to find the entry for each catalog row, using a hash instead of an array. Author: John Naylor, some changes by Álvaro Herrera Discussion: https://postgr.es/m/CAJVSVGVJHwD8sfDfZW9TbCHWKf=C1YDRM-rF=2JenRU_y+VcFg@mail.gmail.com	2018-01-12 11:21:42 -03:00
Bruce Momjian	bdb70c12b3	C comment: fix "the the" mentions in C comments Reported-by: Christoph Dreis Discussion: https://postgr.es/m/007e01d3519e$2734ca10$759e5e30$@freenet.de Author: Christoph Dreis	2018-01-11 21:50:21 -05:00
Tom Lane	4d41b2e092	Add QueryEnvironment to ExplainOneQuery_hook's parameter list. This should have been done in commit `18ce3a4ab`, which added that parameter to ExplainOneQuery, but it was overlooked. This makes it impossible for a user of the hook to pass the queryEnv down to ExplainOnePlan. It's too late to change this API in v10, I suppose, but fortunately passing NULL to ExplainOnePlan will work in nearly all interesting cases in v10. That might not be true forever, so we'd better fix it. Tatsuro Yamada, reviewed by Thomas Munro Discussion: https://postgr.es/m/890e8dd9-c1c7-a422-6892-874f5eaee048@lab.ntt.co.jp	2018-01-11 12:16:18 -05:00
Peter Eisentraut	9e945f8626	Fix Latin spelling "c.f." should be "cf.".	2018-01-11 08:32:01 -05:00
Peter Eisentraut	b48b2f8793	Revert "Move portal pinning from PL/pgSQL to SPI" This reverts commit `b3617cdfbb`. This broke returning unnamed cursors from PL/pgSQL functions. Apparently, there are no test cases for this.	2018-01-10 16:01:17 -05:00
Tom Lane	3afd75eaac	Remove dubious micro-optimization in ckpt_buforder_comparator(). It seems incorrect to assume that the list of CkptSortItems can never contain duplicate page numbers: concurrent activity could result in some page getting dropped from a low-numbered buffer and later loaded into a high-numbered buffer while BufferSync is scanning the buffer pool. If that happened, the comparator would give self-inconsistent results, potentially confusing qsort(). Saving one comparison step is not worth possibly getting the sort wrong. So far as I can tell, nothing would actually go wrong given our current implementation of qsort(). It might get a bit slower than expected if there were a large number of duplicates of one value, but that's surely a probability-epsilon case. Still, the comment is wrong, and if we ever switched to another sort implementation it might be less forgiving. In passing, avoid casting away const-ness of the argument pointers; I've not seen any compiler complaints from that, but it seems likely that some compilers would not like it. Back-patch to 9.6 where this code came in, just in case I've underestimated the possible consequences. Discussion: https://postgr.es/m/18437.1515607610@sss.pgh.pa.us	2018-01-10 15:50:54 -05:00
Robert Haas	2fd58096f0	Add missing "return" statement to accumulate_append_subpath. Without this, Parallel Append can end up with extra children. Report by Rajkumar Raghuwanshi. Fix by Amit Khandekar. Brown paper bag bug by me. Discussion: http://postgr.es/m/CAKcux6mBF-NiddyEe9LwymoUC5+wh8bQJ=uk2gGkOE+L8cv=LA@mail.gmail.com	2018-01-10 11:21:20 -05:00
Peter Eisentraut	b3617cdfbb	Move portal pinning from PL/pgSQL to SPI PL/pgSQL "pins" internally generated (unnamed) portals so that user code cannot close them by guessing their names. This logic is also useful in other languages and really for any code. So move that logic into SPI. An unnamed portal obtained through SPI_cursor_open() and related functions is now automatically pinned, and SPI_cursor_close() automatically unpins a portal that is pinned. In the core distribution, this affects PL/Perl and PL/Python, preventing users from manually closing cursors created by spi_query and plpy.cursor, respectively. (PL/Tcl does not currently offer any cursor functionality.) Reviewed-by: Andrew Dunstan <andrew.dunstan@2ndquadrant.com>	2018-01-10 10:20:51 -05:00
Peter Eisentraut	acc67ffd0a	Give more accurate error message for dropping pinned portal The previous code gave the same error message for attempting to drop pinned and active portals, but those are separate states, so give separate error messages.	2018-01-10 09:22:07 -05:00
Andres Freund	69c3936a14	Expression evaluation based aggregate transition invocation. Previously aggregate transition and combination functions were invoked by special case code in nodeAgg.c, evaluating input and filters separately using the expression evaluation machinery. That turns out to not be great for performance for several reasons: - repeated expression evaluations have some cost - the transition function invocations are poorly predicted, as commonly there are multiple aggregates in a query, resulting in the same call-stack invoking different functions. - filter and input computation had to be done separately - the special case code made it hard to implement JITing of the whole transition function invocation Address this by building one large expression that computes input, evaluates filters, and invokes transition functions. This leads to moderate speedups in queries bottlenecked by aggregate computations, and enables large speedups for similar cases once JITing is done. There's potential for further improvement: - It'd be nice if we could simplify the somewhat expensive aggstate->all_pergroups lookups. - right now there's still an advance_transition_function invocation in nodeAgg.c, leading to some code duplication. Author: Andres Freund Discussion: https://postgr.es/m/20170901064131.tazjxwus3k2w3ybh@alap3.anarazel.de	2018-01-09 13:25:38 -08:00
Alvaro Herrera	272c2ab9fd	Change some bogus PageGetLSN calls to BufferGetLSNAtomic As src/backend/access/transam/README says, PageGetLSN may only be called by processes holding either exclusive lock on buffer, or a shared lock on buffer plus buffer header lock. Therefore any place that only holds a shared buffer lock must use BufferGetLSNAtomic instead of PageGetLSN, which internally obtains buffer header lock prior to reading the LSN. A few callsites failed to comply with this rule. This was detected by running all tests under a new (not committed) assertion that verifies PageGetLSN locking contract. All but one of the callsites that failed the assertion are fixed by this patch. Remaining callsites were inspected manually and determined not to need any change. The exception (unfixed callsite) is in TestForOldSnapshot, which only has a Page argument, making it impossible to access the corresponding Buffer from it. Fixing that seems a much larger patch that will have to be done separately; and that's just as well, since it was only introduced in 9.6 and other bugs are much older. Some of these bugs are ancient; backpatch all the way back to 9.3. Authors: Jacob Champion, Asim Praveen, Ashwin Agrawal Reviewed-by: Michaël Paquier Discussion: https://postgr.es/m/CABAq_6GXgQDVu3u12mK9O5Xt5abBZWQ0V40LZCE+oUf95XyNFg@mail.gmail.com	2018-01-09 17:06:31 -03:00
Andrew Dunstan	11b623dd0a	Implement TZH and TZM timestamp format patterns These are compatible with Oracle and required for the datetime template language for jsonpath in an upcoming patch. Nikita Glukhov and Andrew Dunstan, reviewed by Pavel Stehule.	2018-01-09 14:25:05 -05:00
Peter Eisentraut	a77dd53f30	Remove PortalGetQueryDesc() After having gotten rid of PortalGetHeapMemory(), there seems little reason to keep one Portal access macro around that offers no actual abstraction and isn't consistently used anyway. Reviewed-by: Andrew Dunstan <andrew.dunstan@2ndquadrant.com> Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org>	2018-01-09 13:47:56 -05:00
Peter Eisentraut	0f7c49e855	Update portal-related memory context names and API Rename PortalMemory to TopPortalContext, to avoid confusion with PortalContext and align naming with similar top-level memory contexts. Rename PortalData's "heap" field to portalContext. The "heap" naming seems quite antiquated and confusing. Also get rid of the PortalGetHeapMemory() macro and access the field directly, which we do for other portal fields, so this abstraction doesn't buy anything. Reviewed-by: Andrew Dunstan <andrew.dunstan@2ndquadrant.com> Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org>	2018-01-09 13:47:56 -05:00
Tom Lane	3cb1b2a880	Rewrite list_qsort() to avoid trashing its input list. The initial implementation of list_qsort(), from commit `ab7271677`, re-used the ListCells of the input list while not touching the List header. This meant that anybody who still had a pointer to the original header would now be in possession of a corrupted list, a problem that seems sure to bite us eventually. One possible solution is to re-use the original List header as well, giving the function the semantics of update-in-place. However, that doesn't seem like a very good idea either given the way that the function is used in the planner: create_path functions aren't normally supposed to modify their input lists. It doesn't look like there would be a problem today, but it's not hard to foresee a time when modifying a list of Paths in-place could have side-effects on some other append path. On the whole, and in view of the likelihood that this function might be used in other contexts in the future, it seems best to get rid of the micro-optimization of re-using the input list cells. Just build a new list. Discussion: https://postgr.es/m/16912.1515449066@sss.pgh.pa.us	2018-01-09 13:25:53 -05:00
Tom Lane	624e440a47	Improve the heuristic for ordering child paths of a parallel append. Commit `ab7271677` introduced code that attempts to order the child scans of a Parallel Append node in a way that will minimize execution time, based on total cost and startup cost. However, it failed to think hard about what to do when estimated costs are exactly equal; a case that's particularly likely to occur when comparing on startup cost. In such a case the ordering of the child paths would be left to the whims of qsort, an algorithm that isn't even stable. We can improve matters by applying the rule used elsewhere in the planner: if total costs are equal, sort on startup cost, and vice versa. When both cost estimates are exactly equal, rather than letting qsort do something unpredictable, sort based on the child paths' relids, which should typically result in sorting in inheritance order. (The latter provision requires inventing a qsort-style comparator for bitmapsets, but maybe we'll have use for that for other reasons in future.) This results in a few plan changes in the select_parallel test, but those all look more reasonable than before, when the actual underlying cost numbers are taken into account. Discussion: https://postgr.es/m/4944.1515446989@sss.pgh.pa.us	2018-01-09 13:07:52 -05:00
Tom Lane	80259d4dbf	While waiting for a condition variable, detect postmaster death. The general assumption for postmaster child processes is that they should just exit(1), reasonably promptly, if the postmaster disappears. condition_variable.c neglected this consideration and could be left waiting forever, if the counterpart process it is waiting for has done the right thing and exited. We had some discussion of adjusting the WaitEventSet API to make it harder to make this type of mistake in future; but for the moment, and for v10, let's make this narrow fix. Discussion: https://postgr.es/m/20412.1515456143@sss.pgh.pa.us	2018-01-09 12:34:57 -05:00
Tom Lane	8a906204ae	Fix race condition during replication origin drop. replorigin_drop() misunderstood the API for condition variables: it had ConditionVariablePrepareToSleep and ConditionVariableCancelSleep inside its test-and-sleep loop, rather than outside the loop as intended. The net effect is a narrow race-condition window wherein, if the process using a replication slot releases it immediately after replorigin_drop() releases the ReplicationOriginLock, replorigin_drop() would get into the condition variable's wait list too late and then wait indefinitely for a signal that won't come. Because there's a different CV for each replication slot, we can't just move the ConditionVariablePrepareToSleep call to above the test-and-sleep loop. What we can do, in the wake of commit `13db3b936`, is drop the ConditionVariablePrepareToSleep call entirely. This fix depends on that commit because (at least in principle) the slot matching the target replication origin might move around, so that once in a blue moon successive loop iterations might involve different CVs. We can now cope with such a scenario, at the cost of an extra trip through the retry loop. (There are ways we could fix this bug without depending on that commit, but they're all a lot more complicated than this way.) While at it, upgrade the rather skimpy comments in this function. Back-patch to v10 where this code came in. Discussion: https://postgr.es/m/19947.1515455433@sss.pgh.pa.us	2018-01-09 12:09:30 -05:00
Tom Lane	13db3b9363	Allow ConditionVariable[PrepareTo]Sleep to auto-switch between CVs. The original coding here insisted that callers manually cancel any prepared sleep for one condition variable before starting a sleep on another one. While that's not a huge burden today, it seems like a gotcha that will bite us in future if the use of condition variables increases; anything we can do to make the use of this API simpler and more robust is attractive. Hence, allow these functions to automatically switch their attention to a different CV when required. This is safe for the same reason it was OK for commit `aced5a92b` to let a broadcast operation cancel any prepared CV sleep: whenever we return to the other test-and-sleep loop, we will automatically re-prepare that CV, paying at most an extra test of that loop's exit condition. Back-patch to v10 where condition variables were introduced. Ordinarily we would probably not back-patch a change like this, but since it does not invalidate any coding pattern that was legal before, it seems safe enough. Furthermore, there's an open bug in replorigin_drop() for which the simplest fix requires this. Even if we chose to fix that in some more complicated way, the hazard would remain that we might back-patch some other bug fix that requires this behavior. Patch by me, reviewed by Thomas Munro. Discussion: https://postgr.es/m/2437.1515368316@sss.pgh.pa.us	2018-01-09 11:39:10 -05:00
Robert Haas	921059bd66	Don't allow VACUUM VERBOSE ANALYZE VERBOSE. There are plans to extend the syntax for ANALYZE, so we need to break the link between VacuumStmt and AnalyzeStmt. But apart from that, the syntax above is undocumented and, if discovered by users, might give the impression that the VERBOSE option for VACUUM differs from the verbose option from ANALYZE, which it does not. Nathan Bossart, reviewed by Michael Paquier and Masahiko Sawada Discussion: http://postgr.es/m/D3FC73E2-9B1A-4DB4-8180-55F57D116B4E@amazon.com	2018-01-09 10:20:48 -05:00
Robert Haas	63008b19ee	Fix comment. RELATION_IS_OTHER_TEMP is tested in the caller, not here. Discussion: http://postgr.es/m/5A5438E4.3090709@lab.ntt.co.jp	2018-01-09 09:40:31 -05:00
Tom Lane	e35dba475a	Cosmetic improvements in condition_variable.[hc]. Clarify a bunch of comments. Discussion: https://postgr.es/m/CAEepm=0NWKehYw7NDoUSf8juuKOPRnCyY3vuaSvhrEWsOTAa3w@mail.gmail.com	2018-01-08 18:28:03 -05:00
Tom Lane	ea8e1bbc53	Improve error detection capability in proclists. Previously, although the initial state of a proclist_node is expected to be next == prev == 0, proclist_delete_offset would reset nodes to next == prev == INVALID_PGPROCNO when removing them from a list. This is the same state that a node in a singleton list has, so that it's impossible to distinguish not-in-a-list from in-a-list. Change proclist_delete_offset to reset removed nodes to next == prev == 0, making it possible to distinguish those cases, and then add Asserts to the list add and delete functions that the supplied node isn't or is in a list at entry. Also tighten assertions about the node being in the particular list (not some other one) where it is possible to check that in O(1) time. In ConditionVariablePrepareToSleep, since we don't expect the process's cvWaitLink to already be in a list, remove the more-or-less-useless proclist_contains check; we'd rather have proclist_push_tail's new assertion fire if that happens. Improve various comments related to proclists, too. Patch by me, reviewed by Thomas Munro. This isn't back-patchable, since there could theoretically be inlined copies of proclist_delete_offset in third-party modules. But it's only improving debuggability anyway. Discussion: https://postgr.es/m/CAEepm=0NWKehYw7NDoUSf8juuKOPRnCyY3vuaSvhrEWsOTAa3w@mail.gmail.com	2018-01-08 18:07:04 -05:00
Tom Lane	eeb3c2df42	Back off chattiness in RemovePgTempFiles(). In commit `561885db0`, as part of normalizing RemovePgTempFiles's error handling, I removed its behavior of silently ignoring ENOENT failures during directory opens. Thomas Munro points out that this is a bad idea at the top level, because we don't create pgsql_tmp directories until needed. Thus this coding could produce LOG messages in perfectly normal situations, which isn't what I intended. Restore the suppression of ENOENT logging, but only at top level --- it would still be unexpected for a nested temp directory to disappear between seeing it in the parent directory and opening it. Discussion: https://postgr.es/m/CAEepm=2y06SehAkTnd5sU_eVqdv5P-=Srt1y5vYNQk6yVDVaPw@mail.gmail.com	2018-01-07 20:40:40 -05:00
Simon Riggs	6271fceb8a	Add TIMELINE to backup_label file. Allows new test to confirm timelines match. Author: Michael Paquier. Reviewed-by: David Steele	2018-01-06 12:24:19 +00:00
Simon Riggs	6668a54eb8	Default monitoring roles - errata. `25fff40798` introduced default monitoring roles. Apply these corrections: * Allow access to pg_stat_get_wal_senders() by role pg_read_all_stats. * Correct comment in pg_stat_get_wal_receiver() to show it is no longer superuser-only. Author: Feike Steenbergen. Reviewed-by: Michael Paquier. Apply to HEAD, then later backpatch to 10.	2018-01-06 11:48:21 +00:00
Tom Lane	ccf312a448	Remove return values of ConditionVariableSignal/Broadcast. In the wake of commit `aced5a92b`, the semantics of these results are a bit squishy: we can tell whether we signaled some other process(es), but we do not know which ones were real waiters versus mere sentinels for ConditionVariableBroadcast operations. It does not help much that ConditionVariableBroadcast will attempt to pass on the signal to the next real waiter, because (a) there might not be one, and (b) that will only happen awhile later, anyway. So these results could overstate how much effect the calls really had. However, no existing caller of either function pays any attention to its result value, so it seems reasonable to just define that as a required property of a correct algorithm. To encourage correctness and save some tiny number of cycles, change both functions to return void. Patch by me, per an observation by Thomas Munro. No back-patch, since if any third parties happen to be using these functions, they might not appreciate an API break in a minor release. Discussion: https://postgr.es/m/CAEepm=0NWKehYw7NDoUSf8juuKOPRnCyY3vuaSvhrEWsOTAa3w@mail.gmail.com	2018-01-05 20:33:26 -05:00
Tom Lane	3cac0ec859	Reorder steps in ConditionVariablePrepareToSleep for more safety. In the admittedly-very-unlikely case that AddWaitEventToSet fails, ConditionVariablePrepareToSleep would error out after already having set cv_sleep_target, which is probably bad, and after having already set cv_wait_event_set, which is very bad. Transaction abort might or might not clean up cv_sleep_target properly; but there is nothing that would be aware that the WaitEventSet wasn't fully constructed, so that all future condition variable sleeps would be broken. We can easily guard against these hazards with slight restructuring. Back-patch to v10 where condition_variable.c was introduced. Discussion: https://postgr.es/m/CAEepm=0NWKehYw7NDoUSf8juuKOPRnCyY3vuaSvhrEWsOTAa3w@mail.gmail.com	2018-01-05 19:42:49 -05:00
Tom Lane	aced5a92bf	Rewrite ConditionVariableBroadcast() to avoid live-lock. The original implementation of ConditionVariableBroadcast was, per its self-description, "the dumbest way possible". Thomas Munro found out it was a bit too dumb. An awakened process may immediately re-queue itself, if the specific condition it's waiting for is not yet satisfied. If this happens before ConditionVariableBroadcast is able to see the wait queue as empty, then ConditionVariableBroadcast will re-awaken the same process, repeating the cycle. Given unlucky timing this back-and-forth can repeat indefinitely; loops lasting thousands of seconds have been seen in testing. To fix, add our own process to the end of the wait queue to serve as a sentinel, and exit the broadcast loop once our process is not there anymore. There are various special considerations described in the comments, the principal disadvantage being that wakers can no longer be sure whether they awakened a real waiter or just a sentinel. But in practice nobody pays attention to the result of ConditionVariableSignal or ConditionVariableBroadcast anyway, so that problem seems hypothetical. Back-patch to v10 where condition_variable.c was introduced. Tom Lane and Thomas Munro Discussion: https://postgr.es/m/CAEepm=0NWKehYw7NDoUSf8juuKOPRnCyY3vuaSvhrEWsOTAa3w@mail.gmail.com	2018-01-05 19:21:30 -05:00
Robert Haas	19c47e7c82	Factor error generation out of ExecPartitionCheck. At present, we always raise an ERROR if the partition constraint is violated, but a pending patch for UPDATE tuple routing will consider instead moving the tuple to the correct partition. Refactor to make that simpler. Amit Khandekar, reviewed by Amit Langote, David Rowley, and me. Discussion: http://postgr.es/m/CAJ3gD9cue54GbEzfV-61nyGpijvjZgCcghvLsB0_nL8Nm8HzCA@mail.gmail.com	2018-01-05 15:22:33 -05:00
Alvaro Herrera	df9f682c7b	Fix failure to delete spill files of aborted transactions. Logical decoding's reorderbuffer.c may spill transaction files to disk when transactions are large. These are supposed to be removed when they become "too old" by xid; but file removal requires the boundary LSNs of the transaction to be known. The final_lsn is only set when we see the commit or abort record for the transaction, but nothing sets the value for transactions that crash, so the removal code misbehaves -- in assertion-enabled builds, it crashes by a failed assertion. To fix, modify the final_lsn of transactions that don't have a value set, to the LSN of the very latest change in the transaction. This causes the spilled files to be removed appropriately. Author: Atsushi Torikoshi Reviewed-by: Kyotaro HORIGUCHI, Craig Ringer, Masahiko Sawada Discussion: https://postgr.es/m/54e4e488-186b-a056-6628-50628e4e4ebc@lab.ntt.co.jp	2018-01-05 12:17:10 -03:00
Peter Eisentraut	054e8c6cdb	Another attempt at fixing build with various OpenSSL versions. It seems we can't easily work around the lack of X509_get_signature_nid(), so revert the previous attempts and just disable the tls-server-end-point feature if we don't have it.	2018-01-04 19:09:27 -05:00
Peter Eisentraut	1834c1e432	Add missing includes. <openssl/x509.h> is necessary to look into the X509 struct, used by `ac3ff8b1d8`.	2018-01-04 17:56:09 -05:00
Robert Haas	ef6087ee5f	Minor preparatory refactoring for UPDATE row movement. Generalize is_partition_attr to has_partition_attrs and make it accessible from outside tablecmds.c. Change map_partition_varattnos to clarify that it can be used for mapping between any two relations in a partitioning hierarchy, not just parent -> child. Amit Khandekar, reviewed by Amit Langote, David Rowley, and me. Some comment changes by me. Discussion: http://postgr.es/m/CAJ3gD9fWfxgKC+PfJZF3hkgAcNOy-LpfPxVYitDEXKHjeieWQQ@mail.gmail.com	2018-01-04 16:25:49 -05:00
Peter Eisentraut	ac3ff8b1d8	Fix build with older OpenSSL versions. Apparently, X509_get_signature_nid() is only in fairly new OpenSSL versions, so use the lower-level interface it is built on instead.	2018-01-04 16:22:06 -05:00
Robert Haas	cc6337d2fe	Simplify and encapsulate tuple routing support code. Instead of having ExecSetupPartitionTupleRouting return multiple out parameters, have it return a pointer to a structure containing all of those different things. Also, provide and use a cleanup function, ExecCleanupTupleRouting, instead of cleaning up all of the resources allocated by ExecSetupPartitionTupleRouting individually. Amit Khandekar, reviewed by Amit Langote, David Rowley, and me Discussion: http://postgr.es/m/CAJ3gD9fWfxgKC+PfJZF3hkgAcNOy-LpfPxVYitDEXKHjeieWQQ@mail.gmail.com	2018-01-04 15:48:15 -05:00
Peter Eisentraut	d3fb72ea6d	Implement channel binding tls-server-end-point for SCRAM. This adds a second standard channel binding type for SCRAM. It is mainly intended for third-party clients that cannot implement tls-unique, for example JDBC. Author: Michael Paquier <michael.paquier@gmail.com>	2018-01-04 15:29:50 -05:00
Peter Eisentraut	f3049a603a	Refactor channel binding code to fetch cbind_data only when necessary. As things stand now, channel binding data is fetched from OpenSSL and saved into the SCRAM exchange context for any SSL connection attempted for a SCRAM authentication, resulting in data fetched but not used if no channel binding is used or if a different channel binding type is used than what the data is here for. Refactor the code in such a way that binding data is fetched from the SSL stack only when a specific channel binding is used for both the frontend and the backend. In order to achieve that, save the libpq connection context directly in the SCRAM exchange state, and add a dependency to SSL in the low-level SCRAM routines. This makes the interface in charge of initializing the SCRAM context cleaner as all its data comes from either PGconn* (for frontend) or Port* (for the backend). Author: Michael Paquier <michael.paquier@gmail.com>	2018-01-04 13:55:12 -05:00
Peter Eisentraut	3ad2afc2e9	Define LDAPS_PORT if it's missing and disable implicit LDAPS on Windows. Some versions of Windows don't define LDAPS_PORT. Also, Windows' ldap_sslinit() is documented to use LDAPS even if you said secure=0 when the port number happens to be 636 or 3269. Let's avoid using the port number to imply that you want LDAPS, so that connection strings have the same meaning on Windows and Unix. Author: Thomas Munro Discussion: https://postgr.es/m/CAEepm%3D23B7GV4AUz3MYH1TKpTv030VHxD2Sn%2BLYWDv8d-qWxww%40mail.gmail.com	2018-01-04 10:34:41 -05:00
Robert Haas	c759395617	Code review for Parallel Append. - Remove unnecessary #include mistakenly added in execnodes.h. - Fix mistake in comment in choose_next_subplan_for_leader. - Adjust row estimates in cost_append for a possibly-different parallel divisor. - Clamp row estimates in cost_append after operations that may not produce integers. Amit Kapila, with cosmetic adjustments by me. Discussion: http://postgr.es/m/CAA4eK1+qcbeai3coPpRW=GFCzFeLUsuY4T-AKHqMjxpEGZBPQg@mail.gmail.com	2018-01-04 07:56:09 -05:00
Tom Lane	47c6772eb7	Clean up tupdesc.c for recent changes. TupleDescCopy needs to have the same effects as CreateTupleDescCopy in that, since it doesn't copy constraints, it should clear the per-attribute fields associated with them. Oversight in commit `cc5f81366`. Since TupleDescCopy has already established the presumption that it can just flat-copy the entire attribute array in one go, propagate that approach into CreateTupleDescCopy and CreateTupleDescCopyConstr. (I'm suspicious that this would lead to valgrind complaints if we had any trailing padding in the struct, but we do not, and anyway fixing that seems like a job for a separate commit.) Add some better comments. Thomas Munro, reviewed by Vik Fearing, some additional hacking by me Discussion: https://postgr.es/m/CAEepm=0NvOGZ8B6GbQyQe2C_c2m3LKJ9w=8OMBaYRLgZ_Gw6Nw@mail.gmail.com	2018-01-03 17:53:41 -05:00
Alvaro Herrera	bab2969867	Fix typo. Author: Dagfinn Ilmari Mannsåker Discussion: https://postgr.es/m/d8jefpk4jtd.fsf@dalvik.ping.uio.no	2018-01-03 19:12:06 -03:00


18360 Commits
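
The node-state change described in commit ea8e1bbc53 above can be illustrated with a small model. This is only a sketch in Python, not the PostgreSQL C code: the `Node` and `ProcList` classes and the `is_linked` helper are hypothetical names, and the numeric value given to `INVALID_PGPROCNO` is chosen for the sketch. It shows why resetting a removed node to next == prev == 0 (rather than to the invalid marker) lets an assertion tell "not in a list" apart from "only member of a list".

```python
# Hypothetical model of the node states described in commit ea8e1bbc53.
# All class and function names are assumptions made for illustration.
INVALID_PGPROCNO = 2**31 - 1  # value assumed for this sketch: "no neighbour"


class Node:
    def __init__(self):
        # Initial state: next == prev == 0 means "not in any list".
        self.next = 0
        self.prev = 0


def is_linked(node):
    # A member of a list can never have next == prev == 0: its two
    # neighbours are either distinct members or the invalid marker.
    return not (node.next == 0 and node.prev == 0)


class ProcList:
    def __init__(self, nodes):
        self.nodes = nodes  # maps procno -> Node
        self.head = INVALID_PGPROCNO
        self.tail = INVALID_PGPROCNO

    def push_tail(self, procno):
        node = self.nodes[procno]
        # Assertion in the spirit of the commit: node must not be in a list.
        assert not is_linked(node), "node is already in a list"
        node.prev = self.tail
        node.next = INVALID_PGPROCNO
        if self.tail == INVALID_PGPROCNO:
            self.head = procno
        else:
            self.nodes[self.tail].next = procno
        self.tail = procno

    def delete(self, procno):
        node = self.nodes[procno]
        # Assertion in the spirit of the commit: node must be in a list.
        assert is_linked(node), "node is not in a list"
        if node.prev == INVALID_PGPROCNO:
            self.head = node.next
        else:
            self.nodes[node.prev].next = node.next
        if node.next == INVALID_PGPROCNO:
            self.tail = node.prev
        else:
            self.nodes[node.next].prev = node.prev
        # Reset to the initial state, not to INVALID_PGPROCNO, so a removed
        # node is distinguishable from the sole member of a singleton list.
        node.next = node.prev = 0
```

In this model the sole member of a list has next == prev == `INVALID_PGPROCNO`, so under the older reset convention described in the commit message a removed node would look identical to it and the two assertions could not be written.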