postgresql

Commit Graph

Author	SHA1	Message	Date
Robert Haas	289992c462	Don't invoke arbitrary code inside a possibly-aborted transaction. The code here previously tried to call the partitioning operator, but really the right thing to do (and the safe thing to do) is use datumIsEqual(). Amit Langote, but I expanded the comment and fixed a compiler warning.	2017-01-24 08:57:10 -05:00
Robert Haas	b1ecb9b3fc	Fix interaction of partitioned tables with BulkInsertState. When copying into a partitioned table, the target heap may change from one tuple to next. We must ask ReadBufferBI() to get a new buffer every time such change occurs. To do that, use new function ReleaseBulkInsertStatePin(). This fixes the bug that tuples ended up being inserted into the wrong partition, which occurred exactly because the wrong buffer was used. Amit Langote, per a suggestion from Robert Haas. Some cosmetic adjustments by me. Reports by 高增琦 (Gao Zengqi), Venkata B Nagothi, and Ragnar Ouchterlony. Discussion: http://postgr.es/m/CAFmBtr32FDOqofo8yG-4mjzL1HnYHxXK5S9OGFJ%3D%3DcJpgEW4vA%40mail.gmail.com Discussion: http://postgr.es/m/CAEyp7J9WiX0L3DoiNcRrY-9iyw%3DqP%2Bj%3DDLsAnNFF1xT2J1ggfQ%40mail.gmail.com Discussion: http://postgr.es/m/16d73804-c9cd-14c5-463e-5caad563ff77%40agama.tv Discussion: http://postgr.es/m/CA+TgmoaiZpDVUUN8LZ4jv1qFE_QyR+H9ec+79f5vNczYarg5Zg@mail.gmail.com	2017-01-24 08:50:16 -05:00
Fujii Masao	dc82f5a640	Be sure to release the lock on failure to launch logical replication worker. Petr Jelinek	2017-01-24 12:41:09 +09:00
Tatsuo Ishii	73293ebae2	Fix comments in StrategyNotifyBgWriter(). The interface for the function was changed in `d72731a704` but the comments of the function was not updated. Patch by Yugo Nagata.	2017-01-24 09:39:11 +09:00
Peter Eisentraut	1a2d9a655a	Fix parallel make issue with new fmgrprotos.h The make rules needed further refinement so that we don't run multiple generations per build. reported by Tom Lane	2017-01-23 15:36:27 -05:00
Peter Eisentraut	0bc1207aeb	Fix default minimum value for descending sequences For some reason that is lost in history, a descending sequence would default its minimum value to -2^63+1 (-PG_INT64_MAX) instead of -2^63 (PG_INT64_MIN), even though explicitly specifying a minimum value of -2^63 would work. Fix this inconsistency by using the full range by default. Reported-by: Daniel Verite <daniel@manitou-mail.org> Reviewed-by: Michael Paquier <michael.paquier@gmail.com>	2017-01-23 14:00:58 -05:00
Peter Eisentraut	46d482814c	Don't error when no system locales were found initdb used to warn about that, but it was changed to an error in pg_import_system_locales, but some build farm members failed because of that. Change it back to a warning.	2017-01-23 13:45:32 -05:00
Peter Eisentraut	70c56a014e	Fix NULL pointer access in logical replication workers From: Petr Jelinek <pjmodos@pjmodos.net>	2017-01-23 12:33:27 -05:00
Peter Eisentraut	6cffe54aef	Fix pointer confusion get_object_address_publication_rel() needed to check *relation, not relation. Rename the variables to match style used nearby to avoid the confusion.	2017-01-23 11:55:06 -05:00
Peter Eisentraut	16a61884b5	Fix memory leaks in libpqwalreceiver The results of the libpq functions PQescapeIdentifier() and PQescapeLiteral() must be freed explicitly. Also handle errors in these functions better.	2017-01-23 11:06:30 -05:00
Alvaro Herrera	7e26e02eec	Prefetch blocks during lazy vacuum's truncation scan Vacuum truncation scan can be sped up on rotating media by prefetching blocks in forward direction. That makes the blocks already present in memory by the time they are needed, while also letting OS read-ahead kick in. The truncate scan has been measured to be five times faster than without this patch (that was on a slow disk, but it shouldn't hurt on fast disks.) Author: Álvaro Herrera, loosely based on a submission by Claudio Freire Discussion: https://postgr.es/m/CAGTBQpa6NFGO_6g_y_7zQx8L9GcHDSQKYdo1tGuh791z6PYgEg@mail.gmail.com	2017-01-23 12:55:18 -03:00
Tom Lane	3c821466ab	Fix example plan in optimizer/README. Joining three tables only takes two join nodes. I think when I (tgl) wrote this, I was envisioning possible additional joins; but since the example doesn't show any fourth table, it's just confusing to write a third join node. Etsuro Fujita Discussion: https://postgr.es/m/e6cfbaa3-af02-1abc-c25e-8fa5c6bc4e21@lab.ntt.co.jp	2017-01-23 09:38:36 -05:00
Tom Lane	90992e0e2f	Relocate static function declarations to be after typedefs in jsonfuncs.c. Project style is to put things in this order, for the good and sufficient reason that you often need the typedefs in the function declarations. There already was one function declaration that needed a typedef, which was randomly placed away from all the other static function declarations in consequence. And the submitted patch for better json_populate_record functionality jumped through even more hoops in order to preserve this bad idea. This patch only moves lines from point A to point B, no other changes.	2017-01-22 14:08:26 -05:00
Tom Lane	0a8b9d3b2c	Remove no-longer-needed loop in ExecGather(). Coverity complained quite properly that commit `ea15e1867` had introduced unreachable code into ExecGather(); to wit, it was no longer possible to iterate the final for-loop more or less than once. So remove the for(). In passing, clean up a couple of comments, and make better use of a local variable.	2017-01-22 11:47:38 -05:00
Peter Eisentraut	8f164e1eea	Add missing break	2017-01-22 06:40:04 -05:00
Peter Eisentraut	f21a563d25	Move some things from builtins.h to new header files This avoids that builtins.h has to include additional header files.	2017-01-20 20:29:53 -05:00
Robert Haas	c6a389792e	Avoid useless respawining the autovacuum launcher at high speed. When (1) autovacuum = off and (2) there's at least one database with an XID age greater than autovacuum_freeze_max_age and (3) all tables in that database that need vacuuming are already being processed by a worker and (4) the autovacuum launcher is started, a kind of infinite loop occurs. The launcher starts a worker and immediately exits. The worker, finding no worker to do, immediately starts the launcher, supposedly so that the next database can be processed. But because datfrozenxid for that database hasn't been advanced yet, the new worker gets put right back into the same database as the old one, where it once again starts the launcher and exits. High-speed ping pong ensues. There are several possible ways to break the cycle; this seems like the safest one. Amit Khandekar (code) and Robert Haas (comments), reviewed by Álvaro Herrera. Discussion: http://postgr.es/m/CAJ3gD9eWejf72HKquKSzax0r+epS=nAbQKNnykkMA0E8c+rMDg@mail.gmail.com	2017-01-20 15:55:45 -05:00
Robert Haas	6546ffb35d	Fix comparison logic in partition_bounds_equal for non-finite bounds. If either bound is infinite, then we shouldn't even try to perform a comparison of the values themselves. Rearrange the logic so that we don't. Per buildfarm member skink and Tom Lane.	2017-01-20 15:49:38 -05:00
Alvaro Herrera	50cf1c80e6	Record dependencies on owners for logical replication objects This was forgotten in `665d1fad99` and caused the whole buildfarm to become red for a little while. Author: Petr Jelínek Also fix a typo in a nearby error message.	2017-01-20 16:45:02 -03:00
Tom Lane	0502e85464	Try to fix non-MSVC Windows builds in the wake of logical replication. pgoutput evidently needs to be built without -DBUILDING_DLL. (It seems like a pretty bad idea that these makefiles need to know exactly where all the shlibs are in the tree, or maybe what's bad is putting them under src/backend/. But right now is not the time to redesign that.) Also, remove "override CPPFLAGS" in pgoutput's Makefile. I don't think that that actually has any bad consequences, but it's certainly useless in a directory that has no .h files, and it might be contributing to the failure somehow. Per buildfarm.	2017-01-20 12:51:31 -05:00
Peter Eisentraut	665d1fad99	Logical replication - Add PUBLICATION catalogs and DDL - Add SUBSCRIPTION catalog and DDL - Define logical replication protocol and output plugin - Add logical replication workers From: Petr Jelinek <petr@2ndquadrant.com> Reviewed-by: Steve Singer <steve@ssinger.info> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Erik Rijkers <er@xs4all.nl> Reviewed-by: Peter Eisentraut <peter.eisentraut@2ndquadrant.com>	2017-01-20 09:04:49 -05:00
Tom Lane	ba61a04bc7	Avoid core dump for empty prepared statement in an aborted transaction. Brown-paper-bag bug in commit ab1f0c822: the old code here coped with null CachedPlanSource.raw_parse_tree, the new code not so much. Per report from Dave Cramer. No regression test, because our core testing infrastructure doesn't provide any easy way to exercise this path. Fortunately, the JDBC crew test it regularly. Discussion: https://postgr.es/m/CADK3HH+Ug3xCysKqw_dZOnaNnytZ1Rh5yP05hjO-e4NoyRxVvA@mail.gmail.com	2017-01-19 19:52:13 -05:00
Tom Lane	d479e37e3d	Fix Assert failure induced by commit `215b43cdc`. I'd somehow talked myself into believing that set_append_rel_size doesn't need to worry about getting back an AND clause when it applies eval_const_expressions to the result of adjust_appendrel_attrs (that is, transposing the appendrel parent's restriction clauses for one child). But that is nonsense, and Andreas Seltenreich's fuzz tester soon turned up a counterexample. Put back the make_ands_implicit step that was there before, and add a regression test covering the case. Report: https://postgr.es/m/878tq6vja6.fsf@ansel.ydns.eu	2017-01-19 18:20:58 -05:00
Andres Freund	ea15e18677	Remove obsoleted code relating to targetlist SRF evaluation. Since `69f4b9c` plain expression evaluation (and thus normal projection) can't return sets of tuples anymore. Thus remove code dealing with that possibility. This will require adjustments in external code using ExecEvalExpr()/ExecProject() - that should neither be hard nor very common. Author: Andres Freund and Tom Lane Discussion: https://postgr.es/m/20160822214023.aaxz5l4igypowyri@alap3.anarazel.de	2017-01-19 14:40:41 -08:00
Alvaro Herrera	8eace46d34	Fix race condition in reading commit timestamps If a user requests the commit timestamp for a transaction old enough that its data is concurrently being truncated away by vacuum at just the right time, they would receive an ugly internal file-not-found error message from slru.c rather than the expected NULL return value. In a primary server, the window for the race is very small: the lookup has to occur exactly between the two calls by vacuum, and there's not a lot that happens between them (mostly just a multixact truncate). In a standby server, however, the window is larger because the truncation is executed as soon as the WAL record for it is replayed, but the advance of the oldest-Xid is not executed until the next checkpoint record. To fix in the primary, simply reverse the order of operations in vac_truncate_clog. To fix in the standby, augment the WAL truncation record so that the standby is aware of the new oldest-XID value and can apply the update immediately. WAL version bumped because of this. No backpatch, because of the low importance of the bug and its rarity. Author: Craig Ringer Reviewed-By: Petr Jelínek, Peter Eisentraut Discussion: https://postgr.es/m/CAMsr+YFhVtRQT1VAwC+WGbbxZZRzNou=N9Ed-FrCqkwQ8H8oJQ@mail.gmail.com	2017-01-19 18:24:17 -03:00
Robert Haas	cc144155f7	Avoid some code duplication in map_partition_varattnos(). Code to map attribute numbers in map_partition_varattnos() duplicates what convert_tuples_by_name_map() does. Avoid that. Amit Langote, per a report from Álvaro Herrera. Discussion: http://postgr.es/m/9ce97382-54c8-deb3-9ee9-a2ec271d866b%40lab.ntt.co.jp	2017-01-19 14:13:15 -05:00
Robert Haas	8a8afe2f54	Fix some problems in check_new_partition_bound(). Account for the fact that the highest bound less than or equal to the upper bound might be either the lower or the upper bound of the overlapping partition, depending on whether the proposed partition completely contains the existing partition or merely overlaps it. Also, we need not continue searching for even greater bound in partition_bound_bsearch() once we find the first bound that is equal to the probe, because we don't have duplicate datums. That spends cycles needlessly. Amit Langote, per a report from Amul Sul. Cosmetic changes by me. Discussion: http://postgr.es/m/CAAJ_b94XgbqVoXMyxxs63CaqWoMS1o2gpHiU0F7yGnJBnvDc_A%40mail.gmail.com	2017-01-19 14:00:55 -05:00
Robert Haas	05bd889904	Fix RETURNING to work correctly with partition tuple routing. In ExecInsert(), do not switch back to the root partitioned table ResultRelInfo until after we finish ExecProcessReturning(), so that RETURNING projection is done using the partition's descriptor. For the projection to work correctly, we must initialize the same for each leaf partition during ModifyTableState initialization. Amit Langote	2017-01-19 13:20:11 -05:00
Robert Haas	39162b2030	Fix failure to enforce partitioning contraint for internal partitions. When a tuple is inherited into a partitioning root, no partition constraints need to be enforced; when it is inserted into a leaf, the parent's partitioning quals needed to be enforced. The previous coding got both of those cases right. When a tuple is inserted into an intermediate level of the partitioning hierarchy (i.e. a table which is both a partition itself and in turn partitioned), it must enforce the partitioning qual inherited from its parent. That case got overlooked; repair. Amit Langote	2017-01-19 12:30:27 -05:00
Alvaro Herrera	30bcebbdcf	Allow negative years in make_date to represent BC years There doesn't seem to be any reason not to allow negative years to be interpreted as BC, so do that. The documentation is pretty vague on the details of this function, so nothing needs to change there. Reported-by: Andy Abelisto, in bug #14446	2017-01-19 09:45:38 -03:00
Andres Freund	69f4b9c85f	Move targetlist SRF handling from expression evaluation to new executor node. Evaluation of set returning functions (SRFs_ in the targetlist (like SELECT generate_series(1,5)) so far was done in the expression evaluation (i.e. ExecEvalExpr()) and projection (i.e. ExecProject/ExecTargetList) code. This meant that most executor nodes performing projection, and most expression evaluation functions, had to deal with the possibility that an evaluated expression could return a set of return values. That's bad because it leads to repeated code in a lot of places. It also, and that's my (Andres's) motivation, made it a lot harder to implement a more efficient way of doing expression evaluation. To fix this, introduce a new executor node (ProjectSet) that can evaluate targetlists containing one or more SRFs. To avoid the complexity of the old way of handling nested expressions returning sets (e.g. having to pass up ExprDoneCond, and dealing with arguments to functions returning sets etc.), those SRFs can only be at the top level of the node's targetlist. The planner makes sure (via split_pathtarget_at_srfs()) that SRF evaluation is only necessary in ProjectSet nodes and that SRFs are only present at the top level of the node's targetlist. If there are nested SRFs the planner creates multiple stacked ProjectSet nodes. The ProjectSet nodes always get input from an underlying node. We also discussed and prototyped evaluating targetlist SRFs using ROWS FROM(), but that turned out to be more complicated than we'd hoped. While moving SRF evaluation to ProjectSet would allow to retain the old "least common multiple" behavior when multiple SRFs are present in one targetlist (i.e. continue returning rows until all SRFs are at the end of their input at the same time), we decided to instead only return rows till all SRFs are exhausted, returning NULL for already exhausted ones. We deemed the previous behavior to be too confusing, unexpected and actually not particularly useful. As a side effect, the previously prohibited case of multiple set returning arguments to a function, is now allowed. Not because it's particularly desirable, but because it ends up working and there seems to be no argument for adding code to prohibit it. Currently the behavior for COALESCE and CASE containing SRFs has changed, returning multiple rows from the expression, even when the SRF containing "arm" of the expression is not evaluated. That's because the SRFs are evaluated in a separate ProjectSet node. As that's quite confusing, we're likely to instead prohibit SRFs in those places. But that's still being discussed, and the code would reside in places not touched here, so that's a task for later. There's a lot of, now superfluous, code dealing with set return expressions around. But as the changes to get rid of those are verbose largely boring, it seems better for readability to keep the cleanup as a separate commit. Author: Tom Lane and Andres Freund Discussion: https://postgr.es/m/20160822214023.aaxz5l4igypowyri@alap3.anarazel.de	2017-01-18 13:40:27 -08:00
Robert Haas	e37360d5df	Improve comment in hashsearch.c. Typo fix from Mithun Cy; other improvements by me.	2017-01-18 16:36:48 -05:00
Magnus Hagander	d00ca333c3	Implement array version of jsonb_delete and operator This makes it possible to delete multiple keys from a jsonb value by passing in an array of text values, which makes the operaiton much faster than individually deleting the keys (which would require copying the jsonb structure over and over again. Reviewed by Dmitry Dolgov and Michael Paquier	2017-01-18 21:37:59 +01:00
Tom Lane	c22ecc6562	Disable transforms that replaced AT TIME ZONE with RelabelType. These resulted in wrong answers if the relabeled argument could be matched to an index column, as shown in bug #14504 from Evgeniy Kozlov. We might be able to resurrect these optimizations by adjusting the planner's treatment of RelabelType, or by adjusting btree's rules for selecting comparison functions, but either solution will take careful analysis and does not sound like a fit candidate for backpatching. I left the catalog infrastructure in place and just reduced the transform functions to always-return-NULL. This would be necessary anyway in the back branches, and it doesn't seem important to be more invasive in HEAD. Bug introduced by commit `b8a18ad48`. Back-patch to 9.5 where that came in. Report: https://postgr.es/m/20170118144828.1432.52823@wrigleys.postgresql.org Discussion: https://postgr.es/m/18771.1484759439@sss.pgh.pa.us	2017-01-18 15:22:07 -05:00
Robert Haas	262e821dec	Update information_schema queries and system views for new relkind. The original table partitioning patch overlooked this. Discussion: http://postgr.es/m/CAG1_KcDJiZB=L6yOUO_bVufj2q2851_xdkfhw0JdcD_2VtKssw@mail.gmail.com Keith Fiske and Amit Langote, adjusted by me.	2017-01-18 14:29:23 -05:00
Alvaro Herrera	9a34123bc3	Make messages mentioning type names more uniform This avoids additional translatable strings for each distinct type, as well as making our quoting style around type names more consistent (namely, that we don't quote type names). This continues what started as `f402b99501`. Discussion: https://postgr.es/m/20160401170642.GA57509@alvherre.pgsql	2017-01-18 16:08:20 -03:00
Robert Haas	716c7d4b24	Factor out logic for computing number of parallel workers. Forthcoming patches to allow other types of parallel scans will need this logic, or something like it. Dilip Kumar	2017-01-18 13:54:45 -05:00
Tom Lane	0333a73400	Avoid conflicts with collation aliases generated by stripping. This resulted in failures depending on the order of "locale -a" output. The original coding in initdb sorted the results, but that should be unnecessary as long as "locale -a" doesn't print duplicate names. The original entries will then all be non-dups, and while we might generate duplicate aliases by stripping, they should be for different encodings and thus not conflict. Even if the latter assumption fails somehow, it won't be fatal because we're using if_not_exists mode for the aliases. Discussion: https://postgr.es/m/26116.1484751196%40sss.pgh.pa.us	2017-01-18 13:44:19 -05:00
Tom Lane	215b43cdc8	Improve RLS planning by marking individual quals with security levels. In an RLS query, we must ensure that security filter quals are evaluated before ordinary query quals, in case the latter contain "leaky" functions that could expose the contents of sensitive rows. The original implementation of RLS planning ensured this by pushing the scan of a secured table into a sub-query that it marked as a security-barrier view. Unfortunately this results in very inefficient plans in many cases, because the sub-query cannot be flattened and gets planned independently of the rest of the query. To fix, drop the use of sub-queries to enforce RLS qual order, and instead mark each qual (RestrictInfo) with a security_level field establishing its priority for evaluation. Quals must be evaluated in security_level order, except that "leakproof" quals can be allowed to go ahead of quals of lower security_level, if it's helpful to do so. This has to be enforced within the ordering of any one list of quals to be evaluated at a table scan node, and we also have to ensure that quals are not chosen for early evaluation (i.e., use as an index qual or TID scan qual) if they're not allowed to go ahead of other quals at the scan node. This is sufficient to fix the problem for RLS quals, since we only support RLS policies on simple tables and thus RLS quals will always exist at the table scan level only. Eventually these qual ordering rules should be enforced for join quals as well, which would permit improving planning for explicit security-barrier views; but that's a task for another patch. Note that FDWs would need to be aware of these rules --- and not, for example, send an insecure qual for remote execution --- but since we do not yet allow RLS policies on foreign tables, the case doesn't arise. This will need to be addressed before we can allow such policies. Patch by me, reviewed by Stephen Frost and Dean Rasheed. Discussion: https://postgr.es/m/8185.1477432701@sss.pgh.pa.us	2017-01-18 12:58:20 -05:00
Peter Eisentraut	aa17c06fb5	Add function to import operating system collations Move this logic out of initdb into a user-callable function. This simplifies the code and makes it possible to update the standard collations later on if additional operating system collations appear. Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Euler Taveira <euler@timbira.com.br>	2017-01-18 09:35:56 -05:00
Alvaro Herrera	193a7d791e	Remove dead code in bootstrap The bootstrap scanner/parser contains code to parse floating point values, but this is not exercised anywhere, so remove it. Reviewed-by: Jim Nasby Discussion: https://postgr.es/m/20170110051119.b5h7i3z5qagy35rb@alvherre.pgsql	2017-01-17 16:54:40 -03:00
Alvaro Herrera	593c75d5c3	Fix typo	2017-01-17 16:49:20 -03:00
Alvaro Herrera	dda7c34555	Fix typo	2017-01-17 16:33:10 -03:00
Peter Eisentraut	352a24a1f9	Generate fmgr prototypes automatically Gen_fmgrtab.pl creates a new file fmgrprotos.h, which contains prototypes for all functions registered in pg_proc.h. This avoids having to manually maintain these prototypes across a random variety of header files. It also automatically enforces a correct function signature, and since there are warnings about missing prototypes, it will detect functions that are defined but not registered in pg_proc.h (or otherwise used). Reviewed-by: Pavel Stehule <pavel.stehule@gmail.com>	2017-01-17 14:06:07 -05:00
Peter Eisentraut	6fc547960d	Rename C symbols for backend lo_ functions Rename the C symbols for lo_* to be_lo_*, so they don't conflict with libpq prototypes. Reviewed-by: Pavel Stehule <pavel.stehule@gmail.com>	2017-01-17 12:35:30 -05:00
Peter Eisentraut	30b9a4495a	Remove unnecessary include Between `6eeb95f0f5` and `7b1c2a0f20`, builtins.h contained additional prototypes that have now been moved elsewhere, so we don't need to include nodes/parsenodes.h anymore. Fix some files that were relying on builtins.h implicitly pulling in some unrelated stuff they needed. Reviewed-by: Pavel Stehule <pavel.stehule@gmail.com>	2017-01-17 12:35:19 -05:00
Fujii Masao	974ece58bb	Fix an assertion failure related to an exclusive backup. Previously multiple sessions could execute pg_start_backup() and pg_stop_backup() to start and stop an exclusive backup at the same time. This could trigger the assertion failure of "FailedAssertion("!(XLogCtl->Insert.exclusiveBackup)". This happend because, even while pg_start_backup() was starting an exclusive backup, other session could run pg_stop_backup() concurrently and mark the backup as not-in-progress unconditionally. This patch introduces ExclusiveBackupState indicating the state of an exclusive backup. This state is used to ensure that there is only one session running pg_start_backup() or pg_stop_backup() at the same time, to avoid the assertion failure. Back-patch to all supported versions. Author: Michael Paquier Reviewed-By: Kyotaro Horiguchi and me Reported-By: Andreas Seltenreich Discussion: <87mvktojme.fsf@credativ.de>	2017-01-17 17:27:32 +09:00
Tom Lane	d43a619c60	Fix check_srf_call_placement() to handle VALUES cases correctly. INSERT ... VALUES with a single VALUES row is implemented quite differently from the general VALUES case. A user-visible implication of that is that we accept SRFs in the single-row case, but not in the multi-row case. That's a historical artifact no doubt, but in view of the lack of field complaints, I'm not excited about fixing it right now. However, check_srf_call_placement() needs to know about this, first because it should throw an error in the unsupported case, and second because it should set p_hasTargetSRFs in the single-row case (because we treat that like a SELECT tlist). That's an oversight in commit `a4c35ea1c`. To fix, split EXPR_KIND_VALUES into two values. So far as I can see, this is the only place where we need to distinguish the two cases at present; but there might be more later. Patch by me, per report from Andres Freund. Discussion: https://postgr.es/m/20170116081548.zg63zltblwimpfgp@alap3.anarazel.de	2017-01-16 15:23:11 -05:00
Tom Lane	4e46c97fde	Fix NULL pointer dereference in tuplesort.c. Oversight in commit `e94568ecc`. This could cause a crash when an external datum tuplesort of a pass-by-value type required multiple passes. Per report from Mithun Cy. Peter Geoghegan Discussion: https://postgr.es/m/CAD__OujuhfWFULGFSt1fyHqUb8N-XafjJhudwt88V0Qs2o84qg@mail.gmail.com	2017-01-16 13:53:40 -05:00
Fujii Masao	8fa6019b40	Fix typos in comments. Masahiko Sawada	2017-01-16 18:55:34 +09:00
Tom Lane	0777f7a2e8	Fix matching of boolean index columns to sort ordering. Normally, if we have a WHERE clause like "indexcol = constant", the planner will figure out that that index column can be ignored when determining whether the index has a desired sort ordering. But this failed to work for boolean index columns, because a condition like "boolcol = true" is canonicalized to just "boolcol" which does not give rise to an EquivalenceClass. Add a check to allow the same type of deduction to be made in this case too. Per a complaint from Dima Pavlov. Arguably this is a bug, but given the limited impact and the small number of complaints so far, I won't risk destabilizing plans in stable branches by back-patching. Patch by me, reviewed by Michael Paquier Discussion: https://postgr.es/m/1788.1481605684@sss.pgh.pa.us	2017-01-15 14:09:35 -05:00
Tom Lane	ab1f0c8225	Change representation of statement lists, and add statement location info. This patch makes several changes that improve the consistency of representation of lists of statements. It's always been the case that the output of parse analysis is a list of Query nodes, whatever the types of the individual statements in the list. This patch brings similar consistency to the outputs of raw parsing and planning steps: * The output of raw parsing is now always a list of RawStmt nodes; the statement-type-dependent nodes are one level down from that. * The output of pg_plan_queries() is now always a list of PlannedStmt nodes, even for utility statements. In the case of a utility statement, "planning" just consists of wrapping a CMD_UTILITY PlannedStmt around the utility node. This list representation is now used in Portal and CachedPlan plan lists, replacing the former convention of intermixing PlannedStmts with bare utility-statement nodes. Now, every list of statements has a consistent head-node type depending on how far along it is in processing. This allows changing many places that formerly used generic "Node *" pointers to use a more specific pointer type, thus reducing the number of IsA() tests and casts needed, as well as improving code clarity. Also, the post-parse-analysis representation of DECLARE CURSOR is changed so that it looks more like EXPLAIN, PREPARE, etc. That is, the contained SELECT remains a child of the DeclareCursorStmt rather than getting flipped around to be the other way. It's now true for both Query and PlannedStmt that utilityStmt is non-null if and only if commandType is CMD_UTILITY. That allows simplifying a lot of places that were testing both fields. (I think some of those were just defensive programming, but in many places, it was actually necessary to avoid confusing DECLARE CURSOR with SELECT.) Because PlannedStmt carries a canSetTag field, we're also able to get rid of some ad-hoc rules about how to reconstruct canSetTag for a bare utility statement; specifically, the assumption that a utility is canSetTag if and only if it's the only one in its list. While I see no near-term need for relaxing that restriction, it's nice to get rid of the ad-hocery. The API of ProcessUtility() is changed so that what it's passed is the wrapper PlannedStmt not just the bare utility statement. This will affect all users of ProcessUtility_hook, but the changes are pretty trivial; see the affected contrib modules for examples of the minimum change needed. (Most compilers should give pointer-type-mismatch warnings for uncorrected code.) There's also a change in the API of ExplainOneQuery_hook, to pass through cursorOptions instead of expecting hook functions to know what to pick. This is needed because of the DECLARE CURSOR changes, but really should have been done in 9.6; it's unlikely that any extant hook functions know about using CURSOR_OPT_PARALLEL_OK. Finally, teach gram.y to save statement boundary locations in RawStmt nodes, and pass those through to Query and PlannedStmt nodes. This allows more intelligent handling of cases where a source query string contains multiple statements. This patch doesn't actually do anything with the information, but a follow-on patch will. (Passing this information through cleanly is the true motivation for these changes; while I think this is all good cleanup, it's unlikely we'd have bothered without this end goal.) catversion bump because addition of location fields to struct Query affects stored rules. This patch is by me, but it owes a good deal to Fabien Coelho who did a lot of preliminary work on the problem, and also reviewed the patch. Discussion: https://postgr.es/m/alpine.DEB.2.20.1612200926310.29821@lancre	2017-01-14 16:02:35 -05:00
Tom Lane	75abb955df	Throw suitable error for COPY TO STDOUT/FROM STDIN in a SQL function. A client copy can't work inside a function because the FE/BE wire protocol doesn't support nesting of a COPY operation within query results. (Maybe it could, but the protocol spec doesn't suggest that clients should support this, and libpq for one certainly doesn't.) In most PLs, this prohibition is enforced by spi.c, but SQL functions don't use SPI. A comparison of _SPI_execute_plan() and init_execution_state() shows that rejecting client COPY is the only discrepancy in what they allow, so there's no other similar bugs. This is an astonishingly ancient oversight, so back-patch to all supported branches. Report: https://postgr.es/m/BY2PR05MB2309EABA3DEFA0143F50F0D593780@BY2PR05MB2309.namprd05.prod.outlook.com	2017-01-14 13:27:47 -05:00
Magnus Hagander	f6d6d2920d	Change default values for backup and replication parameters This changes the default values of the following parameters: wal_level = replica max_wal_senders = 10 max_replication_slots = 10 in order to make it possible to make a backup and set up simple replication on the default settings, without requiring a system restart. Discussion: https://postgr.es/m/CABUevEy4PR_EAvZEzsbF5s+V0eEvw7shJ2t-AUwbHOjT+yRb3A@mail.gmail.com Reviewed by Peter Eisentraut. Benchmark help from Tomas Vondra.	2017-01-14 17:14:56 +01:00
Robert Haas	0563a3a8b5	Fix a bug in how we generate partition constraints. Move the code for doing parent attnos to child attnos mapping for Vars in partition constraint expressions to a separate function map_partition_varattnos() and call it from the appropriate places. Doing it in get_qual_from_partbound(), as is now, would produce wrong result in certain multi-level partitioning cases, because it only considers the current pair of parent-child relations. In certain multi-level partitioning cases, attnums for the same key attribute(s) might differ between various levels causing the same attribute to be numbered differently in different instances of the Var corresponding to a given attribute. With this commit, in generate_partition_qual(), we first generate the the whole partition constraint (considering all levels of partitioning) and then do the mapping, so that Vars in the final expression are numbered according the leaf relation (to which it is supposed to apply). Amit Langote, reviewed by me.	2017-01-13 14:04:35 -05:00
Robert Haas	0c2070cefa	Fix cardinality estimates for parallel joins. For a partial path, the cardinality estimate needs to reflect the number of rows we think each worker will see, rather than the total number of rows; otherwise, costing will go wrong. The previous coding got this completely wrong for parallel joins. Unfortunately, this change may destabilize plans for users of 9.6 who have enabled parallel query, but since 9.6 is still fairly new I'm hoping expectations won't be too settled yet. Also, this is really a brown-paper-bag bug, so leaving it unfixed for the entire lifetime of 9.6 seems unwise. Related reports (whose import I initially failed to recognize) by Tomas Vondra and Tom Lane. Discussion: http://postgr.es/m/CA+TgmoaDxZ5z5Kw_oCQoymNxNoVaTCXzPaODcOuao=CzK8dMZw@mail.gmail.com	2017-01-13 13:34:10 -05:00
Robert Haas	76568d3786	Fix incorrect function name in comment. Amit Langote	2017-01-12 09:05:14 -05:00
Magnus Hagander	268f9e3d92	Fix some typos in comments Masahiko Sawada	2017-01-11 10:03:03 +01:00
Alvaro Herrera	42f50cb8fa	Fix overflow check in StringInfo; add missing casts A few thinkos I introduced in `fa2fa99552`. Also, amend a similarly broken comment. Report by Daniel Vérité. Authors: Daniel Vérité, Álvaro Herrera Discussion: https://postgr.es/m/1706e85e-60d2-494e-8a64-9af1e1b2186e@manitou-mail.org	2017-01-10 11:41:13 -03:00
Robert Haas	e898437460	Improve coding in _hash_addovflpage. Instead of relying on the page contents to know whether we have advanced from the primary bucket page to an overflow page, track that explicitly. Amit Kapila, per a complaint by me.	2017-01-10 08:31:03 -05:00
Alvaro Herrera	3957b58b88	Fix ALTER TABLE / SET TYPE for irregular inheritance If inherited tables don't have exactly the same schema, the USING clause in an ALTER TABLE / SET DATA TYPE misbehaves when applied to the children tables since commit `9550e8348b`. Starting with that commit, the attribute numbers in the USING expression are fixed during parse analysis. This can lead to bogus errors being reported during execution, such as: ERROR: attribute 2 has wrong type DETAIL: Table has type smallint, but query expects integer. Since it wouldn't do to revert to the original coding, we now apply a transformation to map the attribute numbers to the correct ones for each child. Reported by Justin Pryzby Analysis by Tom Lane; patch by me. Discussion: https://postgr.es/m/20170102225618.GA10071@telsasoft.com	2017-01-09 19:26:58 -03:00
Alvaro Herrera	7403561c0f	BRIN revmap pages are not standard pages ... ... and therefore we ought not to tell XLogRegisterBuffer the opposite, when writing XLog for a brin update that moves the index tuple to a different page. Otherwise, xlog insertion would try to "compress the hole" when producing a full-page image for it; but since we don't update pd_lower/upper, the hole covers the whole page. On WAL replay, the revmap page becomes empty and so the entire portion of the index is useless and needs to be recomputed. This is low-probability: a BRIN update only moves an index tuple to a different page when the summary tuple is larger than the existing one, which doesn't happen with fixed-width datatypes. Also, the revmap page must be first after a checkpoint. Report and patch: Kuntal Ghosh Bug is alleged to have detected by a WAL-consistency-checking tool. Discussion: https://postgr.es/m/CAGz5QCJ=00UQjScSEFbV=0qO5ShTZB9WWz_Fm7+Wd83zPs9Geg@mail.gmail.com I posted a test case demonstrating the problem, but I'm refraining from adding it to the test suite; if the WAL consistency tool makes it in, that will be a better way to catch this from regressing. (We should definitely have someting that causes not-same-page updates, though.)	2017-01-09 18:19:29 -03:00
Tom Lane	7c3abe3c92	Get rid of ParseState.p_value_substitute; use a columnref hook instead. I noticed that p_value_substitute, which is a single-purpose kluge I added in 2002 (commit `b0422b215`), could be replaced by having domainAddConstraint install a parser hook that looks for the name "value". The parser hook code only dates back to 2009, so it's not surprising that we had to kluge this in 2002, but we can do it more cleanly now.	2017-01-07 16:02:16 -05:00
Tom Lane	c52d37c8b3	Invalidate cached plans on FDW option changes. This fixes problems where a plan must change but fails to do so, as seen in a bug report from Rajkumar Raghuwanshi. For ALTER FOREIGN TABLE OPTIONS, do this through the standard method of forcing a relcache flush on the table. For ALTER FOREIGN DATA WRAPPER and ALTER SERVER, just flush the whole plan cache on any change in pg_foreign_data_wrapper or pg_foreign_server. That matches the way we handle some other low-probability cases such as opclass changes, and it's unclear that the case arises often enough to be worth working harder. Besides, that gives a patch that is simple enough to back-patch with confidence. Back-patch to 9.3. In principle we could apply the code change to 9.2 as well, but (a) we lack postgres_fdw to test it with, (b) it's doubtful that anyone is doing anything exciting enough with FDWs that far back to need this desperately, and (c) the patch doesn't apply cleanly. Patch originally by Amit Langote, reviewed by Etsuro Fujita and Ashutosh Bapat, who each contributed substantial changes as well. Discussion: https://postgr.es/m/CAKcux6m5cA6rRPTKkqVdJ-R=KKDfe35Q_ZuUqxDSV_4hwga=og@mail.gmail.com	2017-01-06 14:12:52 -05:00
Robert Haas	0355e6f310	Repair commit `b81b5a96f4`. This commit purported to use a variable hash seed for Partial HashAggregate, but actually did the opposite - it made us use a variable seed for any HashAggregate that is NOT partial. Woops.	2017-01-06 09:34:26 -05:00
Robert Haas	e5b7451ea3	Fix possible leak of semaphore count. Commit `4aec49899e` reorganized the order of operations here so that we no longer increment the number of "extra waits" before locking the semaphore, but it did not change the starting value of extraWaits from 0 to -1 to compensate. In the worst case, this could leak a semaphore count, but that seems to be unlikely in practice. Discussion: http://postgr.es/m/CAA4eK1JyVqXiMba+-a589Rk0pyHsyKkGxeumVKjU6Y74hdrVLQ@mail.gmail.com Amit Kapila, per an off-list report by Dilip Kumar. Reviewed by me.	2017-01-05 14:33:14 -05:00
Robert Haas	175ff6598e	Fix possible crash reading pg_stat_activity. With the old code, a backend that read pg_stat_activity without ever having executed a parallel query might see a backend in the midst of executing one waiting on a DSA LWLock, resulting in a crash. The solution is for backends to register the tranche at startup time, not the first time a parallel query is executed. Report by Andreas Seltenreich. Patch by me, reviewed by Thomas Munro.	2017-01-05 12:27:09 -05:00
Tom Lane	82f8107b92	Fix handling of empty arrays in array_fill(). array_fill(..., array[0]) produced an empty array, which is probably what users expect, but it was a one-dimensional zero-length array which is not our standard representation of empty arrays. Also, for no very good reason, it rejected empty input arrays; that case should be allowed and produce an empty output array. In passing, remove the restriction that the input array(s) have lower bound 1. That seems rather pointless, and it would have needed extra complexity to make the check deal with empty input arrays. Per bug #14487 from Andrew Gierth. It's been broken all along, so back-patch to all supported branches. Discussion: https://postgr.es/m/20170105152156.10135.64195@wrigleys.postgresql.org	2017-01-05 11:33:51 -05:00
Tom Lane	d86f40009b	Handle OID column inheritance correctly in ALTER TABLE ... INHERIT. Inheritance operations must treat the OID column, if any, much like regular user columns. But MergeAttributesIntoExisting() neglected to do that, leading to weird results after a table with OIDs is associated to a parent with OIDs via ALTER TABLE ... INHERIT. Report and patch by Amit Langote, reviewed by Ashutosh Bapat, some adjustments by me. It's been broken all along, so back-patch to all supported branches. Discussion: https://postgr.es/m/cb13cfe7-a48c-5720-c383-bb843ab28298@lab.ntt.co.jp	2017-01-04 18:00:11 -05:00
Robert Haas	3633b3f656	Assorted code improvements for table partitioning. Michael Paquier, per Coverity.	2017-01-04 15:59:00 -05:00
Robert Haas	18fc5192a6	Remove unnecessary arguments from partitioning functions. RelationGetPartitionQual() and generate_partition_qual() are always called with recurse = true, so we don't need an argument for that. Extracted by me from a larger patch by Amit Langote.	2017-01-04 14:56:37 -05:00
Robert Haas	f1b4c771ea	Fix reporting of constraint violations for table partitioning. After a tuple is routed to a partition, it has been converted from the root table's row type to the partition's row type. ExecConstraints needs to report the failure using the original tuple and the parent's tuple descriptor rather than the ones for the selected partition. Amit Langote	2017-01-04 14:36:34 -05:00
Robert Haas	345b2dcf07	Move partition_tuple_slot out of EState. Commit `2ac3ef7a01` added a TupleTapleSlot for partition tuple slot to EState (es_partition_tuple_slot) but it's more logical to have it as part of ModifyTableState (mt_partition_tuple_slot) and CopyState (partition_tuple_slot). Discussion: http://postgr.es/m/1bd459d9-4c0c-197a-346e-e5e59e217d97@lab.ntt.co.jp Amit Langote, per a gripe from me	2017-01-04 13:16:59 -05:00
Tom Lane	6667d9a6d7	Re-allow SSL passphrase prompt at server start, but not thereafter. Leave OpenSSL's default passphrase collection callback in place during the first call of secure_initialize() in server startup. Although that doesn't work terribly well in daemon contexts, some people feel we should not break it for anyone who was successfully using it before. We still block passphrase demands during SIGHUP, meaning that you can't adjust SSL configuration on-the-fly if you used a passphrase, but this is no worse than what it was before commit `de41869b6`. And we block passphrase demands during EXEC_BACKEND reloads; that behavior wasn't useful either, but at least now it's documented. Tweak some related log messages for more readability, and avoid issuing essentially duplicate messages about reload failure caused by a passphrase. Discussion: https://postgr.es/m/29982.1483412575@sss.pgh.pa.us	2017-01-04 12:44:03 -05:00
Bruce Momjian	1d25779284	Update copyright via script for 2017	2017-01-03 13:48:53 -05:00
Tom Lane	1e942c7474	Disable prompting for passphrase while (re)loading SSL config files. OpenSSL's default behavior when loading a passphrase-protected key file is to open /dev/tty and demand the password from there. It was kinda sorta okay to allow that to happen at server start, but really that was never workable in standard daemon environments. And it was a complete fail on Windows, where the same thing would happen at every backend launch. Yesterday's commit `de41869b6` put the final nail in the coffin by causing that to happen at every SIGHUP; even if you've still got a terminal acting as the server's TTY, having the postmaster freeze until you enter the passphrase again isn't acceptable. Hence, override the default behavior with a callback that returns an empty string, ensuring failure. Change the documentation to say that you can't have a passphrase-protected server key, period. If we can think of a production-grade way of collecting a passphrase from somewhere, we might do that once at server startup and use this callback to feed it to OpenSSL, but it's far from clear that anyone cares enough to invest that much work in the feature. The lack of complaints about the existing fractionally-baked behavior suggests nobody's using it anyway. Discussion: https://postgr.es/m/29982.1483412575@sss.pgh.pa.us	2017-01-03 12:33:29 -05:00
Tom Lane	de41869b64	Allow SSL configuration to be updated at SIGHUP. It is no longer necessary to restart the server to enable, disable, or reconfigure SSL. Instead, we just create a new SSL_CTX struct (by re-reading all relevant files) whenever we get SIGHUP. Testing shows that this is fast enough that it shouldn't be a problem. In conjunction with that, downgrade the logic that complains about pg_hba.conf "hostssl" lines when SSL isn't active: now that's just a warning condition not an error. An issue that still needs to be addressed is what shall we do with passphrase-protected server keys? As this stands, the server would demand the passphrase again on every SIGHUP, which is certainly impractical. But the case was only barely supported before, so that does not seem a sufficient reason to hold up committing this patch. Andreas Karlsson, reviewed by Michael Banck and Michael Paquier Discussion: https://postgr.es/m/556A6E8A.9030400@proxel.se	2017-01-02 21:37:12 -05:00
Peter Eisentraut	db779d941e	Fix typo in comment	2016-12-29 11:27:41 -05:00
Peter Eisentraut	2e254130d1	Make more use of RoleSpec struct Most code was casting this through a generic Node. By declaring everything as RoleSpec appropriately, we can remove a bunch of casts and ad-hoc node type checking. Reviewed-by: Alvaro Herrera <alvherre@2ndquadrant.com>	2016-12-29 10:49:39 -05:00
Tom Lane	f0774abde8	Fix interval_transform so it doesn't throw away non-no-op casts. interval_transform() contained two separate bugs that caused it to sometimes mistakenly decide that a cast from interval to restricted interval is a no-op and throw it away. First, it was wrong to rely on dt.h's field type macros to have an ordering consistent with the field's significance; in one case they do not. This led to mistakenly treating YEAR as less significant than MONTH, so that a cast from INTERVAL MONTH to INTERVAL YEAR was incorrectly discarded. Second, fls(1<<k) produces k+1 not k, so comparing its output directly to SECOND was wrong. This led to supposing that a cast to INTERVAL MINUTE was really a cast to INTERVAL SECOND and so could be discarded. To fix, get rid of the use of fls(), and make a function based on intervaltypmodout to produce a field ID code adapted to the need here. Per bug #14479 from Piotr Stefaniak. Back-patch to 9.2 where transform functions were introduced, because this code was born broken. Discussion: https://postgr.es/m/20161227172307.10135.7747@wrigleys.postgresql.org	2016-12-27 15:43:54 -05:00
Magnus Hagander	6cfa54e384	Fix typo comments Erik Rijkers	2016-12-27 10:24:21 +01:00
Tom Lane	54386f3578	Remove triggerable Assert in hashname(). hashname() asserted that the key string it is given is shorter than NAMEDATALEN. That should surely always be true if the input is in fact a regular value of type "name". However, for reasons of coding convenience, we allow plain old C strings to be treated as "name" values in many places. Some SQL functions accept arbitrary "text" inputs, convert them to C strings, and pass them otherwise-untransformed to syscache lookups for name columns, allowing an overlength input value to trigger hashname's Assert. This would be a DOS problem, except that it only happens in assert-enabled builds which aren't recommended for production. In a production build, you'll just get a name lookup error, since regardless of the hash value computed by hashname, the later equality comparison checks can't match. Likewise, if the catalog lookup is done by seqscan or indexscan searches, there will just be a lookup error, since the name comparison functions don't contain any similar length checks, and will see an overlength input as unequal to any stored entry. After discussion we concluded that we should simply remove this Assert. It's inessential to hashname's own functionality, and having such an assertion in only some paths for name lookup is more of a foot-gun than a useful check. There may or may not be a case for the affected callers to do something other than let the name lookup fail, but we'll consider that separately; in any case we probably don't want to change such behavior in the back branches. Per report from Tushar Ahuja. Back-patch to all supported branches. Report: https://postgr.es/m/7d0809ee-6f25-c9d6-8e74-5b2967830d49@enterprisedb.com Discussion: https://postgr.es/m/17691.1482523168@sss.pgh.pa.us	2016-12-26 14:58:02 -05:00
Tom Lane	fe591f8bf6	Replace enum InhOption with simple boolean. Now that it has only INH_NO and INH_YES values, it's just weird that it's not a plain bool, so make it that way. Also rename RangeVar.inhOpt to "inh", to be like RangeTblEntry.inh. My recollection is that we gave it a different name specifically because it had a different representation than the derived bool value, but it no longer does. And this is a good forcing function to be sure we catch any places that are affected by the change. Bump catversion because of possible effect on stored RangeVar nodes. I'm not exactly convinced that we ever store RangeVar on disk, but we have a readfuncs function for it, so be cautious. (If we do do so, then commit `e13486eba` was in error not to bump catversion.) Follow-on to commit `e13486eba`. Discussion: http://postgr.es/m/CA+TgmoYe+EG7LdYX6pkcNxr4ygkP4+A=jm9o-CPXyOvRiCNwaQ@mail.gmail.com	2016-12-23 13:35:18 -05:00
Peter Eisentraut	158df30359	Remove unnecessary casts of makeNode() result makeNode() is already a macro that has the right result pointer type, so casting it again to the same type is unnecessary.	2016-12-23 13:17:20 -05:00
Tom Lane	ff33d1456e	Spellcheck: s/descendent/descendant/g I got a little annoyed by reading documentation paragraphs containing both spellings within a few lines of each other. My dictionary says "descendant" is the preferred spelling, and it's certainly the majority usage in our tree, so standardize on that. For one usage in parallel.sgml, I thought it better to rewrite to avoid the term altogether.	2016-12-23 11:53:35 -05:00
Robert Haas	e13486eba0	Remove sql_inheritance GUC. This backward-compatibility GUC is long overdue for removal. Discussion: http://postgr.es/m/CA+TgmoYe+EG7LdYX6pkcNxr4ygkP4+A=jm9o-CPXyOvRiCNwaQ@mail.gmail.com	2016-12-23 07:35:01 -05:00
Robert Haas	7819ba1ef6	Remove _hash_chgbufaccess(). This is basically for the same reasons I got rid of _hash_wrtbuf() in commit 25216c98938495fd741bf585dcbef45b3a9ffd40: it's not convenient to have a function which encapsulates MarkBufferDirty(), especially as we move towards having hash indexes be WAL-logged. Patch by me, reviewed (but not entirely endorsed) by Amit Kapila.	2016-12-23 07:14:37 -05:00
Robert Haas	2ac3ef7a01	Fix tuple routing in cases where tuple descriptors don't match. The previous coding failed to work correctly when we have a multi-level partitioned hierarchy where tables at successive levels have different attribute numbers for the partition key attributes. To fix, have each PartitionDispatch object store a standalone TupleTableSlot initialized with the TupleDesc of the corresponding partitioned table, along with a TupleConversionMap to map tuples from the its parent's rowtype to own rowtype. After tuple routing chooses a leaf partition, we must use the leaf partition's tuple descriptor, not the root table's. To that end, a dedicated TupleTableSlot for tuple routing is now allocated in EState. Amit Langote	2016-12-22 17:36:37 -05:00
Stephen Frost	12bd7dd317	Use TSConfigRelationId in AlterTSConfiguration() When we are altering a text search configuration, we are getting the tuple from pg_ts_config and using its OID, so use TSConfigRelationId when invoking any post-alter hooks and setting the object address. Further, in the functions called from AlterTSConfiguration(), we're saving information about the command via EventTriggerCollectAlterTSConfig(), so we should be setting commandCollected to true. Also add a regression test to test_ddl_deparse for ALTER TEXT SEARCH CONFIGURATION. Author: Artur Zakirov, a few additional comments by me Discussion: https://www.postgresql.org/message-id/57a71eba-f2c7-e7fd-6fc0-2126ec0b39bd%40postgrespro.ru Back-patch the fix for the InvokeObjectPostAlterHook() call to 9.3 where it was introduced, and the fix for the ObjectAddressSet() call and setting commandCollected to true to 9.5 where those changes to ProcessUtilitySlow() were introduced.	2016-12-22 17:08:43 -05:00
Tom Lane	1ead0208b2	Fix CREATE TABLE ... LIKE ... WITH OIDS. Having a WITH OIDS specification should result in the creation of an OID column, but commit `b943f502b` broke that in the case that there were LIKE tables without OIDS. Commentary in that patch makes it look like this was intentional, but if so it was based on a faulty reading of what inheritance does: the parent tables can add an OID column, but they can't subtract one. AFAICS, the behavior ought to be that you get an OID column if any of the inherited tables, LIKE tables, or WITH clause ask for one. Also, revert that patch's unnecessary split of transformCreateStmt's loop over the tableElts list into two passes. That seems to have been based on a misunderstanding as well: we already have two-pass processing here, we don't need three passes. Per bug #14474 from Jeff Dafoe. Back-patch to 9.6 where the misbehavior was introduced. Report: https://postgr.es/m/20161222145304.25620.47445@wrigleys.postgresql.org	2016-12-22 16:23:38 -05:00
Tom Lane	cd1b215692	Fix handling of expanded objects in CoerceToDomain and CASE execution. When the input value to a CoerceToDomain expression node is a read-write expanded datum, we should pass a read-only pointer to any domain CHECK expressions and then return the original read-write pointer as the expression result. Previously we were blindly passing the same pointer to all the consumers of the value, making it possible for a function in CHECK to modify or even delete the expanded value. (Since a plpgsql function will absorb a passed-in read-write expanded array as a local variable value, it will in fact delete the value on exit.) A similar hazard of passing the same read-write pointer to multiple consumers exists in domain_check() and in ExecEvalCase, so fix those too. The fix requires adding MakeExpandedObjectReadOnly calls at the appropriate places, which is simple enough except that we need to get the data type's typlen from somewhere. For the domain cases, solve this by redefining DomainConstraintRef.tcache as okay for callers to access; there wasn't any reason for the original convention against that, other than not wanting the API of typcache.c to be any wider than it had to be. For CASE, there's no good solution except to add a syscache lookup during executor start. Per bug #14472 from Marcos Castedo. Back-patch to 9.5 where expanded values were introduced. Discussion: https://postgr.es/m/15225.1482431619@sss.pgh.pa.us	2016-12-22 15:01:37 -05:00
Andres Freund	6ef2eba3f5	Skip checkpoints, archiving on idle systems. Some background activity (like checkpoints, archive timeout, standby snapshots) is not supposed to happen on an idle system. Unfortunately so far it was not easy to determine when a system is idle, which defeated some of the attempts to avoid redundant activity on an idle system. To make that easier, allow to make individual WAL insertions as not being "important". By checking whether any important activity happened since the last time an activity was performed, it now is easy to check whether some action needs to be repeated. Use the new facility for checkpoints, archive timeout and standby snapshots. The lack of a facility causes some issues in older releases, but in my opinion the consequences (superflous checkpoints / archived segments) aren't grave enough to warrant backpatching. Author: Michael Paquier, editorialized by Andres Freund Reviewed-By: Andres Freund, David Steele, Amit Kapila, Kyotaro HORIGUCHI Bug: #13685 Discussion: https://www.postgresql.org/message-id/20151016203031.3019.72930@wrigleys.postgresql.org https://www.postgresql.org/message-id/CAB7nPqQcPqxEM3S735Bd2RzApNqSNJVietAC=6kfkYv_45dKwA@mail.gmail.com Backpatch: -	2016-12-22 11:31:50 -08:00
Robert Haas	097e41439d	Fix broken error check in _hash_doinsert. You can't just cast a HashMetaPage to a Page, because the meta page data is stored after the page header, not at offset 0. Fortunately, this didn't break anything because it happens to find hashm_bsize at the offset at which it expects to find pd_pagesize_version, and the values are close enough to the same that this works out. Still, it's a bug, so back-patch to all supported versions. Mithun Cy, revised a bit by me.	2016-12-22 13:59:01 -05:00
Robert Haas	3ee8067284	Code review for ATExecAttachPartition. Amit Langote. Most of this reported by Álvaro Herrera.	2016-12-22 12:40:45 -05:00
Heikki Linnakangas	01ec25631f	Simplify tape block format. No more indirect blocks. The blocks form a linked list instead. This saves some memory, because we don't need to have a buffer in memory to hold the indirect block (or blocks). To reflect that, TAPE_BUFFER_OVERHEAD is reduced from 3 to 1 buffer, which allows using more memory for building the initial runs. Reviewed by Peter Geoghegan and Robert Haas. Discussion: https://www.postgresql.org/message-id/34678beb-938e-646e-db9f-a7def5c44ada%40iki.fi	2016-12-22 18:45:00 +02:00
Tom Lane	a8ae12322a	Fix detection of unfinished Unicode surrogate pair at end of string. The U&'...' and U&"..." syntaxes silently discarded a surrogate pair start (that is, a code between U+D800 and U+DBFF) if it occurred at the very end of the string. This seems like an obvious oversight, since we throw an error for every other invalid combination of surrogate characters, including the very same situation in E'...' syntax. This has been wrong since the pair processing was added (in 9.0), so back-patch to all supported branches. Discussion: https://postgr.es/m/19113.1482337898@sss.pgh.pa.us	2016-12-21 17:39:32 -05:00
Tom Lane	89fcea1ace	Fix strange behavior (and possible crashes) in full text phrase search. In an attempt to simplify the tsquery matching engine, the original phrase search patch invented rewrite rules that would rearrange a tsquery so that no AND/OR/NOT operator appeared below a PHRASE operator. But this approach had numerous problems. The rearrangement step was missed by ts_rewrite (and perhaps other places), allowing tsqueries to be created that would cause Assert failures or perhaps crashes at execution, as reported by Andreas Seltenreich. The rewrite rules effectively defined semantics for operators underneath PHRASE that were buggy, or at least unintuitive. And because rewriting was done in tsqueryin() rather than at execution, the rearrangement was user-visible, which is not very desirable --- for example, it might cause unexpected matches or failures to match in ts_rewrite. As a somewhat independent problem, the behavior of nested PHRASE operators was only sane for left-deep trees; queries like "x <-> (y <-> z)" did not behave intuitively at all. To fix, get rid of the rewrite logic altogether, and instead teach the tsquery execution engine to manage AND/OR/NOT below a PHRASE operator by explicitly computing the match location(s) and match widths for these operators. This requires introducing some additional fields into the publicly visible ExecPhraseData struct; but since there's no way for third-party code to pass such a struct to TS_phrase_execute, it shouldn't create an ABI problem as long as we don't move the offsets of the existing fields. Another related problem was that index searches supposed that "!x <-> y" could be lossily approximated as "!x & y", which isn't correct because the latter will reject, say, "x q y" which the query itself accepts. This required some tweaking in TS_execute_ternary along with the main tsquery engine. Back-patch to 9.6 where phrase operators were introduced. While this could be argued to change behavior more than we'd like in a stable branch, we have to do something about the crash hazards and index-vs-seqscan inconsistency, and it doesn't seem desirable to let the unintuitive behaviors induced by the rewriting implementation stand as precedent. Discussion: https://postgr.es/m/28215.1481999808@sss.pgh.pa.us Discussion: https://postgr.es/m/26706.1482087250@sss.pgh.pa.us	2016-12-21 15:18:39 -05:00
Dean Rasheed	58b1362642	Fix order of operations in CREATE OR REPLACE VIEW. When CREATE OR REPLACE VIEW acts on an existing view, don't update the view options until after the view query has been updated. This is necessary in the case where CREATE OR REPLACE VIEW is used on an existing view that is not updatable, and the new view is updatable and specifies the WITH CHECK OPTION. In this case, attempting to apply the new options to the view before updating its query fails, because the options are applied using the ALTER TABLE infrastructure which checks that WITH CHECK OPTION is only applied to an updatable view. If new columns are being added to the view, that is also done using the ALTER TABLE infrastructure, but it is important that that still be done before updating the view query, because the rules system checks that the query columns match those on the view relation. Added a comment to explain that, in case someone is tempted to move that to where the view options are now being set. Back-patch to 9.4 where WITH CHECK OPTION was added. Report: https://postgr.es/m/CAEZATCUp%3Dz%3Ds4SzZjr14bfct_bdJNwMPi-gFi3Xc5k1ntbsAgQ%40mail.gmail.com	2016-12-21 16:58:18 +00:00
Robert Haas	cd510f0413	Convert elog() to ereport() and do some wordsmithing. It's not entirely clear that we should log a message here at all, but it's certainly wrong to use elog() for a message that should clearly be translatable. Amit Langote	2016-12-21 11:47:50 -05:00
Robert Haas	1fc5c49450	Refactor partition tuple routing code to reduce duplication. Amit Langote	2016-12-21 11:36:10 -05:00
Robert Haas	3b790d256f	Fix corner-case bug in WaitEventSetWaitBlock on Windows. If we do not reset the FD_READ event, WaitForMultipleObjects won't return it again again unless we've meanwhile read from the socket, which is generally true but not guaranteed. WaitEventSetWaitBlock itself may fail to return the event to the caller if the latch is also set, and even if we changed that, the caller isn't obliged to handle all returned events at once. On non-Windows systems, the socket-read event is purely level-triggered, so this issue does not exist. To fix, make Windows reset the event when needed. This bug was introduced by `98a64d0bd7`, and causes hangs when trying to use the pldebugger extension. Patch by Amit Kapial. Reported and tested by Ashutosh Sharma, who also provided some analysis. Further analysis by Michael Paquier.	2016-12-21 11:01:48 -05:00
Robert Haas	59649c3f1c	Refactor merge path generation code. This shouldn't change the set of paths that get generated in any way, but it is preparatory work for further changes to allow a partial path to be merge-joined witih a non-partial path to produce a partial join path. Dilip Kumar, with cosmetic adjustments by me.	2016-12-21 09:45:50 -05:00
Peter Eisentraut	f3b421da5f	Reorder pg_sequence columns to avoid alignment issue On AIX, doubles are aligned at 4 bytes, but int64 is aligned at 8 bytes. Our code assumes that doubles have alignment that can also be applied to int64, but that fails in this case. One effect is that heap_form_tuple() writes tuples in a different layout than Form_pg_sequence expects. Rather than rewrite the whole alignment code, work around the issue by reordering the columns in pg_sequence so that the first int64 column naturally comes out at an 8-byte boundary.	2016-12-21 09:06:49 -05:00
Tom Lane	c080b223a7	Fix minor oversights in nodeAgg.c. aggstate->evalproj is always set up by ExecInitAgg, so there's no need to test. Doing so led Coverity to think that we might be intending "slot" to be possibly NULL here, and it quite properly complained that the rest of combine_aggregates() wasn't prepared for that. Also fix a couple of obvious thinkos in Asserts checking that "inputoff" isn't past the end of the slot. Errors introduced in commit `8ed3f11bb`, so no need for back-patch.	2016-12-20 19:22:02 -05:00
Tom Lane	7d41a2bd3e	Fix minor error message style violation. Primary error messages should not end with a period, since they're generally not written as full sentences. Oversight in `41493bac3`.	2016-12-20 18:54:13 -05:00
Peter Eisentraut	1753b1b027	Add pg_sequence system catalog Move sequence metadata (start, increment, etc.) into a proper system catalog instead of storing it in the sequence heap object. This separates the metadata from the sequence data. Sequence metadata is now operated on transactionally by DDL commands, whereas previously rollbacks of sequence-related DDL commands would be ignored. Reviewed-by: Andreas Karlsson <andreas@proxel.se>	2016-12-20 08:28:18 -05:00
Heikki Linnakangas	db80acfc9d	Fix sharing Agg transition state of DISTINCT or ordered aggs. If a query contained two aggregates that could share the transition value, we would correctly collect the input into a tuplesort only once, but incorrectly run the transition function over the accumulated input twice, in finalize_aggregates(). That caused a crash, when we tried to call tuplesort_performsort() on an already-freed NULL tuplestore. Backport to 9.6, where sharing of transition state and this bug were introduced. Analysis by Tom Lane. Discussion: https://www.postgresql.org/message-id/ac5b0b69-744c-9114-6218-8300ac920e61@iki.fi	2016-12-20 09:20:17 +02:00
Robert Haas	7cd0fd655d	Invalid parent's relcache after CREATE TABLE .. PARTITION OF. Otherwise, subsequent commands in the same transaction see the wrong partition descriptor. Amit Langote. Reported by Tomas Vondra and David Fetter. Reviewed by me. Discussion: http://postgr.es/m/22dd313b-d7fd-22b5-0787-654845c8f849%402ndquadrant.com Discussion: http://postgr.es/m/20161215090916.GB20659%40fetter.org	2016-12-19 22:53:30 -05:00
Robert Haas	e13029a5ce	Provide a DSA area for all parallel queries. This will allow future parallel query code to dynamically allocate storage shared by all participants. Thomas Munro, with assorted changes by me.	2016-12-19 17:11:46 -05:00
Tom Lane	2604438472	Fix handling of phrase operator removal while removing tsquery stopwords. The distance of a removed phrase operator should propagate up to a parent phrase operator if there is one, but this only worked correctly in left-deep trees. Throwing in a few parentheses confused it completely, as indeed was illustrated by bizarre results in existing regression test cases. To fix, track unaccounted-for distances that should propagate to the left and to the right of the current node, rather than trying to make it work with only one returned distance. Also make some adjustments to behave as well as we can for cases of intermixed phrase and regular (AND/OR) operators. I don't think it's possible to be 100% correct for that without a rethinking of the tsquery representation; for example, maybe we should just not drop stopword nodes at all underneath phrase operators. But this is better than it was, and changing tsquery representation wouldn't be safely back-patchable. While at it, I simplified the API of the clean_fakeval_intree function a bit by getting rid of the "char *result" output parameter; that wasn't doing anything that wasn't redundant with whether the result node is NULL or not, and testing for NULL seems a lot clearer/safer. This is part of a larger project to fix various infelicities in the phrase-search implementation, but this part seems comittable on its own. Back-patch to 9.6 where phrase operators were introduced. Discussion: https://postgr.es/m/28215.1481999808@sss.pgh.pa.us Discussion: https://postgr.es/m/26706.1482087250@sss.pgh.pa.us	2016-12-19 13:49:50 -05:00
Robert Haas	dd728826c5	Fix locking problem in _hash_squeezebucket() / _hash_freeovflpage(). A bucket squeeze operation needs to lock each page of the bucket before releasing the prior page, but the previous coding fumbled the locking when freeing an overflow page during a bucket squeeze operation. Commit `6d46f4783e` introduced this bug. Amit Kapila, with help from Kuntal Ghosh and Dilip Kumar, after an initial trouble report by Jeff Janes. Reviewed by me. I also fixed a problem with a comment.	2016-12-19 12:31:50 -05:00
Fujii Masao	3901fd70cc	Support quorum-based synchronous replication. This feature is also known as "quorum commit" especially in discussion on pgsql-hackers. This commit adds the following new syntaxes into synchronous_standby_names GUC. By using FIRST and ANY keywords, users can specify the method to choose synchronous standbys from the listed servers. FIRST num_sync (standby_name [, ...]) ANY num_sync (standby_name [, ...]) The keyword FIRST specifies a priority-based synchronous replication which was available also in 9.6 or before. This method makes transaction commits wait until their WAL records are replicated to num_sync synchronous standbys chosen based on their priorities. The keyword ANY specifies a quorum-based synchronous replication and makes transaction commits wait until their WAL records are replicated to at least num_sync listed standbys. In this method, the values of sync_state.pg_stat_replication for the listed standbys are reported as "quorum". The priority is still assigned to each standby, but not used in this method. The existing syntaxes having neither FIRST nor ANY keyword are still supported. They are the same as new syntax with FIRST keyword, i.e., a priorirty-based synchronous replication. Author: Masahiko Sawada Reviewed-By: Michael Paquier, Amit Kapila and me Discussion: <CAD21AoAACi9NeC_ecm+Vahm+MMA6nYh=Kqs3KB3np+MBOS_gZg@mail.gmail.com> Many thanks to the various individuals who were involved in discussing and developing this feature.	2016-12-19 21:15:30 +09:00
Magnus Hagander	10238fad03	Fix base backup rate limiting in presence of slow i/o When source i/o on disk was too slow compared to the rate limiting specified, the system could end up with a negative value for sleep that it never got out of, which caused rate limiting to effectively be turned off. Discussion: https://postgr.es/m/CABUevEy_-e0YvL4ayoX8bH_Ja9w%2BBHoP6jUgdxZuG2nEj3uAfQ%40mail.gmail.com Analysis by me, patch by Antonin Houska	2016-12-19 10:11:04 +01:00
Tom Lane	7fa93eec4e	Fix FK-based join selectivity estimation for semi/antijoins. This case wasn't thought through sufficiently in commit `100340e2d`. It's true that the FK proves that every outer row has a match in the inner table, but we forgot that some of the inner rows might be filtered away by WHERE conditions located within the semijoin's RHS. If the RHS is just one table, we can reasonably take the semijoin selectivity as equal to the fraction of the referenced table's rows that are expected to survive its restriction clauses. If the RHS is a join, it's not clear how much of the referenced table might get through the join, so fall back to the same rule we were already using for other outer-join cases: use the minimum of the regular per-clause selectivity estimates. This gives the same result as if we hadn't considered the FK at all when there's a single FK column, but it should still help for multi-column FKs, which is the case that `100340e2d` is really meant to help with. Back-patch to 9.6 where the previous commit came in. Discussion: https://postgr.es/m/16149.1481835103@sss.pgh.pa.us	2016-12-17 15:28:54 -05:00
Magnus Hagander	01776a07b3	Fix typos in comments Michael Paquier	2016-12-17 14:33:26 +01:00
Robert Haas	591ccb66d2	Fix outdated comment in lwlock.c Commit `3761fe3c20` should have made this change, but didn't. Reported by Álvaro Herrera.	2016-12-16 15:59:56 -05:00
Fujii Masao	93eb619cd3	Ensure that num_sync is greater than zero in synchronous_standby_names. Previously num_sync could be set to zero and this setting caused an assertion failure. This means that multiple synchronous standbys code should assume that num_sync is greater than zero. Also setting num_sync to zero is nonsense because it's basically the configuration for synchronous replication. If users want not to make transaction commits wait for any standbys, synchronous_standby_names should be emptied to disable synchronous replication instead of setting num_sync to zero. This patch forbids users from setting num_sync to zero in synchronous_standby_names. If zero is specified, an error will happen during processing the parameter settings. Back-patch to 9.6 where multiple synchronous standbys feature was added. Patch by me. Reviewed by Tom Lane. Discussion: <CAHGQGwHWB3izc6cXuFLh5kOcAbFXaRhhgwd-X5PeN9TEjxqXwg@mail.gmail.com>	2016-12-17 02:22:15 +09:00
Tom Lane	23c75b55aa	Improve documentation around TS_execute(). I got frustrated by the lack of commentary in this area, so here is some reverse-engineered documentation, along with minor stylistic cleanup. No code changes more significant than removal of unused variables. Back-patch to 9.6, not because that's useful in itself, but because we have some bugs to fix in phrase search and this would cause merge failures if it's only in HEAD.	2016-12-16 11:50:32 -05:00
Robert Haas	3761fe3c20	Simplify LWLock tranche machinery by removing array_base/array_stride. array_base and array_stride were added so that we could identify the offset of an LWLock within a tranche, but this facility is only very marginally used apart from the main tranche. So, give every lock in the main tranche its own tranche ID and get rid of array_base, array_stride, and all that's attached. For debugging facilities (Trace_lwlocks and LWLOCK_STATS) print the pointer address of the LWLock using %p instead of the offset. This is arguably more useful, and certainly a lot cheaper. Drop the offset-within-tranche from the information reported to dtrace and from one can't-happen message inside lwlock.c. The main user-visible impact of this change is that pg_stat_activity will now report all waits for LWLocks as "LWLock" rather than reporting some as "LWLockTranche" and others as "LWLockNamed". The main motivation for this change is that the need to specify an array_base and an array_stride is awkward for parallel query. There is only a very limited supply of tranche IDs so we can't just keep allocating new ones, and if we try to use the same tranche IDs every time then we run into trouble when multiple parallel contexts are use simultaneously. So if we didn't get rid of this mechanism we'd have to make it even more complicated. By simplifying it in this way, we instead reduce the size of the generated code for lwlock.c by about 5%. Discussion: http://postgr.es/m/CA+TgmoYsFn6NUW1x0AZtupJGUAs1UDY4dJtCN47_Q6D0sP80PA@mail.gmail.com	2016-12-16 11:29:23 -05:00
Robert Haas	b81b5a96f4	Unbreak Finalize HashAggregate over Partial HashAggregate. Commit `5dfc198146` introduced the use of a new type of hash table with linear reprobing for hash aggregates. Such a hash table behaves very poorly if keys are inserted in hash order, which does in fact happen in the case where a query use a Finalize HashAggregate node fed (via Gather) by a Partial HashAggregate node. In fact, queries with this type of plan tend to run effectively forever. Fix that by seeding the hash value differently in each worker (and in the leader, if it participates). Andres Freund and Robert Haas	2016-12-16 10:03:08 -05:00
Robert Haas	6a4fe1127c	Fix more hash index bugs around marking buffers dirty. In _hash_freeovflpage(), if we're freeing the overflow page that immediate follows the page to which tuples are being moved (the confusingly-named "write buffer"), don't forget to mark that page dirty after updating its hasho_nextblkno. In _hash_squeezebucket(), it's not necessary to mark the primary bucket page dirty if there are no overflow pages, because there's nothing to squeeze in that case. Amit Kapila, with help from Kuntal Ghosh and Dilip Kumar, after an initial trouble report by Jeff Janes.	2016-12-16 09:55:20 -05:00
Robert Haas	25216c9893	Remove _hash_wrtbuf() in favor of calling MarkBufferDirty(). The whole concept of _hash_wrtbuf() is that we need to know at the time we're releasing the buffer lock (and pin) whether we dirtied the buffer, but this is easy to get wrong. This patch actually fixes one non-obvious bug of that form: hashbucketcleanup forgot to signal _hash_squeezebucket, which gets the primary bucket page already locked, as to whether it had already dirtied the page. Calling MarkBufferDirty() at the places where we dirty the buffer is more intuitive and lets us simplify the code in various places as well. On top of all that, the ultimate goal here is to make hash indexes WAL-logged, and as the comments to _hash_wrtbuf() note, it should go away when that happens. Making it go away a little earlier than that seems like a good preparatory step. Report by Jeff Janes. Diagnosis by Amit Kapila, Kuntal Ghosh, and Dilip Kumar. Patch by me, after studying an alternative patch submitted by Amit Kapila. Discussion: http://postgr.es/m/CAA4eK1Kf6tOY0oVz_SEdngiNFkeXrA3xUSDPPORQvsWVPdKqnA@mail.gmail.com	2016-12-16 09:37:28 -05:00
Heikki Linnakangas	4f5182e18d	Fix off-by-one in memory allocation for quote_literal_cstr(). The calculation didn't take into account the NULL terminator. That lead to overwriting the palloc'd buffer by one byte, if the input consists entirely of backslashes. For example "format('%L', E'\\')". Fixes bug #14468. Backpatch to all supported versions. Report: https://www.postgresql.org/message-id/20161216105001.13334.42819%40wrigleys.postgresql.org	2016-12-16 12:53:04 +02:00
Tom Lane	1f542a2eac	Prevent planagg.c from failing on queries containing CTEs. The existing tests in preprocess_minmax_aggregates() usually prevent it from trying to do anything with queries containing CTEs, but there's an exception: a CTE could be present as a member of an appendrel, if we flattened a UNION ALL that contains CTE references. If it did try to generate an optimized path for a query using a CTE, it failed with "could not find plan for CTE", as reported by Torsten Förtsch. The proximate cause is an unwise decision in commit `3fc6e2d7f` to clear subroot->cte_plan_ids in build_minmax_path(). That left the subroot's cte_plan_ids list out of step with its parse->cteList. Removing the "subroot->cte_plan_ids = NIL;" assignment is enough to let the case work again, but really it's pretty silly to be expending any cycles at all in this module when there are CTEs: we always treat their outputs as unordered so there's no way for the optimization to win. Hence, also add an early-exit test so we don't waste time like that. Back-patch to 9.6 where the misbehavior was introduced. Report: https://postgr.es/m/CAKkG4_=gjY5QiHtqSZyWMwDuTd_CftKoTaCqxjJ7uUz1-Gw=qw@mail.gmail.com	2016-12-13 13:20:37 -05:00
Robert Haas	501c7b94bc	Fix bug in hashbulkdelete. Commit `6d46f4783e` failed to account for the possibility that hashbulkdelete() might encounter a bucket that has been split since it began scanning the bucket array. Repair. Extracted from a larger pathc by Amit Kapila; I rewrote the comment.	2016-12-13 12:16:02 -05:00
Robert Haas	a25665088d	Fix bugs in RelationGetPartitionDispatchInfo. The previous coding was not quite right for cases involving multiple levels of partitioning. Amit Langote	2016-12-13 11:29:08 -05:00
Robert Haas	4b9a98e154	Clean up code, comments, and formatting for table partitioning. Amit Langote, plus pgindent-ing by me. Inspired in part by review comments from Tomas Vondra.	2016-12-13 10:59:14 -05:00
Robert Haas	3856cf9607	Remove should_free arguments to tuplesort routines. Since commit `e94568ecc1`, the answer is always "false", and we do not need to complicate the API by arranging to return a constant value. Peter Geoghegan Discussion: http://postgr.es/m/CAM3SWZQWZZ_N=DmmL7tKy_OUjGH_5mN=N=A6h7kHyyDvEhg2DA@mail.gmail.com	2016-12-12 15:57:35 -05:00
Tom Lane	be7b2848c6	Make the different Unix-y semaphore implementations ABI-compatible. Previously, the "sem" field of PGPROC varied in size depending on which kernel semaphore API we were using. That was okay as long as there was only one likely choice per platform, but in the wake of commit `ecb0d20a9`, that assumption seems rather shaky. It doesn't seem out of the question anymore that an extension compiled against one API choice might be loaded into a postmaster built with another choice. Moreover, this prevents any possibility of selecting the semaphore API at postmaster startup, which might be something we want to do in future. Hence, change PGPROC.sem to be PGSemaphore (i.e. a pointer) for all Unix semaphore APIs, and turn the pointed-to data into an opaque struct whose contents are only known within the responsible modules. For the SysV and unnamed-POSIX APIs, the pointed-to data has to be allocated elsewhere in shared memory, which takes a little bit of rejiggering of the InitShmemAllocation code sequence. (I invented a ShmemAllocUnlocked() function to make that a little cleaner than it used to be. That function is not meant for any uses other than the ones it has now, but it beats having InitShmemAllocation() know explicitly about allocation of space for semaphores and spinlocks.) This change means an extra indirection to access the semaphore data, but since we only touch that when blocking or awakening a process, there shouldn't be any meaningful performance penalty. Moreover, at least for the unnamed-POSIX case on Linux, the sem_t type is quite a bit wider than a pointer, so this reduces sizeof(PGPROC) which seems like a good thing. For the named-POSIX API, there's effectively no change: the PGPROC.sem field was and still is a pointer to something returned by sem_open() in the postmaster's memory space. Document and check the pre-existing limitation that this case can't work in EXEC_BACKEND mode. It did not seem worth unifying the Windows semaphore ABI with the Unix cases, since there's no likelihood of needing ABI compatibility much less runtime switching across those cases. However, we can simplify the Windows code a bit if we define PGSemaphore as being directly a HANDLE, rather than pointer to HANDLE, so let's do that while we're here. (This also ends up being no change in what's physically stored in PGPROC.sem. We're just moving the HANDLE fetch from callees to callers.) It would take a bunch of additional code shuffling to get to the point of actually choosing a semaphore API at postmaster start, but the effects of that would now be localized in the port/XXX_sema.c files, so it seems like fit material for a separate patch. The need for it is unproven as yet, anyhow, whereas the ABI risk to extensions seems real enough. Discussion: https://postgr.es/m/4029.1481413370@sss.pgh.pa.us	2016-12-12 13:32:10 -05:00
Tom Lane	563d575fd7	Fix creative, but unportable, spelling of "ptr != NULL". Or at least I suppose that's what was really meant here. But even aside from the not-per-project-style use of "0" to mean "NULL", I doubt it's safe to assume that all valid pointers are > NULL. Per buildfarm member pademelon.	2016-12-12 11:23:23 -05:00
Peter Eisentraut	a924c327e2	Add support for temporary replication slots This allows creating temporary replication slots that are removed automatically at the end of the session or on error. From: Petr Jelinek <petr.jelinek@2ndquadrant.com>	2016-12-12 08:38:17 -05:00
Heikki Linnakangas	e7f051b8f9	Refactor the code for verifying user's password. Split md5_crypt_verify() into three functions: * get_role_password() to fetch user's password from pg_authid, and check its expiration. * md5_crypt_verify() to check an MD5 authentication challenge * plain_crypt_verify() to check a plaintext password. get_role_password() will be needed as a separate function by the upcoming SCRAM authentication patch set. Most of the remaining functionality in md5_crypt_verify() was different for MD5 and plaintext authentication, so split that for readability. While we're at it, simplify the *_crypt_verify functions by using stack-allocated buffers to hold the temporary MD5 hashes, instead of pallocing. Reviewed by Michael Paquier. Discussion: https://www.postgresql.org/message-id/3029e460-d47c-710e-507e-d8ba759d7cbb@iki.fi	2016-12-12 12:48:13 +02:00
Heikki Linnakangas	58445c5c8d	Further cleanup from the strong-random patch. Also use the new facility for generating RADIUS authenticator requests, and salt in chkpass extension. Reword the error messages to be nicer. Fix bogus error code used in the message in BackendStartup.	2016-12-12 11:55:32 +02:00
Heikki Linnakangas	41493bac36	Fix two thinkos related to strong random keys. pg_backend_random() is used for MD5 salt generation, but it can fail, and no checks were done on its status code. Fix memory leak, if generating a random number for a cancel key failed. Both issues were spotted by Coverity. Fix by Michael Paquier.	2016-12-12 09:58:32 +02:00
Tom Lane	0eaaaf00e2	Prevent crash when ts_rewrite() replaces a non-top-level subtree with null. When ts_rewrite()'s replacement argument is an empty tsquery, it's supposed to simplify any operator nodes whose operand(s) become NULL; but it failed to do that reliably, because dropvoidsubtree() only examined the top level of the result tree. Rather than make a second recursive pass, let's just give the responsibility to dofindsubquery() to simplify while it's doing the main replacement pass. Per report from Andreas Seltenreich. Artur Zakirov, with some cosmetic changes by me. Back-patch to all supported branches. Discussion: https://postgr.es/m/8737i01dew.fsf@credativ.de	2016-12-11 13:09:57 -05:00
Alvaro Herrera	a73491e5fe	Fix crasher bug in array_position(s) array_position and its cousin array_positions were caching the element type equality function's FmgrInfo without being careful enough to put it in a long-lived context. This is obviously broken but it didn't matter in most cases; only when using arrays of records (involving record_eq) it becomes a problem. The fix is to ensure that the type's equality function's FmgrInfo is cached in the array_position's flinfo->fn_mcxt rather than the current memory context. Apart from record types, the only other case that seems complex enough to possibly cause the same problem are range types. I didn't find a way to reproduce the problem with those, so I only include the test case submitted with the bug report as regression test. Bug report and patch: Junseok Yang Discussion: https://postgr.es/m/CAE+byMupUURYiZ6bKYgMZb9pgV1CYAijJGqWj-90W=nS7uEOeA@mail.gmail.com Backpatch to 9.5, where array_position appeared.	2016-12-09 12:42:17 -03:00
Heikki Linnakangas	64bc26f90d	Fix thinko in safeguard for negative availMem. Also, use pass read_buffer_size * numInputTapes rather than just availMem to USEMEM, to be neat. Peter Geoghegan.	2016-12-08 23:05:21 +02:00
Robert Haas	01ae881e1c	Fix bogus comment. Commit `4212cb7326` rendered a comment in execMain.c incorrect. Per complaint from Tom Lane, repair. Patch from Amit Kapila, per wording suggested by Tom Lane and me.	2016-12-08 14:59:46 -05:00
Robert Haas	ab4575dcf1	Silence compiler warning. Per report from Stephen Frost.	2016-12-08 14:55:47 -05:00
Robert Haas	fa0f466d53	Log the creation of an init fork unconditionally. Previously, it was thought that this only needed to be done for the benefit of possible standbys, so wal_level = minimal skipped it. But that's not safe, because during crash recovery we might replay XLOG_DBASE_CREATE or XLOG_TBLSPC_CREATE record which recursively removes the directory that contains the new init fork. So log it always. The user-visible effect of this bug is that if you create a database or tablespace, then create an unlogged table, then crash without checkpointing, then restart, accessing the table will fail, because the it won't have been properly reset. This commit fixes that. Michael Paquier, per a report from Konstantin Knizhnik. Wording of the comments per a suggestion from me.	2016-12-08 14:12:08 -05:00
Tom Lane	0b78106cd4	Fix reporting of column typmods for multi-row VALUES constructs. expandRTE() and get_rte_attribute_type() reported the exprType() and exprTypmod() values of the expressions in the first row of the VALUES as being the column type/typmod returned by the VALUES RTE. That's fine for the data type, since we coerce all expressions in a column to have the same common type. But we don't coerce them to have a common typmod, so it was possible for rows after the first one to return values that violate the claimed column typmod. This leads to the incorrect result seen in bug #14448 from Hassan Mahmood, as well as some other corner-case misbehaviors. The desired behavior is the same as we use in other type-unification cases: report the common typmod if there is one, but otherwise return -1 indicating no particular constraint. It's cheap for transformValuesClause to determine the common typmod while transforming a multi-row VALUES, but it'd be less cheap for expandRTE() and get_rte_attribute_type() to re-determine that info every time they're asked --- possibly a lot less cheap, if the VALUES has many rows. Therefore, the best fix is to record the common typmods explicitly in a list in the VALUES RTE, as we were already doing for column collations. This looks quite a bit like what we're doing for CTE RTEs, so we can save a little bit of space and code by unifying the representation for those two RTE types. They both now share coltypes/coltypmods/colcollations fields. (At some point it might seem desirable to populate those fields for all RTE types; but right now it looks like constructing them for other RTE types would add more code and cycles than it would save.) The RTE change requires a catversion bump, so this fix is only usable in HEAD. If we fix this at all in the back branches, the patch will need to look quite different. Report: https://postgr.es/m/20161205143037.4377.60754@wrigleys.postgresql.org Discussion: https://postgr.es/m/27429.1480968538@sss.pgh.pa.us	2016-12-08 11:40:02 -05:00
Heikki Linnakangas	fe7bdf0bf6	Clean up password authentication code a bit. Commit `fe0a0b59`, which moved code to do MD5 authentication to a separate CheckMD5Auth() function, left behind a comment that really belongs inside the function, too. Also move the check for db_user_namespace inside the function, seems clearer that way. Now that the md5 salt is passed as argument to md5_crypt_verify, it's a bit silly that it peeks into the Port struct to see if MD5 authentication was used. Seems more straightforward to treat it as an MD5 authentication, if the md5 salt argument is given. And after that, md5_crypt_verify only used the Port argument to look at port->user_name, but that is redundant, because it is also passed as a separate 'role' argument. So remove the Port argument altogether.	2016-12-08 13:44:47 +02:00
Heikki Linnakangas	f7d54f4f7d	Fix accounting of memory needed for merge heap. We allegedly allocated all remaining memory for the read buffers of the sort tapes, but we allocated the merge heap only after that. That means that the allocation of the merge heap was guaranteed to go over the memory limit. Fix by allocating the merge heap first. This makes little difference in practice, because the merge heap is tiny, but let's tidy. While we're at it, add a safeguard for the case that we are already over the limit when allocating the read buffers. That shouldn't happen, but better safe than sorry. The memory accounting error was reported off-list by Peter Geoghegan.	2016-12-08 10:15:57 +02:00
Robert Haas	f0e44751d7	Implement table partitioning. Table partitioning is like table inheritance and reuses much of the existing infrastructure, but there are some important differences. The parent is called a partitioned table and is always empty; it may not have indexes or non-inherited constraints, since those make no sense for a relation with no data of its own. The children are called partitions and contain all of the actual data. Each partition has an implicit partitioning constraint. Multiple inheritance is not allowed, and partitioning and inheritance can't be mixed. Partitions can't have extra columns and may not allow nulls unless the parent does. Tuples inserted into the parent are automatically routed to the correct partition, so tuple-routing ON INSERT triggers are not needed. Tuple routing isn't yet supported for partitions which are foreign tables, and it doesn't handle updates that cross partition boundaries. Currently, tables can be range-partitioned or list-partitioned. List partitioning is limited to a single column, but range partitioning can involve multiple columns. A partitioning "column" can be an expression. Because table partitioning is less general than table inheritance, it is hoped that it will be easier to reason about properties of partitions, and therefore that this will serve as a better foundation for a variety of possible optimizations, including query planner optimizations. The tuple routing based which this patch does based on the implicit partitioning constraints is an example of this, but it seems likely that many other useful optimizations are also possible. Amit Langote, reviewed and tested by Robert Haas, Ashutosh Bapat, Amit Kapila, Rajkumar Raghuwanshi, Corey Huinker, Jaime Casanova, Rushabh Lathia, Erik Rijkers, among others. Minor revisions by me.	2016-12-07 13:17:55 -05:00
Heikki Linnakangas	81f2e514a9	Fix query cancellation. In commit `fe0a0b59`, the datatype used for MyCancelKey and other variables that store cancel keys were changed from long to uint32, but I missed this one. That broke query cancellation on platforms where long is wider than 32 bits. Report by Andres Freund, fix by Michael Paquier.	2016-12-07 09:47:43 +02:00
Heikki Linnakangas	9790b87f59	Fix whitespace. Thomas Munro	2016-12-07 08:40:43 +02:00
Stephen Frost	d97b14ddab	Silence compiler warnings Rearrange a bit of code to ensure that 'mode' in LWLockRelease is obviously always set, which seems a bit cleaner and avoids a compiler warning (thanks to Robert for the suggestion!). In GetCachedPlan(), initialize 'plan' to silence a compiler warning, but also add an Assert() to make sure we don't ever actually fall through with 'plan' still being set to NULL, since we are about to dereference it. Neither of these appear to be live bugs but at least gcc 5.4.0-6ubuntu1~16.04.4 doesn't quite have the smarts to realize that. Discussion: https://www.postgresql.org/message-id/20161129152102.GR13284%40tamriel.snowman.net	2016-12-06 23:02:38 -05:00
Tom Lane	0645dacc37	Fix unsafe assumption that struct timeval.tv_sec is a "long". It typically is a "long", but it seems possible that on some platforms it wouldn't be. In any case, this silences a compiler warning on OpenBSD (cf buildfarm member curculio). While at it, use snprintf not sprintf. This format string couldn't possibly overrun the supplied buffer, but that doesn't seem like a good reason not to use the safer style. Oversight in commit `f828654e1`. Back-patch to 9.6 where that came in.	2016-12-06 19:52:34 -05:00
Robert Haas	4212cb7326	Fix interaction of parallel query with prepared statements. Previously, a prepared statement created via a Parse message could get a parallel plan, but one created with a PREPARE statement could not. This state of affairs was due to confusion on my (rhaas) part: I erroneously believed that a CREATE TABLE .. AS EXECUTE statement could only be performed with a prepared statement by PREPARE, but in fact one created by a Prepare message works just as well. Therefore, it makes no sense to allow parallel query in one case but not the other. To fix, allow parallel query with all prepared statements, but run the parallel plan serially (i.e. without workers) in the case of CREATE TABLE .. AS EXECUTE. Also, document this. Amit Kapila and Tobias Bussman, plus an extra sentence of documentation by me.	2016-12-06 11:11:54 -05:00
Robert Haas	53c7cff720	Ensure gatherstate->nextreader is properly initialized. The previously code worked OK as long as a Gather node was never rescanned, or if it was rescanned, as long as it got at least as many workers on rescan as it had originally. But if the number of workers ever decreased on a rescan, then it could crash. Andreas Seltenreich	2016-12-05 15:54:28 -05:00
Stephen Frost	093129c9d9	Add support for restrictive RLS policies We have had support for restrictive RLS policies since 9.5, but they were only available through extensions which use the appropriate hooks. This adds support into the grammer, catalog, psql and pg_dump for restrictive RLS policies, thus reducing the cases where an extension is necessary. In passing, also move away from using "AND"d and "OR"d in comments. As pointed out by Alvaro, it's not really appropriate to attempt to make verbs out of "AND" and "OR", so reword those comments which attempted to. Reviewed By: Jeevan Chalke, Dean Rasheed Discussion: https://postgr.es/m/20160901063404.GY4028@tamriel.snowman.net	2016-12-05 15:50:55 -05:00
Robert Haas	2bbdc6875d	dsa: Cope with the possibility that SIZE_MAX is not defined. Per buildfarm member gaur and Tom Lane.	2016-12-05 15:22:33 -05:00
Robert Haas	2f4193c350	Fix race introduced by `6d46f4783e`. It's possible for the metapage contents to change after we release the lock, so we must read them before releasing the lock. Amit Kapila. Submitted in response to a trouble report from Andreas Seltenreich, though it is not certain this fixes the problem.	2016-12-05 11:43:37 -05:00
Robert Haas	2b959d4957	Reduce the default for max_worker_processes back to 8. Commit `b460f5d669` -- at my suggestion -- increased the default value of max_worker_processes from 8 to 16, on the theory that this would be harmless and convenient for users. Unfortunately, this caused some buildfarm machines with low connection limits to start failing, so apparently it's not harmless after all.	2016-12-05 10:53:21 -05:00
Robert Haas	88f626f868	Fix more DSA problems uncovered by the buildfarm. On 32-bit systems, don't try to use 64-bit DSA pointers, because the computation of DSA_MAX_SEGMENT_SIZE overflows Size. Cast 1 to Size before shifting it, so that the compiler doesn't produce a result of the wrong width. In passing, change one use of size_t to Size.	2016-12-05 10:38:08 -05:00
Robert Haas	670b3bc8f5	Try to fix some DSA-related compiler warnings. Commit `13df76a537` was overconfident about how portable %016lx is. Some compilers complain because they need %016llx, while platforms where DSA pointers are only 32 bits get unhappy about using a 64-bit format for a 32-bit quantity. Thomas Munro, per an off-list suggestion from me.	2016-12-05 10:01:08 -05:00
Heikki Linnakangas	fe0a0b5993	Replace PostmasterRandom() with a stronger source, second attempt. This adds a new routine, pg_strong_random() for generating random bytes, for use in both frontend and backend. At the moment, it's only used in the backend, but the upcoming SCRAM authentication patches need strong random numbers in libpq as well. pg_strong_random() is based on, and replaces, the existing implementation in pgcrypto. It can acquire strong random numbers from a number of sources, depending on what's available: - OpenSSL RAND_bytes(), if built with OpenSSL - On Windows, the native cryptographic functions are used - /dev/urandom Unlike the current pgcrypto function, the source is chosen by configure. That makes it easier to test different implementations, and ensures that we don't accidentally fall back to a less secure implementation, if the primary source fails. All of those methods are quite reliable, it would be pretty surprising for them to fail, so we'd rather find out by failing hard. If no strong random source is available, we fall back to using erand48(), seeded from current timestamp, like PostmasterRandom() was. That isn't cryptographically secure, but allows us to still work on platforms that don't have any of the above stronger sources. Because it's not very secure, the built-in implementation is only used if explicitly requested with --disable-strong-random. This replaces the more complicated Fortuna algorithm we used to have in pgcrypto, which is unfortunate, but all modern platforms have /dev/urandom, so it doesn't seem worth the maintenance effort to keep that. pgcrypto functions that require strong random numbers will be disabled with --disable-strong-random. Original patch by Magnus Hagander, tons of further work by Michael Paquier and me. Discussion: https://www.postgresql.org/message-id/CAB7nPqRy3krN8quR9XujMVVHYtXJ0_60nqgVc6oUk8ygyVkZsA@mail.gmail.com Discussion: https://www.postgresql.org/message-id/CAB7nPqRWkNYRRPJA7-cF+LfroYV10pvjdz6GNvxk-Eee9FypKA@mail.gmail.com	2016-12-05 13:42:59 +02:00
Fujii Masao	5dc851afde	Fix incorrect output from gin_desc(). Previously gin_desc() displayed incorrect output "unknown action 0" for XLOG_GIN_INSERT and XLOG_GIN_VACUUM_DATA_LEAF_PAGE records with valid actions. The cause of this problem was that gin_desc() wrongly used XLogRecGetData() to extract data from those records. Since they were registered by XLogRegisterBufData(), gin_desc() should have used XLogRecGetBlockData(), instead, like gin_redo(). Also there were other differences about how to treat XLOG_GIN_INSERT record between gin_desc() and gin_redo(). This commit fixes gin_desc() routine so that it treats those records in the same way as gin_redo(). Batch-patch to 9.5 where WAL record format was revamped and XLogRegisterBufData() was added. Reported-By: Andres Freund Reviewed-By: Tom Lane Discussion: <20160509194645.7lewnpw647zegx2m@alap3.anarazel.de>	2016-12-05 20:29:41 +09:00
Tom Lane	3850723208	Don't mess up pstate->p_next_resno in transformOnConflictClause(). transformOnConflictClause incremented p_next_resno while generating the phony targetlist for the EXCLUDED pseudo-rel. Then that field got incremented some more during transformTargetList, possibly leading to free_parsestate concluding that we'd overrun the allowed length of a tlist, as reported by Justin Pryzby. We could fix this by resetting p_next_resno to 1 after using it for the EXCLUDED pseudo-rel tlist, but it seems easier and less coupled to other places if we just don't use that field at all in this loop. (Note that this doesn't change anything about the resnos that end up appearing in the main target list, because those are all replaced with target-column numbers by updateTargetListEntry.) In passing, fix incorrect type OID assigned to the whole-row Var for "EXCLUDED.*" (somehow this escaped having any bad consequences so far, but it's certainly wrong); remove useless assignment to var->location; pstrdup the column names in case of a relcache flush; and improve nearby comments. Back-patch to 9.5 where ON CONFLICT was introduced. Report: https://postgr.es/m/20161204163237.GA8030@telsasoft.com	2016-12-04 15:02:45 -05:00
Tom Lane	b3427dade1	Delete deleteWhatDependsOn() in favor of more performDeletion() flag bits. deleteWhatDependsOn() had grown an uncomfortably large number of assumptions about what it's used for. There are actually only two minor differences between what it does and what a regular performDeletion() call can do, so let's invent additional bits in performDeletion's existing flags argument that specify those behaviors, and get rid of deleteWhatDependsOn() as such. (We'd probably have done it this way from the start, except that performDeletion didn't originally have a flags argument, IIRC.) Also, add a SKIP_EXTENSIONS flag bit that prevents ever recursing to an extension, and use that when dropping temporary objects at session end. This provides a more general solution to the problem addressed in a hacky way in commit 08dd23cec: if an extension script creates temp objects and forgets to remove them again, the whole extension went away when its contained temp objects were deleted. The previous solution only covered temp relations, but this solves it for all object types. These changes require minor additions in dependency.c to pass the flags to subroutines that previously didn't get them, but it's still a net savings of code, and it seems cleaner than before. Having done this, revert the special-case code added in `08dd23cec` that prevented addition of pg_depend records for temp table extension membership, because that caused its own oddities: dropping an extension that had created such a table didn't automatically remove the table, leading to a failure if the table had another dependency on the extension (such as use of an extension data type), or to a duplicate-name failure if you then tried to recreate the extension. But we keep the part that prevents the pg_temp_nnn schema from becoming an extension member; we never want that to happen. Add a regression test case covering these behaviors. Although this fixes some arguable bugs, we've heard few field complaints, and any such problems are easily worked around by explicitly dropping temp objects at the end of extension scripts (which seems like good practice anyway). So I won't risk a back-patch. Discussion: https://postgr.es/m/e51f4311-f483-4dd0-1ccc-abec3c405110@BlueTreble.com	2016-12-02 14:57:55 -05:00
Robert Haas	13df76a537	Introduce dynamic shared memory areas. Programmers discovered decades ago that it was useful to have a simple interface for allocating and freeing memory, which is why malloc() and free() were invented. Unfortunately, those handy tools don't work with dynamic shared memory segments because those are specific to PostgreSQL and are not necessarily mapped at the same address in every cooperating process. So invent our own allocator instead. This makes it possible for processes cooperating as part of parallel query execution to allocate and free chunks of memory without having to reserve them prior to the start of execution. It could also be used for longer lived objects; for example, we could consider storing data for pg_stat_statements or the stats collector in shared memory using these interfaces, rather than writing them to files. Basically, anything that needs shared memory but can't predict in advance how much it's going to need might find this useful. Thomas Munro and Robert Haas. The original code (of mine) on which Thomas based his work was actually designed to be a new backend-local memory allocator for PostgreSQL, but that hasn't gone anywhere - or not yet, anyway. Thomas took that work and performed major refactoring and extensive modifications to make it work with dynamic shared memory, including the addition of appropriate locking. Discussion: CA+TgmobkeWptGwiNa+SGFWsTLzTzD-CeLz0KcE-y6LFgoUus4A@mail.gmail.com Discussion: CAEepm=1z5WLuNoJ80PaCvz6EtG9dN0j-KuHcHtU6QEfcPP5-qA@mail.gmail.com	2016-12-02 12:34:36 -05:00
Robert Haas	13e14a78ea	Management of free memory pages. This is intended as infrastructure for a full-fledged allocator for dynamic shared memory. The interface looks a bit like a real allocator, but only supports allocating and freeing memory in multiples of the 4kB page size. Further, to free memory, you must know the size of the span you wish to free, in pages. While these are make it unsuitable as an allocator in and of itself, it still serves as very useful scaffolding for a full-fledged allocator. Robert Haas and Thomas Munro. This code is mostly the same as my 2014 submission, but Thomas fixed quite a few bugs and made some changes to the interface. Discussion: CA+TgmobkeWptGwiNa+SGFWsTLzTzD-CeLz0KcE-y6LFgoUus4A@mail.gmail.com Discussion: CAEepm=1z5WLuNoJ80PaCvz6EtG9dN0j-KuHcHtU6QEfcPP5-qA@mail.gmail.com	2016-12-02 12:03:30 -05:00
Alvaro Herrera	5e5986b6cb	Fix outdated comments Commit `597a87ccc9` neglected to update some comments; fix. Report and patch by Thomas Munro. Reviewed by Petr Jelínek.	2016-12-02 10:15:36 -03:00
Robert Haas	b460f5d669	Add max_parallel_workers GUC. Increase the default value of the existing max_worker_processes GUC from 8 to 16, and add a new max_parallel_workers GUC with a maximum of 8. This way, even if the maximum amount of parallel query is happening, there is still room for background workers that do other things, as originally envisioned when max_worker_processes was added. Julien Rouhaud, reviewed by Amit Kapila and by revised by me.	2016-12-02 07:42:58 -05:00
Alvaro Herrera	fa2fa99552	Permit dump/reload of not-too-large >1GB tuples Our documentation states that our maximum field size is 1 GB, and that our maximum row size of 1.6 TB. However, while this might be attainable in theory with enough contortions, it is not workable in practice; for starters, pg_dump fails to dump tables containing rows larger than 1 GB, even if individual columns are well below the limit; and even if one does manage to manufacture a dump file containing a row that large, the server refuses to load it anyway. This commit enables dumping and reloading of such tuples, provided two conditions are met: 1. no single column is larger than 1 GB (in output size -- for bytea this includes the formatting overhead) 2. the whole row is not larger than 2 GB There are three related changes to enable this: a. StringInfo's API now has two additional functions that allow creating a string that grows beyond the typical 1GB limit (and "long" string). ABI compatibility is maintained. We still limit these strings to 2 GB, though, for reasons explained below. b. COPY now uses long StringInfos, so that pg_dump doesn't choke trying to emit rows longer than 1GB. c. heap_form_tuple now uses the MCXT_ALLOW_HUGE flag in its allocation for the input tuple, which means that large tuples are accepted on input. Note that at this point we do not apply any further limit to the input tuple size. The main reason to limit to 2 GB is that the FE/BE protocol uses 32 bit length words to describe each row; and because the documentation is ambiguous on its signedness and libpq does consider it signed, we cannot use the highest-order bit. Additionally, the StringInfo API uses "int" (which is 4 bytes wide in most platforms) in many places, so we'd need to change that API too in order to improve, which has lots of fallout. Backpatch to 9.5, which is the oldest that has MemoryContextAllocExtended, a necessary piece of infrastructure. We could apply to 9.4 with very minimal additional effort, but any further than that would require backpatching "huge" allocations too. This is the largest set of changes we could find that can be back-patched without breaking compatibility with existing systems. Fixing a bigger set of problems (for example, dumping tuples bigger than 2GB, or dumping fields bigger than 1GB) would require changing the FE/BE protocol and/or changing the StringInfo API in an ABI-incompatible way, neither of which would be back-patchable. Authors: Daniel Vérité, Álvaro Herrera Reviewed by: Tomas Vondra Discussion: https://postgr.es/m/20160229183023.GA286012@alvherre.pgsql	2016-12-02 00:34:01 -03:00
Peter Eisentraut	78c8c81439	Refactor libpqwalreceiver The whole walreceiver API is now wrapped into a struct, like most of our other loadable module APIs. The libpq connection is no longer a global variable in libpqwalreceiver. Instead, it is encapsulated into a struct that is passed around the functions. This allows multiple walreceivers to run at the same time. Add some rudimentary support for logical replication connections to libpqwalreceiver. These changes are mostly cosmetic and are going to be useful for the future logical replication patches. From: Petr Jelinek <petr@2ndquadrant.com>	2016-12-01 20:23:28 -05:00
Peter Eisentraut	597a87ccc9	Use latch instead of select() in walreceiver Replace use of poll()/select() by WaitLatchOrSocket(), which is more portable and flexible. Also change walreceiver to use its procLatch instead of a custom latch. From: Petr Jelinek <petr@2ndquadrant.com>	2016-12-01 20:23:28 -05:00
Peter Eisentraut	b999c247a5	Add aggregate_with_argtypes and use it consistently This works like function_with_argtypes, but aggregates allow slightly different arguments. Reviewed-by: Alvaro Herrera <alvherre@2ndquadrant.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>	2016-12-01 17:38:49 -05:00
Peter Eisentraut	e696dccec1	Move function_with_argtypes to a better location It was apparently added for use by GRANT/REVOKE, but move it closer to where other function signature related things are kept. Reviewed-by: Alvaro Herrera <alvherre@2ndquadrant.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>	2016-12-01 17:38:44 -05:00
Peter Eisentraut	0aff9293bf	Use grammar symbol function_with_argtypes consistently Instead of sometimes referring to a function signature like func_name func_args, use the existing function_with_argtypes symbol, which combines the two. Reviewed-by: Alvaro Herrera <alvherre@2ndquadrant.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>	2016-12-01 17:37:42 -05:00
Andres Freund	fc4b3dea29	User narrower representative tuples in the hash-agg hashtable. So far the hashtable stored representative tuples in the form of its input slot, with all columns in the hashtable that are not needed (i.e. not grouped upon or functionally dependent) set to NULL. Thats good for saving memory, but it turns out that having tuples full of NULL isn't free. slot_deform_tuple is faster if there's no NULL bitmap even if no NULLs are encountered, and skipping over leading NULLs isn't free. So compute a separate tuple descriptor that only contains the needed columns. As columns have already been moved in/out the slot for the hashtable that does not imply additional per-row overhead. Author: Andres Freund Reviewed-By: Heikki Linnakangas Discussion: https://postgr.es/m/20161103110721.h5i5t5saxfk5eeik@alap3.anarazel.de	2016-11-30 17:30:09 -08:00
Andres Freund	8ed3f11bb0	Perform one only projection to compute agg arguments. Previously we did a ExecProject() for each individual aggregate argument. That turned out to be a performance bottleneck in queries with multiple aggregates. Doing all the argument computations in one ExecProject() is quite a bit cheaper because ExecProject's fastpath can do the work at once in a relatively tight loop, and because it can get all the required columns with a single slot_getsomeattr and save some other redundant setup costs. Author: Andres Freund Reviewed-By: Heikki Linnakangas Discussion: https://postgr.es/m/20161103110721.h5i5t5saxfk5eeik@alap3.anarazel.de	2016-11-30 16:20:24 -08:00
Robert Haas	6d46f4783e	Improve hash index bucket split behavior. Previously, the right to split a bucket was represented by a heavyweight lock on the page number of the primary bucket page. Unfortunately, this meant that every scan needed to take a heavyweight lock on that bucket also, which was bad for concurrency. Instead, use a cleanup lock on the primary bucket page to indicate the right to begin a split, so that scans only need to retain a pin on that page, which is they would have to acquire anyway, and which is also much cheaper. In addition to reducing the locking cost, this also avoids locking out scans and inserts for the entire lifetime of the split: while the new bucket is being populated with copies of the appropriate tuples from the old bucket, scans and inserts can happen in parallel. There are minor concurrency improvements for vacuum operations as well, though the situation there is still far from ideal. This patch also removes the unworldly assumption that a split will never be interrupted. With the new code, a split is done in a series of small steps and the system can pick up where it left off if it is interrupted prior to completion. While this patch does not itself add write-ahead logging for hash indexes, it is clearly a necessary first step, since one of the things that could interrupt a split is the removal of electrical power from the machine performing it. Amit Kapila. I wrote the original design on which this patch is based, and did a good bit of work on the comments and README through multiple rounds of review, but all of the code is Amit's. Also reviewed by Jesper Pedersen, Jeff Janes, and others. Discussion: http://postgr.es/m/CAA4eK1LfzcZYxLoXS874Ad0+S-ZM60U9bwcyiUZx9mHZ-KCWhw@mail.gmail.com	2016-11-30 15:39:21 -05:00
Heikki Linnakangas	021d254d9a	Make all unicode perl scripts to use strict, rearrange logic for clarity. The loops were a bit difficult to understand, due to breaking out of them early. Also fix things that perlcritic complained about. Daniel Gustafsson	2016-11-30 18:06:34 +02:00
Heikki Linnakangas	1de9cc0dcc	Rewrite the perl scripts to produce our Unicode conversion tables. Generate EUC_CN mappings from gb-18030-2000.xml, because GB2312.TXT is no longer available. Get UHC from windows-949-2000.xml, it's more up-to-date. Plus tons more small changes. With these changes, the perl scripts faithfully produce the *.map files we have in the repository, from the external source files. In the passing, fix the Makefile to also download CP932.TXT and CP950.TXT. Based on patches by Kyotaro Horiguchi, reviewed by Daniel Gustafsson. Discussion: https://postgr.es/m/08e7892a-d55c-eefe-76e6-7910bc8dd1f3@iki.fi	2016-11-30 14:54:52 +02:00
Heikki Linnakangas	6c303223be	Remove leading zeros, for consistency with other map files. The common style is to pad to 4 digits. Running the current perl scripts to generate these map files would override this change, but the next commit will rewrite the perl scripts to produce this style. I'm doing this as a separate commit, to make it more clear what non-cosmetic changes the next commit makes to the map files. Discussion: https://postgr.es/m/08e7892a-d55c-eefe-76e6-7910bc8dd1f3@iki.fi	2016-11-30 14:54:41 +02:00
Heikki Linnakangas	2c09c93ce1	Remove code points < 0x80 from character conversion tables. PostgreSQL treats characters with < 0x80 leading byte as plain ASCII, and they are not even passed to the conversion routines. There is no point in having them in the conversion tables. Everything in the tables were direct ASCII-ASCII mappings, except for two: * SHIFT_JIS_2004 code point 0x5C (backslash in ASCII) was mapped to Unicode YEN SIGN character. * Unicode 0x5C (backslash again) was mapped to "REVERSE SOLIDUS" in SHIFT_JIS_2004 These mappings never had any effect, so there's no functional change from removing them. Discussion: https://postgr.es/m/08e7892a-d55c-eefe-76e6-7910bc8dd1f3@iki.fi	2016-11-30 14:53:57 +02:00
Tom Lane	41e2b84ce1	Fix bogus handling of JOIN_UNIQUE_OUTER/INNER cases for parallel joins. consider_parallel_nestloop passed the wrong jointype down to its subroutines for JOIN_UNIQUE_INNER cases (it should pass JOIN_INNER), and it thought that it could pass paths other than innerrel->cheapest_total_path to create_unique_path, which create_unique_path is not on board with. These bugs would lead to assertion failures or other errors, suggesting that this code path hasn't been tested much. hash_inner_and_outer's code for parallel join effectively treated both JOIN_UNIQUE_OUTER and JOIN_UNIQUE_INNER the same as JOIN_INNER (for different reasons :-(), leading to incorrect plans that treated a semijoin as if it were a plain join. Michael Day submitted a test case demonstrating that hash_inner_and_outer failed for JOIN_UNIQUE_OUTER, and I found the other cases through code review. Report: https://postgr.es/m/D0E8A029-D1AC-42E8-979A-5DE4A77E4413@rcmail.com	2016-11-29 19:32:35 -05:00
Tom Lane	ca5f885020	Improve eqjoinsel_semi's behavior for small inner relations with no stats. If we don't have any MCV statistics for the inner relation, and we don't trust its numdistinct estimate either, eqjoinsel_semi falls back to a very conservative estimate (that 50% of the outer rows have matches). This is particularly problematic if the inner relation is completely empty, since then even an explicit ANALYZE won't produce any pg_statistic entries, so there's no way to budge the planner off the bad estimate. We'd produce a better estimate in such cases if we used the nd2/nd1 selectivity heuristic, so an easy fix is to treat the nd2 estimate as non-default if we derive it from clamping to the inner rel's rowcount estimate. This won't fix every related case (mainly because the rowcount estimate might be larger than DEFAULT_NUM_DISTINCT), but it seems like a sane extension of the existing logic, so let's apply the change in HEAD and see if anyone complains. Per bug #14438 from Nikolay Nikitin. Report: https://postgr.es/m/20161128182113.6527.58926@wrigleys.postgresql.org Discussion: https://postgr.es/m/31089.1480384713@sss.pgh.pa.us	2016-11-29 18:00:56 -05:00
Peter Eisentraut	96fb4c90e3	Straighten out some whitespace	2016-11-29 15:08:14 -05:00
Tom Lane	11da83a0e7	Add uuid to the set of types supported by contrib/btree_gist. Paul Jungwirth, reviewed and hacked on by Teodor Sigaev, Ildus Kurbangaliev, Adam Brusselback, Chris Bandy, and myself. Discussion: https://postgr.es/m/CA+renyUEE29=X01JXdz8_TQvo6n9=2XoEBBRnQ8rkLyr+kjPxQ@mail.gmail.com Discussion: https://postgr.es/m/55F6EE82.8080209@sigaev.ru	2016-11-29 14:08:34 -05:00
Tom Lane	d6c8b34e95	Fix incorrect variable type in set_rel_consider_parallel(). func_parallel() returns char not Oid. Harmless, but still wrong. Amit Langote	2016-11-29 11:07:02 -05:00
Tom Lane	4e20511d5b	Fix estimate_expression_value to constant-fold SQLValueFunction nodes. Oversight in my commit `0bb51aa96`. Noted while poking at a recent bug report --- HEAD's estimates for a query using CURRENT_DATE were unexpectedly much worse than 9.6's.	2016-11-28 19:08:45 -05:00
Alvaro Herrera	eb68141688	Fix get_relation_info name typo'ed in a comment Plus add a missing comment about this in get_relation_info itself. Author: Amit Langote Discussion: https://postgr.es/m/e46c0569-0449-afa0-e2fe-f3776e4b3fd5@lab.ntt.co.jp	2016-11-28 15:56:00 -03:00
Tom Lane	dafa0848da	Code review for early drop of orphaned temp relations in autovacuum. Commit `a734fd5d1` exposed some race conditions that existed previously in the autovac code, but were basically harmless because autovac would not try to delete orphaned relations immediately. Specifically, the test for orphaned-ness was made on a pg_class tuple that might be dead by now, allowing autovac to try to remove a table that the owning backend had just finished deleting. This resulted in a hard crash due to inadequate caution about accessing the table's catalog entries without any lock. We must take a relation lock and then recheck whether the table is still present and still looks deletable before we do anything. Also, it seemed to me that deleting multiple tables per transaction, and trying to continue after errors, represented unjustifiable complexity. We do not expect this code path to be taken often in the field, nor even during testing, which means that prioritizing performance over correctness is a bad tradeoff. Rip all that out in favor of just starting a new transaction after each successful temp table deletion. If we're unlucky enough to get an error, which shouldn't happen anyway now that we're being more cautious, let the autovacuum worker fail as it normally would. In passing, improve the order of operations in the initial scan loop. Now that we don't care about whether a temp table is a wraparound hazard, there's no need to perform extract_autovac_opts, get_pgstat_tabentry_relid, or relation_needs_vacanalyze for temp tables. Also, if GetTempNamespaceBackendId returns InvalidBackendId (indicating it doesn't recognize the schema as temp), treat that as meaning it's NOT an orphaned temp table, not that it IS one, which is what happened before because BackendIdGetProc necessarily failed. The case really shouldn't come up for a table that has RELPERSISTENCE_TEMP, but the consequences if it did seem undesirable. (This might represent a back-patchable bug fix; not sure if it's worth the trouble.) Discussion: https://postgr.es/m/21299.1480272347@sss.pgh.pa.us	2016-11-27 21:23:39 -05:00
Tom Lane	182db07040	Fix test about ignoring extension dependencies during extension scripts. Commit `08dd23cec` introduced an exception to the rule that extension member objects can only be dropped as part of dropping the whole extension, intending to allow such drops while running the extension's own creation or update scripts. However, the exception was only applied at the outermost recursion level, because it was modeled on a pre-existing check to ignore dependencies on objects listed in pendingObjects. Bug #14434 from Philippe Beaudoin shows that this is inadequate: in some cases we can reach an extension member object by recursion from another one. (The bug concerns the serial-sequence case; I'm not sure if there are other cases, but there might well be.) To fix, revert 08dd23cec's changes to findDependentObjects() and instead apply the creating_extension exception regardless of stack level. Having seen this example, I'm a bit suspicious that the pendingObjects logic is also wrong and such cases should likewise be allowed at any recursion level. However, changing that would interact in subtle ways with the recursion logic (at least it would need to be moved to after the recursing-from check). Given that the code's been like that a long time, I'll refrain from touching it without a clear example showing it's wrong. Back-patch to all active branches. In HEAD and 9.6, where suitable test infrastructure exists, add a regression test case based on the bug report. Report: <20161125151448.6529.33039@wrigleys.postgresql.org> Discussion: <13224.1480177514@sss.pgh.pa.us>	2016-11-26 13:31:35 -05:00
Tom Lane	dbdfd114f3	Bring some clarity to the defaults for the xxx_flush_after parameters. Instead of confusingly stating platform-dependent defaults for these parameters in the comments in postgresql.conf.sample (with the main entry being a lie on Linux), teach initdb to install the correct platform-dependent value in postgresql.conf, similarly to the way we handle other platform-dependent defaults. This won't do anything for existing 9.6 installations, but since it's effectively only a documentation improvement, that seems OK. Since this requires initdb to have access to the default values, move the #define's for those to pg_config_manual.h; the original placement in bufmgr.h is unworkable because that file can't be included by frontend programs. Adjust the default value for wal_writer_flush_after so that it is 1MB regardless of XLOG_BLCKSZ, conforming to what is stated in both the SGML docs and postgresql.conf. (We could alternatively make it scale with XLOG_BLCKSZ, but I'm not sure I see the point.) Copy-edit related SGML documentation. Fabien Coelho and Tom Lane, per a gripe from Tomas Vondra. Discussion: <30ebc6e3-8358-09cf-44a8-578252938424@2ndquadrant.com>	2016-11-25 18:36:10 -05:00
Tom Lane	ab77a5a456	Mark a query's topmost Paths parallel-unsafe if they will have initPlans. Andreas Seltenreich found another case where we were being too optimistic about allowing a plan to be considered parallelizable despite it containing initPlans. It seems like the real issue here is that if we know we are going to tack initPlans onto the topmost Plan node for a subquery, we had better mark that subquery's result Paths as not-parallel-safe. That fixes this problem and allows reversion of a kluge (added in commit `7b67a0a49` and extended in `f24cf960d`) to not trust the parallel_safe flag at top level. Discussion: <874m2w4k5d.fsf@ex.ansel.ydns.eu>	2016-11-25 16:20:12 -05:00
Tom Lane	4e026b32d4	Check for pending trigger events on far end when dropping an FK constraint. When dropping a foreign key constraint with ALTER TABLE DROP CONSTRAINT, we refuse the drop if there are any pending trigger events on the named table; this ensures that we won't remove the pg_trigger row that will be consulted by those events. But we should make the same check for the referenced relation, else we might remove a due-to-be-referenced pg_trigger row for that relation too, resulting in "could not find trigger NNN" or "relation NNN has no triggers" errors at commit. Per bug #14431 from Benjie Gillam. Back-patch to all supported branches. Report: <20161124114911.6530.31200@wrigleys.postgresql.org>	2016-11-25 13:44:47 -05:00
Magnus Hagander	8afb811088	Fix typo in comment Thomas Munro	2016-11-25 13:06:19 +01:00
Alvaro Herrera	4aaddf2f00	Fix commit_ts for FrozenXid and BootstrapXid Previously, requesting commit timestamp for transactions FrozenTransactionId and BootstrapTransactionId resulted in an error. But since those values can validly appear in committed tuples' Xmin, this behavior is unhelpful and error prone: each caller would have to special-case those values before requesting timestamp data for an Xid. We already have a perfectly good interface for returning "the Xid you requested is too old for us to have commit TS data for it", so let's use that instead. Backpatch to 9.5, where commit timestamps appeared. Author: Craig Ringer Discussion: https://www.postgresql.org/message-id/CAMsr+YFM5Q=+ry3mKvWEqRTxrB0iU3qUSRnS28nz6FJYtBwhJg@mail.gmail.com	2016-11-24 15:39:55 -03:00
Tom Lane	6fa391be4e	Avoid masking a function parameter name with a local variable name. No actual bug here, but it might confuse readers, so change the name of the local variable. Ashutosh Bapat	2016-11-23 16:26:40 -05:00
Tom Lane	bd673e8e86	Make sure ALTER TABLE preserves index tablespaces. When rebuilding an existing index, ALTER TABLE correctly kept the physical file in the same tablespace, but it messed up the pg_class entry if the index had been in the database's default tablespace and "default_tablespace" was set to some non-default tablespace. This led to an inaccessible index. Fix by fixing pg_get_indexdef_string() to always include a tablespace clause, whether or not the index is in the default tablespace. The previous behavior was installed in commit `537e92e41`, and I think it just wasn't thought through very clearly; certainly the possible effect of default_tablespace wasn't considered. There's some risk in changing the behavior of this function, but there are no other call sites in the core code. Even if it's being used by some third party extension, it's fairly hard to envision a usage that is okay with a tablespace clause being appended some of the time but can't handle it being appended all the time. Back-patch to all supported versions. Code fix by me, investigation and test cases by Michael Paquier. Discussion: <1479294998857-5930602.post@n3.nabble.com>	2016-11-23 13:45:55 -05:00
Robert Haas	e343dfa42b	Remove barrier.h A new thing also called a "barrier" is proposed, but whether we decide to take that patch or not, this file seems to have outlived its usefulness. Thomas Munro	2016-11-22 20:28:24 -05:00
Tom Lane	906bfcad7b	Improve handling of "UPDATE ... SET (column_list) = row_constructor". Previously, the right-hand side of a multiple-column assignment, if it wasn't a sub-SELECT, had to be a simple parenthesized expression list, because gram.y was responsible for "bursting" the construct into independent column assignments. This had the minor defect that you couldn't write ROW (though you should be able to, since the standard says this is a row constructor), and the rather larger defect that unlike other uses of row constructors, we would not expand a "foo." item into multiple columns. Fix that by changing the RHS to be just "a_expr" in the grammar, leaving it to transformMultiAssignRef to separate the elements of a RowExpr; which it will do only after performing standard transformation of the RowExpr, so that "foo." behaves as expected. The key reason we didn't do that before was the hard-wired handling of DEFAULT tokens (SetToDefault nodes). This patch deals with that issue by allowing DEFAULT in any a_expr and having parse analysis throw an error if SetToDefault is found in an unexpected place. That's an improvement anyway since the error can be more specific than just "syntax error". The SQL standard suggests that the RHS could be any a_expr yielding a suitable row value. This patch doesn't really move the goal posts in that respect --- you're still limited to RowExpr or a sub-SELECT --- but it does fix the grammar restriction, so it provides some tangible progress towards a full implementation. And the limitation is now documented by an explicit error message rather than an unhelpful "syntax error". Discussion: <8542.1479742008@sss.pgh.pa.us>	2016-11-22 15:20:10 -05:00
Robert Haas	e8ac886c24	Support condition variables. Condition variables provide a flexible way to sleep until a cooperating process causes an arbitrary condition to become true. In simple cases, this can be accomplished with a WaitLatch/ResetLatch loop; the cooperating process can call SetLatch after performing work that might cause the condition to be satisfied, and the waiting process can recheck the condition each time. However, if the process performing the work doesn't have an easy way to identify which processes might be waiting, this doesn't work, because it can't identify which latches to set. Condition variables solve that problem by internally maintaining a list of waiters; a process that may have caused some waiter's condition to be satisfied must "signal" or "broadcast" on the condition variable. Robert Haas and Thomas Munro	2016-11-22 14:27:11 -05:00
Tom Lane	ae92a9a380	Fix uninitialized variable. Oversight in `a734fd5d1`. Michael Paquier	2016-11-21 19:59:56 -05:00
Tom Lane	a4930e7ca2	Fix PGLC_localeconv() to handle errors better. The code was intentionally not very careful about leaking strdup'd strings in case of an error. That was forgivable probably, but it also failed to notice strdup() failures, which could lead to subsequent null-pointer-dereference crashes, since many callers unsurprisingly didn't check for null pointers in the struct lconv fields. An even worse problem is that it could throw error while we were setlocale'd to a non-C locale, causing unwanted behavior in subsequent libc calls. Rewrite to ensure that we cannot throw elog(ERROR) until after we've restored the previous locale settings, or at least attempted to. (I'm sorely tempted to make restore failure be a FATAL error, but will refrain for the moment.) Having done that, it's not much more work to ensure that we clean up strdup'd storage on the way out, too. This code is substantially the same in all supported branches, so back-patch all the way. Michael Paquier and Tom Lane Discussion: <CAB7nPqRMbGqa_mesopcn4MPyTs34eqtVEK7ELYxvvV=oqS00YA@mail.gmail.com>	2016-11-21 18:21:55 -05:00
Tom Lane	4324ade9a6	Fix optimization for skipping searches for parallel-query hazards. Fix thinko in commit da1c91631: even if the original query was free of parallel hazards, we might introduce such a hazard by adding PARAM_EXEC Param nodes. Adjust is_parallel_safe() so that it will scan the given expression whenever any such nodes have been created. Per report from Andreas Seltenreich. Discussion: <878tse6yvf.fsf@credativ.de>	2016-11-21 13:19:23 -05:00
Robert Haas	a734fd5d1c	autovacuum: Drop orphan temp tables more quickly but with more caution. Previously, we only dropped an orphan temp table when it became old enough to threaten wraparound; instead, doing it immediately. The only value of waiting is that someone might be able to examine the contents of the orphan temp table for forensic purposes, but it's pretty difficult to actually do that and few users will wish to do so. On the flip side, not performing the drop immediately generates log spam and bloats pg_class. In addition, per a report from Grigory Smolkin, if a temporary schema contains a very large number of temporary tables, a backend attempting to clear the temporary schema might fail due to lock table exhaustion. It's helpful for autovacuum to clean up after such cases, and we don't want it to wait for wraparound to threaten before doing so. To prevent autovacuum from failing in the same manner as a backend trying to drop an entire temp schema, remove orphan temp tables in batches of 50, committing after each batch, so that we don't accumulate an unbounded number of locks. If a drop fails, retry other orphan tables that need to be dropped up to 10 times before giving up. With this system, if a backend does fail to clean a temporary schema due to lock table exhaustion, autovacuum should hopefully put things right the next time it processes the database. Discussion: CAB7nPqSbYT6dRwsXVgiKmBdL_ARemfDZMPA+RPeC_ge0GK70hA@mail.gmail.com Michael Paquier, with a bunch of comment changes by me.	2016-11-21 13:01:50 -05:00

... 2 3 4 5 6 ...

16712 Commits