postgresql

mirror of https://git.postgresql.org/git/postgresql.git synced 2024-07-20 12:59:38 +02:00

Author	SHA1	Message	Date
Michael Paquier	803b1301e8	Add option SKIP_LOCKED to VACUUM and ANALYZE When specified, this option allows VACUUM to skip the work on a relation if there is a conflicting lock on it when trying to open it at the beginning of its processing. Similarly to autovacuum, this comes with a couple of limitations while the relation is processed which can cause the process to still block: - when opening the relation indexes. - when acquiring row samples for table inheritance trees, partition trees or certain types of foreign tables, and that a lock is taken on some leaves of such trees. Author: Nathan Bossart Reviewed-by: Michael Paquier, Andres Freund, Masahiko Sawada Discussion: https://postgr.es/m/9EF7EBE4-720D-4CF1-9D0E-4403D7E92990@amazon.com Discussion: https://postgr.es/m/20171201160907.27110.74730@wrigleys.postgresql.org	2018-10-04 09:00:33 +09:00
Andres Freund	c03c1449c0	Fix issues around EXPLAIN with JIT. I (Andres) was more than a bit hasty in committing `33001fd7a7` after last minute changes, leading to a number of problems (jit output was only shown for JIT in parallel workers, and just EXPLAIN without ANALYZE didn't work). Lukas luckily found these issues quickly. Instead of combining instrumentation in in standard_ExecutorEnd(), do so on demand in the new ExplainPrintJITSummary(). Also update a documentation example of the JIT output, changed in `52050ad8eb`. Author: Lukas Fittl, with minor changes by me Discussion: https://postgr.es/m/CAP53PkxmgJht69pabxBXJBM+0oc6kf3KHMborLP7H2ouJ0CCtQ@mail.gmail.com Backpatch: 11, where JIT compilation was introduced	2018-10-03 12:48:37 -07:00
Andres Freund	cc2905e963	Use slots more widely in tuple mapping code and make naming more consistent. It's inefficient to use a single slot for mapping between tuple descriptors for multiple tuples, as previously done when using ConvertPartitionTupleSlot(), as that means the slot's tuple descriptors change for every tuple. Previously we also, via ConvertPartitionTupleSlot(), built new tuples after the mapping even in cases where we, immediately afterwards, access individual columns again. Refactor the code so one slot, on demand, is used for each partition. That avoids having to change the descriptor (and allows to use the more efficient "fixed" tuple slots). Then use slot->slot mapping, to avoid unnecessarily forming a tuple. As the naming between the tuple and slot mapping functions wasn't consistent, rename them to execute_attr_map_{tuple,slot}. It's likely that we'll also rename convert_tuples_by_* to denote that these functions "only" build a map, but that's left for later. Author: Amit Khandekar and Amit Langote, editorialized by me Reviewed-By: Amit Langote, Amit Khandekar, Andres Freund Discussion: https://postgr.es/m/CAJ3gD9fR0wRNeAE8VqffNTyONS_UfFPRpqxhnD9Q42vZB+Jvpg@mail.gmail.com https://postgr.es/m/e4f9d743-cd4b-efb0-7574-da21d86a7f36%40lab.ntt.co.jp Backpatch: -	2018-10-02 11:14:26 -07:00
Michael Paquier	e3a25ab9ea	Refactor relation opening for VACUUM and ANALYZE VACUUM and ANALYZE share similar logic when it comes to opening a relation to work on in terms of how the relation is opened, in which order locks are tried and how logs should be generated when something does not work as expected. This commit refactors things so as both use the same code path to handle the way a relation is opened, so as the integration of new options becomes easier. Author: Michael Paquier Reviewed-by: Nathan Bossart Discussion: https://postgr.es/m/20180927075152.GT1659@paquier.xyz	2018-10-02 08:53:38 +09:00
Tom Lane	e27453bd83	Fix ALTER COLUMN TYPE to not open a relation without any lock. If the column being modified is referenced by a foreign key constraint of another table, ALTER TABLE would open the other table (to re-parse the constraint's definition) without having first obtained a lock on it. This was evidently intentional, but that doesn't mean it's really safe. It's especially not safe in 9.3, which pre-dates use of MVCC scans for catalog reads, but even in current releases it doesn't seem like a good idea. We know we'll need AccessExclusiveLock shortly to drop the obsoleted constraint, so just get that a little sooner to close the hole. Per testing with a patch that complains if we open a relation without holding any lock on it. I don't plan to back-patch that patch, but we should close the holes it identifies in all supported branches. Discussion: https://postgr.es/m/2038.1538335244@sss.pgh.pa.us	2018-10-01 11:39:13 -04:00
Tom Lane	fdba460a26	Create an RTE field to record the query's lock mode for each relation. Add RangeTblEntry.rellockmode, which records the appropriate lock mode for each RTE_RELATION rangetable entry (either AccessShareLock, RowShareLock, or RowExclusiveLock depending on the RTE's role in the query). This patch creates the field and makes all creators of RTE nodes fill it in reasonably, but for the moment nothing much is done with it. The plan is to replace assorted post-parser logic that re-determines the right lockmode to use with simple uses of rte->rellockmode. For now, just add Asserts in each of those places that the rellockmode matches what they are computing today. (In some cases the match isn't perfect, so the Asserts are weaker than you might expect; but this seems OK, as per discussion.) This passes check-world for me, but it seems worth pushing in this state to see if the buildfarm finds any problems in cases I failed to test. catversion bump due to change of stored rules. Amit Langote, reviewed by David Rowley and Jesper Pedersen, and whacked around a bit more by me Discussion: https://postgr.es/m/468c85d9-540e-66a2-1dde-fec2b741e688@lab.ntt.co.jp	2018-09-30 13:55:51 -04:00
Peter Eisentraut	0320ddaf3c	Recurse to sequences on ownership change for all relkinds When a table ownership is changed, we must apply that also to any owned sequences. (Otherwise, it would result in a situation that cannot be restored, because linked sequences must have the same owner as the table.) But this was previously only applied to regular tables and materialized views. But it should also apply to at least foreign tables. This patch removes the relkind check altogether, because it doesn't save very much and just introduces the possibility of similar omissions. Bug: #15238 Reported-by: Christoph Berg <christoph.berg@credativ.de>	2018-09-26 20:19:15 +02:00
Andres Freund	29c94e03c7	Split ExecStoreTuple into ExecStoreHeapTuple and ExecStoreBufferHeapTuple. Upcoming changes introduce further types of tuple table slots, in preparation of making table storage pluggable. New storage methods will have different representation of tuples, therefore the slot accessor should refer explicitly to heap tuples. Instead of just renaming the functions, split it into one function that accepts heap tuples not residing in buffers, and one accepting ones in buffers. Previously one function was used for both, but that was a bit awkward already, and splitting will allow us to represent slot types for tuples in buffers and normal memory separately. This is split out from the patch introducing abstract slots, as this largely consists out of mechanical changes. Author: Ashutosh Bapat Reviewed-By: Andres Freund Discussion: https://postgr.es/m/20180220224318.gw4oe5jadhpmcdnm@alap3.anarazel.de	2018-09-25 16:27:48 -07:00
Andres Freund	33001fd7a7	Collect JIT instrumentation from workers. Previously, when using parallel query, EXPLAIN (ANALYZE)'s JIT compilation timings did not include the overhead from doing so on the workers. Fix that. We do so by simply aggregating the cost of doing JIT compilation on workers and the leader together. Arguably that's not quite accurate, because the total time spend doing so is spent in parallel - but it's hard to do much better. For additional detail, when VERBOSE is specified, the stats for workers are displayed separately. Author: Amit Khandekar and Andres Freund Discussion: https://postgr.es/m/CAJ3gD9eLrz51RK_gTkod+71iDcjpB_N8eC6vU2AW-VicsAERpQ@mail.gmail.com Backpatch: 11-	2018-09-25 13:12:44 -07:00
Andres Freund	52050ad8eb	Make EXPLAIN output for JIT compilation more dense. A discussion about also reporting JIT compilation overhead on workers brought unhappiness with the verbosity of the current explain format to light. Make the text format more dense, and restructure the structured output to mirror that more closely. As we're re-jiggering the output format anyway: The denser format allows us to report all flags for JIT compilation (now also reporting PGJIT_EXPR and PGJIT_DEFORM), and report the total time in addition to the individual times. Per complaint from Tom Lane. Author: Andres Freund Discussion: https://postgr.es/m/27812.1537221015@sss.pgh.pa.us Backpatch: 11-, where JIT compilation was introduced	2018-09-24 13:35:45 -07:00
Andrew Dunstan	7636e5c60f	Fast default trigger and expand_tuple fixes Ensure that triggers get properly filled in tuples for the OLD value. Also fix the logic of detecting missing null values. The previous logic failed to detect a missing null column before the first missing column with a default. Fixing this has simplified the logic a bit. Regression tests are added to test changes. This should ensure better coverage of expand_tuple(). Original bug reports, and some code and test scripts from Tomas Vondra Backpatch to release 11.	2018-09-24 16:11:24 -04:00
Alvaro Herrera	20bef2c311	Fix ALTER/TYPE on columns referenced by FKs in partitioned tables When ALTER TABLE ... SET DATA TYPE affects a column referenced by constraints and indexes, it drop those constraints and indexes and recreates them afterwards, so that the definitions match the new data type. The original code did this by dropping one object at a time (commit `077db40fa1` of May 2004), which worked fine because the dependencies between the objects were pretty straightforward, and ordering the objects in a specific way was enough to make this work. However, when there are foreign key constraints in partitioned tables, the dependencies are no longer so straightforward, and we were getting errors when attempted: ERROR: cache lookup failed for constraint 16398 This can be fixed by doing all the drops in one pass instead, using performMultipleDeletions (introduced by `df18c51f29` of Aug 2006). With this change we can also remove the code to carefully order the list of objects to be deleted. Reported-by: Rajkumar Raghuwanshi <rajkumar.raghuwanshi@enterprisedb.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/CAKcux6nWS_m+s=1Udk_U9B+QY7pA-Ac58qR5BdUfOyrwnWHDew@mail.gmail.com	2018-09-14 13:41:20 -03:00
Michael Paquier	28a8fa984c	Improve autovacuum logging for aggressive and anti-wraparound runs A log message was being generated when log_min_duration is reached for autovacuum on a given relation to indicate if it was an aggressive run, and missed the point of mentioning if it is doing an anti-wrapround run. The log message generated is improved so as one, both or no extra details are added depending on the option set. Author: Sergei Kornilov Reviewed-by: Masahiko Sawada, Michael Paquier Discussion: https://postgr.es/m/11587951532155118@sas1-19a94364928d.qloud-c.yandex.net	2018-09-14 07:35:39 +09:00
Michael Paquier	514a731ddc	Simplify static function in extension.c An extra argument for the filename defining the extension script location was present, aimed at being used for error reporting, but has never been used. This was around since extensions have been added in `d9572c4`. Author: Yugo Nagata Reviewed-by: Tatsuo Ishii Discussion: https://postgr.es/m/20180907180504.1ff19e1675bb44a67e9c7ab1@sraoss.co.jp	2018-09-13 16:56:57 +09:00
Michael Paquier	9226a3b89b	Remove duplicated words split across lines in comments This has been detected using some interesting tricks with sed, and the method used is mentioned in details in the discussion below. Author: Justin Pryzby Discussion: https://postgr.es/m/20180908013109.GB15350@telsasoft.com	2018-09-08 12:24:19 -07:00
Tom Lane	54b01b9293	Remove no-longer-used variable. Oversight in `2fbdf1b38`. Per buildfarm.	2018-09-05 14:29:58 -04:00
Alvaro Herrera	2fbdf1b38b	Simplify partitioned table creation vs. relcache In the original code, we were storing the pg_inherits row for a partitioned table too early: enough that we had a hack for relcache to avoid falling flat on its face while reading such a partial entry. If we finish the pg_class creation first and then store the pg_inherits entry, we don't need that hack. Also recognize that pg_class.relpartbound is not marked NOT NULL and therefore it's entirely possible to read null values, so having only Assert() protection isn't enough. Change those to if/elog tests instead. This qualifies as a robustness fix, so backpatch to pg11. In passing, remove one access that wasn't actually needed, and reword one message to be like all the others that check for the same thing. Reviewed-by: Amit Langote Discussion: https://postgr.es/m/20180903213916.hh6wasnrdg6xv2ud@alvherre.pgsql	2018-09-05 14:36:13 -03:00
Tom Lane	17b7c302b5	Fully enforce uniqueness of constraint names. It's been true for a long time that we expect names of table and domain constraints to be unique among the constraints of that table or domain. However, the enforcement of that has been pretty haphazard, and it missed some corner cases such as creating a CHECK constraint and then an index constraint of the same name (as per recent report from André Hänsel). Also, due to the lack of an actual unique index enforcing this, duplicates could be created through race conditions. Moreover, the code that searches pg_constraint has been quite inconsistent about how to handle duplicate names if one did occur: some places checked and threw errors if there was more than one match, while others just processed the first match they came to. To fix, create a unique index on (conrelid, contypid, conname). Since either conrelid or contypid is zero, this will separately enforce uniqueness of constraint names among constraints of any one table and any one domain. (If we ever implement SQL assertions, and put them into this catalog, more thought might be needed. But it'd be at least as reasonable to put them into a new catalog; having overloaded this one catalog with two kinds of constraints was a mistake already IMO.) This index can replace the existing non-unique index on conrelid, though we need to keep the one on contypid for query performance reasons. Having done that, we can simplify the logic in various places that either coped with duplicates or neglected to, as well as potentially improve lookup performance when searching for a constraint by name. Also, as per our usual practice, install a preliminary check so that you get something more friendly than a unique-index violation report in the case complained of by André. And teach ChooseIndexName to avoid choosing autogenerated names that would draw such a failure. While it's not possible to make such a change in the back branches, it doesn't seem quite too late to put this into v11, so do so. Discussion: https://postgr.es/m/0c1001d4428f$0942b430$1bc81c90$@webkr.de	2018-09-04 13:45:35 -04:00
Tom Lane	44cac93464	Avoid using potentially-under-aligned page buffers. There's a project policy against using plain "char buf[BLCKSZ]" local or static variables as page buffers; preferred style is to palloc or malloc each buffer to ensure it is MAXALIGN'd. However, that policy's been ignored in an increasing number of places. We've apparently got away with it so far, probably because (a) relatively few people use platforms on which misalignment causes core dumps and/or (b) the variables chance to be sufficiently aligned anyway. But this is not something to rely on. Moreover, even if we don't get a core dump, we might be paying a lot of cycles for misaligned accesses. To fix, invent new union types PGAlignedBlock and PGAlignedXLogBlock that the compiler must allocate with sufficient alignment, and use those in place of plain char arrays. I used these types even for variables where there's no risk of a misaligned access, since ensuring proper alignment should make kernel data transfers faster. I also changed some places where we had been palloc'ing short-lived buffers, for coding style uniformity and to save palloc/pfree overhead. Since this seems to be a live portability hazard (despite the lack of field reports), back-patch to all supported versions. Patch by me; thanks to Michael Paquier for review. Discussion: https://postgr.es/m/1535618100.1286.3.camel@credativ.de	2018-09-01 15:27:17 -04:00
Peter Eisentraut	1e5e4efd02	Error position support for partition specifications Add support for error position reporting for partition specifications. Reviewed-by: Fabien COELHO <coelho@cri.ensmp.fr>	2018-08-30 08:20:23 +02:00
Peter Eisentraut	a4a232b1e7	Error position support for defaults and check constraints Add support for error position reporting for the expressions contained in defaults and check constraint definitions. This currently works only for CREATE TABLE, not ALTER TABLE, because the latter is not set up to pass around the original query string. Reviewed-by: Fabien COELHO <coelho@cri.ensmp.fr>	2018-08-30 08:20:23 +02:00
Michael Paquier	a556549d7e	Improve VACUUM and ANALYZE by avoiding early lock queue A caller of VACUUM can perform early lookup obtention which can cause other sessions to block on the request done, causing potentially DOS attacks as even a non-privileged user can attempt a vacuum fill of a critical catalog table to block even all incoming connection attempts. Contrary to TRUNCATE, a client could attempt a system-wide VACUUM after building the list of relations to VACUUM, which can cause vacuum_rel() or analyze_rel() to try to lock the relation but the operation would just block. When the client specifies a list of relations and the relation needs to be skipped, ownership checks are done when building the list of relations to work on, preventing a later lock attempt. vacuum_rel() already had the sanity checks needed, except that those were applied too late. This commit refactors the code so as relation skips are checked beforehand, making it safer to avoid too early locks, for both manual VACUUM with and without a list of relations specified. An isolation test is added emulating the fact that early locks do not happen anymore, issuing a WARNING message earlier if the user calling VACUUM is not a relation owner. When a partitioned table is listed in a manual VACUUM or ANALYZE command, its full list of partitions is fetched, all partitions get added to the list to work on, and then each one of them is processed one by one, with ownership checks happening at the later phase of vacuum_rel() or analyze_rel(). Trying to do early ownership checks for each partition is proving to be tedious as this would result in deadlock risks with lock upgrades, and skipping all partitions if the listed partitioned table is not owned would result in a behavior change compared to how Postgres 10 has implemented vacuum for partitioned tables. The original problem reported related to early lock queue for critical relations is fixed anyway, so priority is given to avoiding a backward-incompatible behavior. Reported-by: Lloyd Albin, Jeremy Schneider Author: Michael Paquier Reviewed by: Nathan Bossart, Kyotaro Horiguchi Discussion: https://postgr.es/m/152512087100.19803.12733865831237526317@wrigleys.postgresql.org Discussion: https://postgr.es/m/20180812222142.GA6097@paquier.xyz	2018-08-27 09:11:12 +09:00
Peter Eisentraut	d12782898e	Change PROCEDURE to FUNCTION in CREATE OPERATOR syntax Since procedures are now a different thing from functions, change the CREATE OPERATOR syntax to use FUNCTION in the clause that specifies the function. PROCEDURE is still accepted for compatibility. Reported-by: Peter Geoghegan <pg@bowt.ie> Reviewed-by: Jonathan S. Katz <jonathan.katz@excoventures.com>	2018-08-22 14:44:49 +02:00
Peter Eisentraut	b19495772e	doc: Update uses of the word "procedure" Historically, the term procedure was used as a synonym for function in Postgres/PostgreSQL. Now we have procedures as separate objects from functions, so we need to clean up the documentation to not mix those terms. In particular, mentions of "trigger procedures" are changed to "trigger functions", and access method "support procedures" are changed to "support functions". (The latter already used FUNCTION in the SQL syntax anyway.) Also, the terminology in the SPI chapter has been cleaned up. A few tests, examples, and code comments are also adjusted to be consistent with documentation changes, but not everything. Reported-by: Peter Geoghegan <pg@bowt.ie> Reviewed-by: Jonathan S. Katz <jonathan.katz@excoventures.com>	2018-08-22 14:44:49 +02:00
Michael Paquier	72be8c29a1	Fix set of NLS translation issues While monitoring the code, a couple of issues related to string translation has showed up: - Some routines for auto-updatable views return an error string, which sometimes missed the shot. A comment regarding string translation is added for each routine to help with future features. - GSSAPI authentication missed two translations. - vacuumdb handles non-translated strings. - GetConfigOptionByNum should translate strings. This part is not back-patched as after a minor upgrade this could be surprising for users. Reported-by: Kyotaro Horiguchi Author: Kyotaro Horiguchi Reviewed-by: Michael Paquier, Tom Lane Discussion: https://postgr.es/m/20180810.152131.31921918.horiguchi.kyotaro@lab.ntt.co.jp Backpatch-through: 9.3	2018-08-21 15:17:13 +09:00
Peter Eisentraut	e4597ee65d	InsertPgAttributeTuple() to set attcacheoff InsertPgAttributeTuple() is the interface between in-memory tuple descriptors and on-disk pg_attribute, so it makes sense to give it the job of resetting attcacheoff. This avoids having all the callers having to do so. Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>	2018-08-17 22:08:21 +02:00
Peter Eisentraut	3ebdd21b79	Remove obsolete comment The sequence name is no longer stored in the sequence relation, since `1753b1b027`.	2018-08-13 21:07:31 +02:00
Michael Paquier	f841ceb26d	Improve TRUNCATE by avoiding early lock queue A caller of TRUNCATE could previously queue for an access exclusive lock on a relation it may not have permission to truncate, potentially interfering with users authorized to work on it. This can be very intrusive depending on the lock attempted to be taken. For example, pg_authid could be blocked, preventing any authentication attempt to happen on a PostgreSQL instance. This commit fixes the case of TRUNCATE so as RangeVarGetRelidExtended is used with a callback doing the necessary ACL checks at an earlier stage, avoiding lock queuing issues, so as an immediate failure happens for unprivileged users instead of waiting on a lock that would not be taken. This is rather similar to the type of work done in `cbe24a6` for CLUSTER, and the code of TRUNCATE is this time refactored so as there is no user-facing changes. As the commit for CLUSTER, no back-patch is done. Reported-by: Lloyd Albin, Jeremy Schneider Author: Michael Paquier Reviewed by: Nathan Bossart, Kyotaro Horiguchi Discussion: https://postgr.es/m/152512087100.19803.12733865831237526317@wrigleys.postgresql.org Discussion: https://postgr.es/m/20180806165816.GA19883@paquier.xyz	2018-08-10 18:26:59 +02:00
Michael Paquier	661dd23950	Restrict access to reindex of shared catalogs for non-privileged users A database owner running a database-level REINDEX has the possibility to also do the operation on shared system catalogs without being an owner of them, which allows him to block resources it should not have access to. The same goes for a schema owner. For example, PostgreSQL would go unresponsive and even block authentication if a lock is waited for pg_authid. This commit makes sure that a user running a REINDEX SYSTEM, DATABASE or SCHEMA only works on the following relations: - The user is a superuser - The user is the table owner - The user is the database/schema owner, only if the relation worked on is not shared. Robert has worded most the documentation changes, and I have coded the core part. Reported-by: Lloyd Albin, Jeremy Schneider Author: Michael Paquier, Robert Haas Reviewed by: Nathan Bossart, Kyotaro Horiguchi Discussion: https://postgr.es/m/152512087100.19803.12733865831237526317@wrigleys.postgresql.org Discussion: https://postgr.es/m/20180805211059.GA2185@paquier.xyz Backpatch-through: 11- as the current behavior has been around for a very long time and could be disruptive for already released branches.	2018-08-09 09:40:15 +02:00
Tom Lane	9b7c56d6cb	Don't record FDW user mappings as members of extensions. CreateUserMapping has a recordDependencyOnCurrentExtension call that's been there since extensions were introduced (very possibly my fault). However, there's no support anywhere else for user mappings as members of extensions, nor are they listed as a possible member object type in the documentation. Nor does it really seem like a good idea for user mappings to belong to extensions when roles don't. Hence, remove the bogus call. (As we saw in bug #15310, the lack of any pg_dump support for this case ensures that any such membership record would silently disappear during pg_upgrade. So there's probably no need for us to do anything else about cleaning up after this mistake.) Discussion: https://postgr.es/m/27952.1533667213@sss.pgh.pa.us	2018-08-07 16:32:50 -04:00
Peter Eisentraut	0d5f05cde0	Allow multi-inserts during COPY into a partitioned table CopyFrom allows multi-inserts to be used for non-partitioned tables, but this was disabled for partitioned tables. The reason for this appeared to be that the tuple may not belong to the same partition as the previous tuple did. Not allowing multi-inserts here greatly slowed down imports into partitioned tables. These could take twice as long as a copy to an equivalent non-partitioned table. It seems wise to do something about this, so this change allows the multi-inserts by flushing the so-far inserted tuples to the partition when the next tuple does not belong to the same partition, or when the buffer fills. This improves performance when the next tuple in the stream commonly belongs to the same partition as the previous tuple. In cases where the target partition changes on every tuple, using multi-inserts slightly slows the performance. To get around this we track the average size of the batches that have been inserted and adaptively enable or disable multi-inserts based on the size of the batch. Some testing was done and the regression only seems to exist when the average size of the insert batch is close to 1, so let's just enable multi-inserts when the average size is at least 1.3. More performance testing might reveal a better number for, this, but since the slowdown was only 1-2% it does not seem critical enough to spend too much time calculating it. In any case it may depend on other factors rather than just the size of the batch. Allowing multi-inserts for partitions required a bit of work around the per-tuple memory contexts as we must flush the tuples when the next tuple does not belong the same partition. In which case there is no good time to reset the per-tuple context, as we've already built the new tuple by this time. In order to work around this we maintain two per-tuple contexts and just switch between them every time the partition changes and reset the old one. This does mean that the first of each batch of tuples is not allocated in the same memory context as the others, but that does not matter since we only reset the context once the previous batch has been inserted. Author: David Rowley <david.rowley@2ndquadrant.com> Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>	2018-08-01 10:23:09 +02:00
Michael Paquier	9f7ba88aa4	Fix two oversights from `9ebe0572` which refactored cluster_rel The recheck option became a no-op as ClusterOption failed to set proper values for each element. There was a second code path where local options got overwritten. Both issues have been spotted by Coverity.	2018-07-29 22:00:42 +09:00
Michael Paquier	9ebe0572ce	Refactor cluster_rel() to handle more options This extends cluster_rel() in such a way that more options can be added in the future, which will reduce the amount of chunk code for an upcoming SKIP_LOCKED aimed for VACUUM. As VACUUM FULL is a different flavor of CLUSTER, we want to make that extensible to ease integration. This only reworks the API and its callers, without providing anything user-facing. Two options are present now: verbose mode and relation recheck when doing the cluster command work across multiple transactions. This could be used as well as a base to extend the grammar of CLUSTER later on. Author: Michael Paquier Reviewed-by: Nathan Bossart Discussion: https://postgr.es/m/20180723031058.GE2854@paquier.xyz	2018-07-24 11:37:32 +09:00
Tom Lane	2409716755	Remove undocumented restriction against duplicate partition key columns. transformPartitionSpec rejected duplicate simple partition columns (e.g., "PARTITION BY RANGE (x,x)") but paid no attention to expression columns, resulting in inconsistent behavior. Worse, cases like "PARTITION BY RANGE (x,(x))") were accepted but would then result in dump/reload failures, since the expression (x) would get simplified to a plain column later. There seems no better reason for this restriction than there was for the one against duplicate included index columns (cf commit `701fd0bbc`), so let's just remove it. Back-patch to v10 where this code was added. Report and patch by Yugo Nagata. Discussion: https://postgr.es/m/20180712165939.36b12aff.nagata@sraoss.co.jp	2018-07-19 15:41:46 -04:00
Tom Lane	701fd0bbc9	Drop the rule against included index columns duplicating key columns. The initial version of the included-index-column feature stated that included columns couldn't be the same as any key column of the index. While it'd be pretty silly to do that, since the included column would be entirely redundant, we've never prohibited redundant index columns before so it's not very consistent to do so here. Moreover, the prohibition was itself badly implemented, so that it failed to reject columns that were effectively identical but not spelled quite alike, as reported by Aditya Toshniwal. (Moreover, it's not hard to imagine that for some non-btree index types, such cases would be non-silly anyhow: the index might use a lossy representation for key columns but be able to support retrieval of the original form of included columns.) Hence, let's just drop the prohibition. In passing, do some copy-editing on the documentation for the included-column feature. Yugo Nagata; documentation and test corrections by me Discussion: https://postgr.es/m/CAM9w-_mhBCys4fQNfaiQKTRrVWtoFrZ-wXmDuE9Nj5y-Y7aDKQ@mail.gmail.com	2018-07-18 14:43:03 -04:00
Heikki Linnakangas	6b387179ba	Fix misc typos, mostly in comments. A collection of typos I happened to spot while reading code, as well as grepping for common mistakes. Backpatch to all supported versions, as applicable, to avoid conflicts when backporting other commits in the future.	2018-07-18 16:17:32 +03:00
Alvaro Herrera	cb9db2ab06	Fix ALTER TABLE...SET STATS error message for included columns The existing error message was complaining that the column is not an expression, which is not correct. Introduce a suitable wording variation and a test. Co-authored-by: Yugo Nagata <nagata@sraoss.co.jp> Discussion: https://postgr.es/m/20180628182803.e4632d5a.nagata@sraoss.co.jp Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org>	2018-07-16 20:00:39 -04:00
Peter Eisentraut	3884072329	Prohibit transaction commands in security definer procedures Starting and aborting transactions in security definer procedures doesn't work. StartTransaction() insists that the security context stack is empty, so this would currently cause a crash, and AbortTransaction() resets it. This could be made to work by reorganizing the code, but right now we just prohibit it. Reported-by: amul sul <sulamul@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/CAAJ_b96Gupt_LFL7uNyy3c50-wbhA68NUjiK5%3DrF6_w%3Dpq_T%3DQ%40mail.gmail.com	2018-07-13 10:41:32 +02:00
Alvaro Herrera	cd073d8f70	Fix FK checks of TRUNCATE involving partitioned tables When truncating a table that is referenced by foreign keys in partitioned tables, the check to ensure the referencing table are also truncated spuriously failed. This is because it was relying on relhastriggers as a proxy for the table having FKs, and that's wrong for partitioned tables. Fix it to consider such tables separately. There may be a better way ... but this code is pretty inefficient already. Author: Álvaro Herrera <alvherre@alvh.no-ip.org> Reviewed-by: Michael Paquiër <michael@paquier.xyz> Discussion: https://postgr.es/m/20180711000624.zmeizicibxeehhsg@alvherre.pgsql	2018-07-12 12:09:08 -04:00
Amit Kapila	40ca70ebcc	Allow using the updated tuple while moving it to a different partition. An update that causes the tuple to be moved to a different partition was missing out on re-constructing the to-be-updated tuple, based on the latest tuple in the update chain. Instead, it's simply deleting the latest tuple and inserting a new tuple in the new partition based on the old tuple. Commit `2f17844104` didn't consider this case, so some of the updates were getting lost. In passing, change the argument order for output parameter in ExecDelete and add some commentary about it. Reported-by: Pavan Deolasee Author: Amit Khandekar, with minor changes by me Reviewed-by: Dilip Kumar, Amit Kapila and Alvaro Herrera Backpatch-through: 11 Discussion: https://postgr.es/m/CAJ3gD9fRbEzDqdeDq1jxqZUb47kJn+tQ7=Bcgjc8quqKsDViKQ@mail.gmail.com	2018-07-12 12:51:39 +05:30
Michael Paquier	edc6b41bd4	Rename VACOPT_NOWAIT to VACOPT_SKIP_LOCKED When it comes to SELECT ... FOR or LOCK, NOWAIT means to not wait for something to happen, and issue an error. SKIP LOCKED means to not wait for something to happen but to move on without issuing an error. The internal option of autovacuum and autoanalyze mentioned above, used only when wraparound is not involved was named NOWAIT, but behaves like SKIP LOCKED which is confusing. Author: Nathan Bossart Discussion: https://postgr.es/m/20180307050345.GA3095@paquier.xyz	2018-07-12 14:28:28 +09:00
Michael Paquier	6551f3daa2	Add assertion in expand_vacuum_rel() for non-autovacuum path The code path where the assertion is added helps to check that autovacuum always includes a relation OID when doing a vacuum on it. Extracted from a larger patch set to add support for SKIP LOCKED with manual VACUUM commands. Author: Nathan Bossart Discussion: https://postgr.es/m/9EF7EBE4-720D-4CF1-9D0E-4403D7E92990@amazon.com	2018-07-12 13:50:17 +09:00
Alexander Korotkov	edf59c40dd	Fix more wrong paths in header comments It appears that there are more files, whose header comment paths are wrong. So, fix those paths. No backpatching per proposal of Tom Lane. Discussion: https://postgr.es/m/CAPpHfdsJyYbOj59MOQL%2B4XxdcomLSLfLqBtAvwR%2BpsCqj3ELdQ%40mail.gmail.com	2018-07-11 17:57:04 +03:00
Peter Eisentraut	ec67b89816	Add UtilityReturnsTuples() support for CALL This ensures that prepared statements for CALL can return tuples.	2018-07-09 13:58:08 +02:00
Peter Eisentraut	e34ec13620	Allow CALL with polymorphic type arguments In order to be able to resolve polymorphic types, we need to set fn_expr before invoking the procedure.	2018-07-06 22:40:16 +02:00
Andrew Dunstan	1e9c858090	pgindent run prior to branching	2018-06-30 12:25:49 -04:00
Alvaro Herrera	41372071df	Fix crash when ALTER TABLE recreates indexes on partitions The skip_build flag was not being passed correctly when recursing to indexes on partitions, leading to attempts to rebuild indexes when they were not yet ready to be rebuilt. Reported-by: Rajkumar Raghuwanshi Discussion: https://postgr.es/m/CAKcux6mxNCGsgATwf5CGMF8g4WSupCXicCVMeKUTuWbyxHOMsQ@mail.gmail.com	2018-06-29 11:49:30 -04:00
Alvaro Herrera	475be5e790	When index recurses to a partition, map columns numbers Two out of three code paths were mapping column numbers correctly if a partition had different column numbers than parent table, but the most commonly used one (recursing in CREATE INDEX to a new index on a partition) failed to map attribute numbers in expressions. Oddly enough, attnums in WHERE clauses are already handled correctly everywhere. Reported-by: Amit Langote Author: Amit Langote Discussion: https://postgr.es/m/dce1fda4-e0f0-94c9-6abb-f5956a98c057@lab.ntt.co.jp Reviewed-by: Álvaro Herrera	2018-06-22 16:45:48 -04:00
Michael Paquier	1c7c317cd9	Clarify use of temporary tables within partition trees Since their introduction, partition trees have been a bit lossy regarding temporary relations. Inheritance trees respect the following patterns: 1) a child relation can be temporary if the parent is permanent. 2) a child relation can be temporary if the parent is temporary. 3) a child relation cannot be permanent if the parent is temporary. 4) The use of temporary relations also imply that when both parent and child need to be from the same sessions. Partitions share many similar patterns with inheritance, however the handling of the partition bounds make the situation a bit tricky for case 1) as the partition code bases a lot of its lookup code upon PartitionDesc which does not really look after relpersistence. This causes for example a temporary partition created by session A to be visible by another session B, preventing this session B to create an extra partition which overlaps with the temporary one created by A with a non-intuitive error message. There could be use-cases where mixing permanent partitioned tables with temporary partitions make sense, but that would be a new feature. Partitions respect 2), 3) and 4) already. It is a bit depressing to see those error checks happening in MergeAttributes() whose purpose is different, but that's left as future refactoring work. Back-patch down to 10, which is where partitioning has been introduced, except that default partitions do not apply there. Documentation also includes limitations related to the use of temporary tables with partition trees. Reported-by: David Rowley Author: Amit Langote, Michael Paquier Reviewed-by: Ashutosh Bapat, Amit Langote, Michael Paquier Discussion: https://postgr.es/m/CAKJS1f94Ojk0og9GMkRHGt8wHTW=ijq5KzJKuoBoqWLwSVwGmw@mail.gmail.com	2018-06-20 10:42:25 +09:00
Tom Lane	19832753f1	Fix some ill-chosen names for globally-visible partition support functions. "compute_hash_value" is particularly gratuitously generic, but IMO all of these ought to have names clearly related to partitioning.	2018-06-13 13:18:02 -04:00
Tom Lane	bdc643e5e4	Fix access to just-closed relcache entry. It might be impossible for this to cause a problem in non-debug builds, since there'd be no opportunity for the relcache entry to get recycled before the fetch. It blows up nicely with -DRELCACHE_FORCE_RELEASE plus valgrind, though. Evidently introduced by careless refactoring in commit `f0e44751d`. Back-patch accordingly. Discussion: https://postgr.es/m/27543.1528758304@sss.pgh.pa.us	2018-06-11 19:18:04 -04:00
Alvaro Herrera	5b0c7e2f75	Don't needlessly check the partition contraint twice Starting with commit `f0e44751d7`, ExecConstraints was in charge of running the partition constraint; commit `19c47e7c82` modified that so that caller could request to skip that checking depending on some conditions, but that commit and `15ce775faa` together introduced a small bug there which caused ExecInsert to request skipping the constraint check but have this not be honored -- in effect doing the check twice. This could have been fixed in a very small patch, but on further analysis of the involved function and its callsites, it turns out to be simpler to give the responsibility of checking the partition constraint fully to the caller, and return ExecConstraints to its original (pre-partitioning) shape where it only checked tuple descriptor-related constraints. Each caller must do partition constraint checking on its own schedule, which is more convenient after commit `2f17844104` anyway. Reported-by: David Rowley Author: David Rowley, Álvaro Herrera Reviewed-by: Amit Langote, Amit Khandekar, Simon Riggs Discussion: https://postgr.es/m/CAKJS1f8w8+awsxgea8wt7_UX8qzOQ=Tm1LD+U1fHqBAkXxkW2w@mail.gmail.com	2018-06-11 17:12:16 -04:00
Tom Lane	ecac23511e	Widen COPY FROM's current-line-number counter from 32 to 64 bits. Because the code for the HEADER option skips a line when this counter is zero, a very long COPY FROM WITH HEADER operation would drop a line every 2^32 lines. A lesser but still unfortunate problem is that errors would show a wrong input line number for errors occurring beyond the 2^31'st input line. While such large input streams seemed impractical when this code was first written, they're not any more. Widening the counter (and some associated variables) to uint64 should be enough to prevent problems for the foreseeable future. David Rowley Discussion: https://postgr.es/m/CAKJS1f88yh-6wwEfO6QLEEvH3BEugOq2QX1TOja0vCauoynmOQ@mail.gmail.com	2018-05-22 13:32:52 -04:00
Tom Lane	f755a152d4	Improve spelling of new FINALFUNC_MODIFY aggregate attribute. I'd used SHARABLE as a value originally, but Peter Eisentraut points out that dictionaries agree that SHAREABLE is the preferred spelling. Run around and change that before it's too late. Discussion: https://postgr.es/m/d2e1afd4-659c-50d6-1b20-7cfd3675e909@2ndquadrant.com	2018-05-21 11:41:42 -04:00
Tom Lane	41c912cad1	Clean up warnings from -Wimplicit-fallthrough. Recent gcc can warn about switch-case fall throughs that are not explicitly labeled as intentional. This seems like a good thing, so clean up the warnings exposed thereby by labeling all such cases with comments that gcc will recognize. In files that already had one or more suitable comments, I generally matched the existing style of those. Otherwise I went with /* FALLTHROUGH /, which is one of the spellings approved at the more-restrictive-than-default level -Wimplicit-fallthrough=4. (At the default level you can also spell it / FALL ?THRU */, and it's not picky about case. What you can't do is include additional text in the same comment, so some existing comments containing versions of this aren't good enough.) Testing with gcc 8.0.1 (Fedora 28's current version), I found that I also had to put explicit "break"s after elog(ERROR) or ereport(ERROR); apparently, for this purpose gcc doesn't recognize that those don't return. That seems like possibly a gcc bug, but it's fine because in most places we did that anyway; so this amounts to a visit from the style police. Discussion: https://postgr.es/m/15083.1525207729@sss.pgh.pa.us	2018-05-01 19:35:08 -04:00
Tom Lane	2e83e6bd74	Adjust hints and docs to suggest CREATE EXTENSION not CREATE LANGUAGE. The core PLs have been extension-ified for seven years now, and we can reasonably hope that all out-of-core PLs have been too. So adjust a few places that were still recommending CREATE LANGUAGE as the user-level way to install a PL. Discussion: https://postgr.es/m/CA+TgmoaJTUDMSuSCg4k08Dv8vhbrJq9nP3ZfPbmysVz_616qxw@mail.gmail.com	2018-04-27 13:42:03 -04:00
Tom Lane	bdf46af748	Post-feature-freeze pgindent run. Discussion: https://postgr.es/m/15719.1523984266@sss.pgh.pa.us	2018-04-26 14:47:16 -04:00
Alvaro Herrera	bc972072a3	Add missing pstrdup Lifetime of the input string is not right, so create a separate copy. Author: Amit Langote Discussion: https://postgr.es/m/a2773420-50d1-0a42-3396-fe42b0921134@lab.ntt.co.jp	2018-04-23 12:11:41 -03:00
Stephen Frost	a0fefbcb71	Fix a couple minor typos In commit `f0e4475`, GetIndexOpClass was renamed to ResolveOpClass, but the comment in typecmds.c didn't get the memo. In objectaddress.c, missing 'of' in a comment. Both noticed by Vik Fearing, patch is mine though.	2018-04-20 19:04:54 -04:00
Tom Lane	b1b71f1658	Fix race conditions when an event trigger is added concurrently with DDL. EventTriggerTableRewrite crashed if there were table_rewrite triggers present, but there had not been when the calling command started. EventTriggerDDLCommandEnd called ddl_command_end triggers if present, even if there had been no such triggers when the calling command started, which would lead to a failure in pg_event_trigger_ddl_commands. In both cases, fix by doing nothing; it's better to wait till the next command when things will be properly initialized. In passing, remove an elog(DEBUG1) call that might have seemed interesting four years ago but surely isn't today. We found this because of intermittent failures in the buildfarm. Thanks to Alvaro Herrera and Andrew Gierth for analysis. Back-patch to 9.5; some of this code exists before that, but the specific hazards we need to guard against don't. Discussion: https://postgr.es/m/5767.1523995174@sss.pgh.pa.us	2018-04-20 17:15:31 -04:00
Tom Lane	1dec82068b	Better fix for deadlock hazard in CREATE INDEX CONCURRENTLY. Commit `54eff5311` did not account for the possibility that we'd have a transaction snapshot due to default_transaction_isolation being set high enough to require one. The transaction snapshot is enough to hold back our advertised xmin and thus risk deadlock anyway. The only way to get rid of that snap is to start a new transaction, so let's do that instead. Also throw in an assert checking that we really have gotten to a state where no xmin is being advertised. Back-patch to 9.4, like the previous commit. Discussion: https://postgr.es/m/CAMkU=1ztk3TpQdcUNbxq93pc80FrXUjpDWLGMeVBDx71GHNwZQ@mail.gmail.com	2018-04-18 12:07:37 -04:00
Tatsuo Ishii	03030512d1	Add more infinite recursion detection while locking a view. Also add regression test cases for detecting infinite recursion in locking view tests. Some document enhancements. Patch by Yugo Nagata.	2018-04-17 16:59:17 +09:00
Alvaro Herrera	da6f3e45dd	Reorganize partitioning code There's been a massive addition of partitioning code in PostgreSQL 11, with little oversight on its placement, resulting in a catalog/partition.c with poorly defined boundaries and responsibilities. This commit tries to set a couple of distinct modules to separate things a little bit. There are no code changes here, only code movement. There are three new files: src/backend/utils/cache/partcache.c src/include/partitioning/partdefs.h src/include/utils/partcache.h The previous arrangement of #including catalog/partition.h almost everywhere is no more. Authors: Amit Langote and Álvaro Herrera Discussion: https://postgr.es/m/98e8d509-790a-128c-be7f-e48a5b2d8d97@lab.ntt.co.jp https://postgr.es/m/11aa0c50-316b-18bb-722d-c23814f39059@lab.ntt.co.jp https://postgr.es/m/143ed9a4-6038-76d4-9a55-502035815e68@lab.ntt.co.jp https://postgr.es/m/20180413193503.nynq7bnmgh6vs5vm@alvherre.pgsql	2018-04-14 21:12:14 -03:00
Peter Eisentraut	a8677e3ff6	Support named and default arguments in CALL We need to call expand_function_arguments() to expand named and default arguments. In PL/pgSQL, we also need to deal with named and default INOUT arguments when receiving the output values into variables. Author: Pavel Stehule <pavel.stehule@gmail.com>	2018-04-14 09:13:53 -04:00
Alvaro Herrera	b8ca984b2c	Revert lowering of lock level for ATTACH PARTITION I lowered the lock level for partitions being scanned from AccessExclusive to ShareLock in the course of `72cf7f310c`, but that was bogus, as pointed out by Robert Haas. Revert that bit. Doing this is possible, but requires more work. Discussion: https://postgr.es/m/CA+TgmobV7Nfmqv+TZXcdSsb9Bjc-OL-Anv6BNmCbfJVZLYPE4Q@mail.gmail.com	2018-04-12 16:53:27 -03:00
Teodor Sigaev	524054598f	Fix interference between covering indexes and partitioned tables The bug is caused due to the original IndexStmt that DefineIndex receives being overwritten when processing the INCLUDE columns. Use separate list of index params to propagate to child tables. Add tests covering this case. Amit Langote and Alexander Korotkov. Re-commit `5c6110c6a9` because it discovered a bug fixed in `c266ed31a8` Discussion: https://www.postgresql.org/message-id/CAJGNTeO%3DBguEyG8wxMpU_Vgvg3nGGzy71zUQ0RpzEn_mb0bSWA%40mail.gmail.com	2018-04-12 17:25:13 +03:00
Teodor Sigaev	c266ed31a8	Cleanup covering infrastructure - Explicitly forbids opclass, collation and indoptions (like DESC/ASC etc) for including columns. Throw an error if user points that. - Truncated storage arrays for such attributes to store only key atrributes, added assertion checks. - Do not check opfamily and collation for including columns in CompareIndexInfo() Discussion: https://www.postgresql.org/message-id/5ee72852-3c4e-ee35-e2ed-c1d053d45c08@sigaev.ru	2018-04-12 16:37:22 +03:00
Simon Riggs	08ea7a2291	Revert MERGE patch This reverts commits `d204ef6377`, `83454e3c2b` and a few more commits thereafter (complete list at the end) related to MERGE feature. While the feature was fully functional, with sufficient test coverage and necessary documentation, it was felt that some parts of the executor and parse-analyzer can use a different design and it wasn't possible to do that in the available time. So it was decided to revert the patch for PG11 and retry again in the future. Thanks again to all reviewers and bug reporters. List of commits reverted, in reverse chronological order: `f1464c5380` Improve parse representation for MERGE `ddb4158579` MERGE syntax diagram correction `530e69e59b` Allow cpluspluscheck to pass by renaming variable `01b88b4df5` MERGE minor errata `3af7b2b0d4` MERGE fix variable warning in non-assert builds `a5d86181ec` MERGE INSERT allows only one VALUES clause `4b2d44031f` MERGE post-commit review `4923550c20` Tab completion for MERGE `aa3faa3c7a` WITH support in MERGE `83454e3c2b` New files for MERGE `d204ef6377` MERGE SQL Command following SQL:2016 Author: Pavan Deolasee Reviewed-by: Michael Paquier	2018-04-12 11:22:56 +01:00
Teodor Sigaev	c9c875a28f	Rename IndexInfo.ii_KeyAttrNumbers array Rename ii_KeyAttrNumbers to ii_IndexAttrNumbers to prevent confusion with ii_NumIndexAttrs/ii_NumIndexKeyAttrs. ii_IndexAttrNumbers contains all attributes including "including" columns, not only key attribute. Discussion: https://www.postgresql.org/message-id/13123421-1d52-d0e4-c95c-6d69011e0595%40sigaev.ru	2018-04-12 13:02:45 +03:00
Alvaro Herrera	9e9befac4a	Set relispartition correctly for index partitions Oversight in commit `8b08f7d482`: pg_class.relispartition was not being set for index partitions, which is a bit odd, and was also causing the code to unnecessarily call has_superclass() when simply checking the flag was enough. Author: Álvaro Herrera Reported-by: Amit Langote Discussion: https://postgr.es/m/12085bc4-0bc6-0f3a-4c43-57fe0681772b@lab.ntt.co.jp	2018-04-11 21:27:12 -03:00
Alvaro Herrera	72cf7f310c	Fix ALTER TABLE .. ATTACH PARTITION ... DEFAULT If the table being attached contained values that contradict the default partition's partition constraint, it would fail to complain, because CommandCounterIncrement changes in `4dba331cb3` coupled with some bogus coding in the existing ValidatePartitionConstraints prevented the partition constraint from being validated after all -- or rather, it caused to constraint to become an empty one, always succeeding. Fix by not re-reading the OID of the default partition in ATExecAttachPartition. To forestall similar problems, revise the existing code: * rename routine from ValidatePartitionConstraints() to QueuePartitionConstraintValidation, to better represent what it actually does. * add an Assert() to make sure that when queueing a constraint for a partition we're not overwriting a constraint previously queued. * add an Assert() that we don't try to invoke the special-purpose validation of the default partition when attaching the default partition itself. While at it, change some loops to obtain partition OIDs from partdesc->oids rather than find_all_inheritors; reduce the lock level of partitions being scanned from AccessExclusiveLock to ShareLock; rewrite QueuePartitionConstraintValidation in a recursive fashion rather than repetitive. Author: Álvaro Herrera. Tests written by Amit Langote Reported-by: Rushabh Lathia Diagnosed-by: Kyotaro HORIGUCHI, who also provided the initial fix. Reviewed-by: Kyotaro HORIGUCHI, Amit Langote, Jeevan Ladhe Discussion: https://postgr.es/m/CAGPqQf0W+v-Ci_qNV_5R3A=Z9LsK4+jO7LzgddRncpp_rrnJqQ@mail.gmail.com	2018-04-11 15:32:46 -03:00
Teodor Sigaev	92899992e1	Temporary revert `5c6110c6a9` It discovers one more bug in CompareIndexInfo(), should be fixed first.	2018-04-11 19:32:19 +03:00
Teodor Sigaev	5c6110c6a9	Fix interference between cavering indexes and partitioned tables The bug is caused due to the original IndexStmt that DefineIndex receives being overwritten when processing the INCLUDE columns. Use separate list of index params to propagate to child tables. Add tests covering this case. Amit Langote and Alexander Korotkov.	2018-04-11 16:44:26 +03:00
Alvaro Herrera	15a8f8caad	Fix IndexOnlyScan counter for heap fetches in parallel mode The HeapFetches counter was using a simple value in IndexOnlyScanState, which fails to propagate values from parallel workers; so the counts are wrong when IndexOnlyScan runs in parallel. Move it to Instrumentation, like all the other counters. While at it, change INSERT ON CONFLICT conflicting tuple counter to use the new ntuples2 instead of nfiltered2, which is a blatant misuse. Discussion: https://postgr.es/m/20180409215851.idwc75ct2bzi6tea@alvherre.pgsql	2018-04-10 15:56:15 -03:00
Tom Lane	af1a949109	Further cleanup of client dependencies on src/include/catalog headers. In commit `9c0a0de4c`, I'd failed to notice that catalog/catalog.h should also be considered a frontend-unsafe header, because it includes (and needs) the full form of pg_class.h, not to mention relcache.h. However, various frontend code was depending on it to get TABLESPACE_VERSION_DIRECTORY, so refactoring of some sort is called for. The cleanest answer seems to be to move TABLESPACE_VERSION_DIRECTORY, as well as the OIDCHARS symbol, to common/relpath.h. Do that, and mop up inclusions as necessary. (I found that quite a few current users of catalog/catalog.h don't seem to need it at all anymore, apparently as a result of the refactorings that created common/relpath.[hc]. And initdb.c needed it only as a route to pg_class_d.h.) Discussion: https://postgr.es/m/6629.1523294509@sss.pgh.pa.us	2018-04-09 14:39:58 -04:00
Tom Lane	b3b7f7898f	Fix additional breakage in covering-index patch. CheckIndexCompatible() misused ComputeIndexAttrs() by not bothering to fill ii_NumIndexAttrs and ii_NumIndexKeyAttrs in the passed IndexInfo. Omission of ii_NumIndexAttrs was previously unimportant, but now this matters because ComputeIndexAttrs depends on ii_NumIndexKeyAttrs to decide how many columns it needs to report on. (BTW, the fact that this oversight wasn't detected earlier implies that we have no regression test verifying whether CheckIndexCompatible ever succeeds. Bad dog. Not the job of this patch to fix it, though.) Also, change the API of ComputeIndexAttrs so that it fills the opclass output array for all column positions, as it does for the options output array; positions for non-key index columns are filled with zeroes. This isn't directly fixing any bug, but it seems like a good idea. Per valgrind failure reports from buildfarm. Alexander Korotkov, tweaked a bit by me Discussion: https://postgr.es/m/CAPpHfduWrysrT-qAhn+3Ea5+Mg6Vhc-oA6o2Z-hRCPRdvf3tiw@mail.gmail.com	2018-04-08 17:23:39 -04:00
Tom Lane	cefa387153	Merge catalog/pg_foo_fn.h headers back into pg_foo.h headers. Traditionally, include/catalog/pg_foo.h contains extern declarations for functions in backend/catalog/pg_foo.c, in addition to its function as the authoritative definition of the pg_foo catalog's rowtype. In some cases, we'd been forced to split out those extern declarations into separate pg_foo_fn.h headers so that the catalog definitions could be #include'd by frontend code. That problem is gone as of commit `9c0a0de4c`, so let's undo the splits to make things less confusing. Discussion: https://postgr.es/m/23690.1523031777@sss.pgh.pa.us	2018-04-08 14:35:29 -04:00
Stephen Frost	da9b580d89	Refactor dir/file permissions Consolidate directory and file create permissions for tools which work with the PG data directory by adding a new module (common/file_perm.c) that contains variables (pg_file_create_mode, pg_dir_create_mode) and constants to initialize them (0600 for files and 0700 for directories). Convert mkdir() calls in the backend to MakePGDirectory() if the original call used default permissions (always the case for regular PG directories). Add tests to make sure permissions in PGDATA are set correctly by the tools which modify the PG data directory. Authors: David Steele <david@pgmasters.net>, Adam Brightwell <adam.brightwell@crunchydata.com> Reviewed-By: Michael Paquier, with discussion amongst many others. Discussion: https://postgr.es/m/ad346fe6-b23e-59f1-ecb7-0e08390ad629%40pgmasters.net	2018-04-07 17:45:39 -04:00
Alvaro Herrera	499be013de	Support partition pruning at execution time Existing partition pruning is only able to work at plan time, for query quals that appear in the parsed query. This is good but limiting, as there can be parameters that appear later that can be usefully used to further prune partitions. This commit adds support for pruning subnodes of Append which cannot possibly contain any matching tuples, during execution, by evaluating Params to determine the minimum set of subnodes that can possibly match. We support more than just simple Params in WHERE clauses. Support additionally includes: 1. Parameterized Nested Loop Joins: The parameter from the outer side of the join can be used to determine the minimum set of inner side partitions to scan. 2. Initplans: Once an initplan has been executed we can then determine which partitions match the value from the initplan. Partition pruning is performed in two ways. When Params external to the plan are found to match the partition key we attempt to prune away unneeded Append subplans during the initialization of the executor. This allows us to bypass the initialization of non-matching subplans meaning they won't appear in the EXPLAIN or EXPLAIN ANALYZE output. For parameters whose value is only known during the actual execution then the pruning of these subplans must wait. Subplans which are eliminated during this stage of pruning are still visible in the EXPLAIN output. In order to determine if pruning has actually taken place, the EXPLAIN ANALYZE must be viewed. If a certain Append subplan was never executed due to the elimination of the partition then the execution timing area will state "(never executed)". Whereas, if, for example in the case of parameterized nested loops, the number of loops stated in the EXPLAIN ANALYZE output for certain subplans may appear lower than others due to the subplan having been scanned fewer times. This is due to the list of matching subnodes having to be evaluated whenever a parameter which was found to match the partition key changes. This commit required some additional infrastructure that permits the building of a data structure which is able to perform the translation of the matching partition IDs, as returned by get_matching_partitions, into the list index of a subpaths list, as exist in node types such as Append, MergeAppend and ModifyTable. This allows us to translate a list of clauses into a Bitmapset of all the subpath indexes which must be included to satisfy the clause list. Author: David Rowley, based on an earlier effort by Beena Emerson Reviewers: Amit Langote, Robert Haas, Amul Sul, Rajkumar Raghuwanshi, Jesper Pedersen Discussion: https://postgr.es/m/CAOG9ApE16ac-_VVZVvv0gePSgkg_BwYEV1NBqZFqDR2bBE0X0A@mail.gmail.com	2018-04-07 17:54:39 -03:00
Andres Freund	f16241bef7	Raise error when affecting tuple moved into different partition. When an update moves a row between partitions (supported since `2f17844104`), our normal logic for following update chains in READ COMMITTED mode doesn't work anymore. Cross partition updates are modeled as an delete from the old and insert into the new partition. No ctid chain exists across partitions, and there's no convenient space to introduce that link. Not throwing an error in a partitioned context when one would have been thrown without partitioning is obviously problematic. This commit introduces infrastructure to detect when a tuple has been moved, not just plainly deleted. That allows to throw an error when encountering a deletion that's actually a move, while attempting to following a ctid chain. The row deleted as part of a cross partition update is marked by pointing it's t_ctid to an invalid block, instead of self as a normal update would. That was deemed to be the least invasive and most future proof way to represent the knowledge, given how few infomask bits are there to be recycled (there's also some locking issues with using infomask bits). External code following ctid chains should be updated to check for moved tuples. The most likely consequence of not doing so is a missed error. Author: Amul Sul, editorialized by me Reviewed-By: Amit Kapila, Pavan Deolasee, Andres Freund, Robert Haas Discussion: http://postgr.es/m/CAAJ_b95PkwojoYfz0bzXU8OokcTVGzN6vYGCNVUukeUDrnF3dw@mail.gmail.com	2018-04-07 13:24:27 -07:00
Teodor Sigaev	8224de4f42	Indexes with INCLUDE columns and their support in B-tree This patch introduces INCLUDE clause to index definition. This clause specifies a list of columns which will be included as a non-key part in the index. The INCLUDE columns exist solely to allow more queries to benefit from index-only scans. Also, such columns don't need to have appropriate operator classes. Expressions are not supported as INCLUDE columns since they cannot be used in index-only scans. Index access methods supporting INCLUDE are indicated by amcaninclude flag in IndexAmRoutine. For now, only B-tree indexes support INCLUDE clause. In B-tree indexes INCLUDE columns are truncated from pivot index tuples (tuples located in non-leaf pages and high keys). Therefore, B-tree indexes now might have variable number of attributes. This patch also provides generic facility to support that: pivot tuples contain number of their attributes in t_tid.ip_posid. Free 13th bit of t_info is used for indicating that. This facility will simplify further support of index suffix truncation. The changes of above are backward-compatible, pg_upgrade doesn't need special handling of B-tree indexes for that. Bump catalog version Author: Anastasia Lubennikova with contribition by Alexander Korotkov and me Reviewed by: Peter Geoghegan, Tomas Vondra, Antonin Houska, Jeff Janes, David Rowley, Alexander Korotkov Discussion: https://www.postgresql.org/message-id/flat/56168952.4010101@postgrespro.ru	2018-04-07 23:00:39 +03:00
Peter Eisentraut	039eb6e92f	Logical replication support for TRUNCATE Update the built-in logical replication system to make use of the previously added logical decoding for TRUNCATE support. Add the required truncate callback to pgoutput and a new logical replication protocol message. Publications get a new attribute to determine whether to replicate truncate actions. When updating a publication via pg_dump from an older version, this is not set, thus preserving the previous behavior. Author: Simon Riggs <simon@2ndquadrant.com> Author: Marco Nenciarini <marco.nenciarini@2ndquadrant.it> Author: Peter Eisentraut <peter.eisentraut@2ndquadrant.com> Reviewed-by: Petr Jelinek <petr.jelinek@2ndquadrant.com> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org>	2018-04-07 11:34:11 -04:00
Peter Eisentraut	5dfd1e5a66	Logical decoding of TRUNCATE Add a new WAL record type for TRUNCATE, which is only used when wal_level >= logical. (For physical replication, TRUNCATE is already replicated via SMGR records.) Add new callback for logical decoding output plugins to receive TRUNCATE actions. Author: Simon Riggs <simon@2ndquadrant.com> Author: Marco Nenciarini <marco.nenciarini@2ndquadrant.it> Author: Peter Eisentraut <peter.eisentraut@2ndquadrant.com> Reviewed-by: Petr Jelinek <petr.jelinek@2ndquadrant.com> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org>	2018-04-07 11:34:10 -04:00
Robert Haas	3d956d9562	Allow insert and update tuple routing and COPY for foreign tables. Also enable this for postgres_fdw. Etsuro Fujita, based on an earlier patch by Amit Langote. The larger patch series of which this is a part has been reviewed by Amit Langote, David Fetter, Maksim Milyutin, Álvaro Herrera, Stephen Frost, and me. Minor documentation changes to the final version by me. Discussion: http://postgr.es/m/29906a26-da12-8c86-4fb9-d8f88442f2b9@lab.ntt.co.jp	2018-04-06 19:22:03 -04:00
Stephen Frost	0fdc8495bf	Add default roles for file/program access This patch adds new default roles named 'pg_read_server_files', 'pg_write_server_files', 'pg_execute_server_program' which allow an administrator to GRANT to a non-superuser role the ability to access server-side files or run programs through PostgreSQL (as the user the database is running as). Having one of these roles allows a non-superuser to use server-side COPY to read, write, or with a program, and to use file_fdw (if installed by a superuser and GRANT'd USAGE on it) to read from files or run a program. The existing misc file functions are also changed to allow a user with the 'pg_read_server_files' default role to read any files on the filesystem, matching the privileges given to that role through COPY and file_fdw from above. Reviewed-By: Michael Paquier Discussion: https://postgr.es/m/20171231191939.GR2416%40tamriel.snowman.net	2018-04-06 14:47:10 -04:00
Peter Eisentraut	bbca77623f	Rename MemoryContextCopySetIdentifier() for clarity MemoryContextCopySetIdentifier -> MemoryContextCopyAndSetIdentifier Discussion: https://www.postgresql.org/message-id/6421.1522194949@sss.pgh.pa.us	2018-04-06 12:37:54 -04:00
Robert Haas	cfbecf8100	Enforce child constraints during COPY TO a partitioned table. The previous coding inadvertently checked the constraints for the partitioned table rather than the target partition, which could lead to data in a partition that fails to satisfy some constraint on that partition. This problem seems to date back to when table partitioning was introduced; prior to that, there was only one target table for a COPY, so the problem didn't occur, and the code just didn't get updated. Etsuro Fujita, reviewed by Amit Langote and Ashutosh Bapat Discussion: https://postgr.es/message-id/5ABA4074.1090500%40lab.ntt.co.jp	2018-04-06 11:42:28 -04:00
Peter Eisentraut	bcf79b5bb6	Split the SetSubscriptionRelState function into two We don't actually need the insert-or-update logic, so it's clearer to have separate functions for the inserting and updating. Author: Petr Jelinek <petr.jelinek@2ndquadrant.com> Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>	2018-04-06 10:00:26 -04:00
Simon Riggs	4b2d44031f	MERGE post-commit review Review comments from Andres Freund * Consolidate code into AfterTriggerGetTransitionTable() * Rename nodeMerge.c to execMerge.c * Rename nodeMerge.h to execMerge.h * Move MERGE handling in ExecInitModifyTable() into a execMerge.c ExecInitMerge() * Move mt_merge_subcommands flags into execMerge.h * Rename opt_and_condition to opt_merge_when_and_condition * Wordsmith various comments Author: Pavan Deolasee Reviewer: Simon Riggs	2018-04-05 09:54:07 +01:00
Alvaro Herrera	3de241dba8	Foreign keys on partitioned tables Author: Álvaro Herrera Discussion: https://postgr.es/m/20171231194359.cvojcour423ulha4@alvherre.pgsql Reviewed-by: Peter Eisentraut	2018-04-04 14:02:49 -03:00
Alvaro Herrera	851f4b4e14	Don't clone internal triggers to partitions Trigger cloning to partitions was supposed to occur for user-visible triggers only, but during development the protection that prevented it from occurring to internal triggers was lost. Reinstate it, as well as add a test case to ensure internal triggers (in the tested case, triggers implementing a deferred unique constraint) are not cloned. Without the code fix, the partitions in the test end up with different numbers of triggers, which is clearly wrong ... Bug in `86f575948c`. Discussion: https://postgr.es/m/20180403214903.ozfagwjcpk337uw7@alvherre.pgsql	2018-04-03 19:08:25 -03:00
Simon Riggs	d204ef6377	MERGE SQL Command following SQL:2016 MERGE performs actions that modify rows in the target table using a source table or query. MERGE provides a single SQL statement that can conditionally INSERT/UPDATE/DELETE rows a task that would other require multiple PL statements. e.g. MERGE INTO target AS t USING source AS s ON t.tid = s.sid WHEN MATCHED AND t.balance > s.delta THEN UPDATE SET balance = t.balance - s.delta WHEN MATCHED THEN DELETE WHEN NOT MATCHED AND s.delta > 0 THEN INSERT VALUES (s.sid, s.delta) WHEN NOT MATCHED THEN DO NOTHING; MERGE works with regular and partitioned tables, including column and row security enforcement, as well as support for row, statement and transition triggers. MERGE is optimized for OLTP and is parameterizable, though also useful for large scale ETL/ELT. MERGE is not intended to be used in preference to existing single SQL commands for INSERT, UPDATE or DELETE since there is some overhead. MERGE can be used statically from PL/pgSQL. MERGE does not yet support inheritance, write rules, RETURNING clauses, updatable views or foreign tables. MERGE follows SQL Standard per the most recent SQL:2016. Includes full tests and documentation, including full isolation tests to demonstrate the concurrent behavior. This version written from scratch in 2017 by Simon Riggs, using docs and tests originally written in 2009. Later work from Pavan Deolasee has been both complex and deep, leaving the lead author credit now in his hands. Extensive discussion of concurrency from Peter Geoghegan, with thanks for the time and effort contributed. Various issues reported via sqlsmith by Andreas Seltenreich Authors: Pavan Deolasee, Simon Riggs Reviewer: Peter Geoghegan, Amit Langote, Tomas Vondra, Simon Riggs Discussion: https://postgr.es/m/CANP8+jKitBSrB7oTgT9CY2i1ObfOt36z0XMraQc+Xrz8QB0nXA@mail.gmail.com https://postgr.es/m/CAH2-WzkJdBuxj9PO=2QaO9-3h3xGbQPZ34kJH=HukRekwM-GZg@mail.gmail.com	2018-04-03 09:28:16 +01:00
Simon Riggs	7cf8a5c302	Revert "Modified files for MERGE" This reverts commit `354f13855e`.	2018-04-02 21:34:15 +01:00
Simon Riggs	354f13855e	Modified files for MERGE	2018-04-02 21:12:47 +01:00
Tatsuo Ishii	1b26bd4089	Fix bug with view locking code. LockViewRecurese() obtains view relation using heap_open() and passes it to get_view_query() to get view info. It immediately closes the relation then uses the returned view info by calling LockViewRecurse_walker(). Since get_view_query() returns a pointer within the relcache, the relcache should be kept until LockViewRecurse_walker() returns. Otherwise the relation could point to a garbage memory area. Fix is moving the heap_close() call after LockViewRecurse_walker(). Problem reported by Tom Lane (buildfarm is unhappy, especially prion since it enables -DRELCACHE_FORCE_RELEASE cpp flag), fix by me.	2018-03-31 09:26:43 +09:00
Andres Freund	d87510a524	Combine options for RangeVarGetRelidExtended() into a flags argument. A followup patch will add a SKIP_LOCKED option. To avoid introducing evermore arguments, breaking existing callers each time, introduce a flags argument. This'll no doubt break a few external users... Also change the MISSING_OK behaviour so a DEBUG1 debug message is emitted when a relation is not found. Author: Nathan Bossart Reviewed-By: Michael Paquier and Andres Freund Discussion: https://postgr.es/m/20180306005349.b65whmvj7z6hbe2y@alap3.anarazel.de	2018-03-30 17:05:16 -07:00
Tatsuo Ishii	34c20de4d0	Allow to lock views. Now all tables used in view definitions can be recursively locked by a LOCK command. Author: Yugo Nagata Reviewed by Robert Haas, Thomas Munro and me. Discussion: https://postgr.es/m/20171011183629.eb2817b3.nagata%40sraoss.co.jp	2018-03-30 09:18:02 +09:00
Bruce Momjian	20b4323bd1	C comments: "a" <--> "an" corrections Reported-by: Michael Paquier, Abhijit Menon-Sen Discussion: https://postgr.es/m/20180305045854.GB2266@paquier.xyz Author: Michael Paquier, Abhijit Menon-Sen, me	2018-03-29 15:18:53 -04:00
Tom Lane	851a26e266	While vacuuming a large table, update upper-level FSM data every so often. VACUUM updates leaf-level FSM entries immediately after cleaning the corresponding heap blocks. fsmpage.c updates the intra-page search trees on the leaf-level FSM pages when this happens, but it does not touch the upper-level FSM pages, so that the released space might not actually be findable by searchers. Previously, updating the upper-level pages happened only at the conclusion of the VACUUM run, in a single FreeSpaceMapVacuum() call. This is bad because the VACUUM might get canceled before ever reaching that point, so that from the point of view of searchers no space has been freed at all, leading to table bloat. We can improve matters by updating the upper pages immediately after each cycle of index-cleaning and heap-cleaning, processing just the FSM pages corresponding to the range of heap blocks we have now fully cleaned. This adds a small amount of extra work, since the FSM pages leading down to each range boundary will be touched twice, but it's pretty negligible compared to everything else going on in a large VACUUM. If there are no indexes, VACUUM doesn't work in cycles but just cleans each heap page on first visit. In that case we just arbitrarily update upper FSM pages after each 8GB of heap. That maintains the goal of not letting all this work slide until the very end, and it doesn't seem worth expending extra complexity on a case that so seldom occurs in practice. In either case, the FSM is fully up to date before any attempt is made to truncate the relation, so that the most likely scenario for VACUUM cancellation no longer results in out-of-date upper FSM pages. When we do successfully truncate, adjusting the FSM to reflect that is now fully handled within FreeSpaceMapTruncateRel. Claudio Freire, reviewed by Masahiko Sawada and Jing Wang, some additional tweaks by me Discussion: https://postgr.es/m/CAGTBQpYR0uJCNTt3M5GOzBRHo+-GccNO1nCaQ8yEJmZKSW5q1A@mail.gmail.com	2018-03-29 11:29:54 -04:00
Andres Freund	1f0c6a9e7d	Add EXPLAIN support for JIT. This just shows a few details about JITing, e.g. how many functions have been JITed, and how long that took. To avoid noise in regression tests with functions sometimes being JITed in --with-llvm builds, disable display when COSTS OFF is specified. Author: Andres Freund Discussion: https://postgr.es/m/20170901064131.tazjxwus3k2w3ybh@alap3.anarazel.de	2018-03-28 13:26:51 -07:00
Andrew Dunstan	16828d5c02	Fast ALTER TABLE ADD COLUMN with a non-NULL default Currently adding a column to a table with a non-NULL default results in a rewrite of the table. For large tables this can be both expensive and disruptive. This patch removes the need for the rewrite as long as the default value is not volatile. The default expression is evaluated at the time of the ALTER TABLE and the result stored in a new column (attmissingval) in pg_attribute, and a new column (atthasmissing) is set to true. Any existing row when fetched will be supplied with the attmissingval. New rows will have the supplied value or the default and so will never need the attmissingval. Any time the table is rewritten all the atthasmissing and attmissingval settings for the attributes are cleared, as they are no longer needed. The most visible code change from this is in heap_attisnull, which acquires a third TupleDesc argument, allowing it to detect a missing value if there is one. In many cases where it is known that there will not be any (e.g. catalog relations) NULL can be passed for this argument. Andrew Dunstan, heavily modified from an original patch from Serge Rielau. Reviewed by Tom Lane, Andres Freund, Tomas Vondra and David Rowley. Discussion: https://postgr.es/m/31e2e921-7002-4c27-59f5-51f08404c858@2ndQuadrant.com	2018-03-28 10:43:52 +10:30
Tom Lane	442accc3fe	Allow memory contexts to have both fixed and variable ident strings. Originally, we treated memory context names as potentially variable in all cases, and therefore always copied them into the context header. Commit `9fa6f00b1` rethought this a little bit and invented a distinction between fixed and variable names, skipping the copy step for the former. But we can make things both simpler and more useful by instead allowing there to be two parts to a context's identification, a fixed "name" and an optional, variable "ident". The name supplied in the context create call is now required to be a compile-time-constant string in all cases, as it is never copied but just pointed to. The "ident" string, if wanted, is supplied later. This is needed because typically we want the ident to be stored inside the context so that it's cleaned up automatically on context deletion; that means it has to be copied into the context before we can set the pointer. The cost of this approach is basically just an additional pointer field in struct MemoryContextData, which isn't much overhead, and is bought back entirely in the AllocSet case by not needing a headerSize field anymore, since we no longer have to cope with variable header length. In addition, we can simplify the internal interfaces for memory context creation still further, saving a few cycles there. And it's no longer true that a custom identifier disqualifies a context from participating in aset.c's freelist scheme, so possibly there's some win on that end. All the places that were using non-compile-time-constant context names are adjusted to put the variable info into the "ident" instead. This allows more effective identification of those contexts in many cases; for example, subsidary contexts of relcache entries are now identified by both type (e.g. "index info") and relname, where before you got only one or the other. Contexts associated with PL function cache entries are now identified more fully and uniformly, too. I also arranged for plancache contexts to use the query source string as their identifier. This is basically free for CachedPlanSources, as they contained a copy of that string already. We pay an extra pstrdup to do it for CachedPlans. That could perhaps be avoided, but it would make things more fragile (since the CachedPlanSource is sometimes destroyed first). I suspect future improvements in error reporting will require CachedPlans to have a copy of that string anyway, so it's not clear that it's worth moving mountains to avoid it now. This also changes the APIs for context statistics routines so that the context-specific routines no longer assume that output goes straight to stderr, nor do they know all details of the output format. This is useful immediately to reduce code duplication, and it also allows for external code to do something with stats output that's different from printing to stderr. The reason for pushing this now rather than waiting for v12 is that it rethinks some of the API changes made by commit `9fa6f00b1`. Seems better for extension authors to endure just one round of API changes not two. Discussion: https://postgr.es/m/CAB=Je-FdtmFZ9y9REHD7VsSrnCkiBhsA4mdsLKSPauwXtQBeNA@mail.gmail.com	2018-03-27 16:46:51 -04:00
Alvaro Herrera	86f575948c	Allow FOR EACH ROW triggers on partitioned tables Previously, FOR EACH ROW triggers were not allowed in partitioned tables. Now we allow AFTER triggers on them, and on trigger creation we cascade to create an identical trigger in each partition. We also clone the triggers to each partition that is created or attached later. This means that deferred unique keys are allowed on partitioned tables, too. Author: Álvaro Herrera Reviewed-by: Peter Eisentraut, Simon Riggs, Amit Langote, Robert Haas, Thomas Munro Discussion: https://postgr.es/m/20171229225319.ajltgss2ojkfd3kp@alvherre.pgsql	2018-03-23 10:48:22 -03:00
Tom Lane	7c91a0364f	Sync up our various ways of estimating pg_class.reltuples. VACUUM thought that reltuples represents the total number of tuples in the relation, while ANALYZE counted only live tuples. This can cause "flapping" in the value when background vacuums and analyzes happen separately. The planner's use of reltuples essentially assumes that it's the count of live (visible) tuples, so let's standardize on having it mean live tuples. Another issue is that the definition of "live tuple" isn't totally clear; what should be done with INSERT_IN_PROGRESS or DELETE_IN_PROGRESS tuples? ANALYZE's choices in this regard are made on the assumption that if the originating transaction commits at all, it will happen after ANALYZE finishes, so we should ignore the effects of the in-progress transaction --- unless it is our own transaction, and then we should count it. Let's propagate this definition into VACUUM, too. Likewise propagate this definition into CREATE INDEX, and into contrib/pgstattuple's pgstattuple_approx() function. Tomas Vondra, reviewed by Haribabu Kommi, some corrections by me Discussion: https://postgr.es/m/16db4468-edfa-830a-f921-39a50498e77e@2ndquadrant.com	2018-03-22 15:47:41 -04:00
Dean Rasheed	b5db1d93d2	Improve ANALYZE's strategy for finding MCVs. Previously, a value was included in the MCV list if its frequency was 25% larger than the estimated average frequency of all nonnull values in the table. For uniform distributions, that can lead to values being included in the MCV list and significantly overestimated on the basis of relatively few (sometimes just 2) instances being seen in the sample. For non-uniform distributions, it can lead to too few values being included in the MCV list, since the overall average frequency may be dominated by a small number of very common values, while the remaining values may still have a large spread of frequencies, causing both substantial overestimation and underestimation of the remaining values. Furthermore, increasing the statistics target may have little effect because the overall average frequency will remain relatively unchanged. Instead, populate the MCV list with the largest set of common values that are statistically significantly more common than the average frequency of the remaining values. This takes into account the variance of the sample counts, which depends on the counts themselves and on the proportion of the table that was sampled. As a result, it constrains the relative standard error of estimates based on the frequencies of values in the list, reducing the chances of too many values being included. At the same time, it allows more values to be included, since the MCVs need only be more common than the remaining non-MCVs, rather than the overall average. Thus it tends to produce fewer MCVs than the previous code for uniform distributions, and more for non-uniform distributions, reducing estimation errors in both cases. In addition, the algorithm responds better to increasing the statistics target, allowing more values to be included in the MCV list when more of the table is sampled. Jeff Janes, substantially modified by me. Reviewed by John Naylor and Tomas Vondra. Discussion: https://postgr.es/m/CAMkU=1yvdGvW9TmiLAhz2erFnvnPFYHbOZuO+a=4DVkzpuQ2tw@mail.gmail.com	2018-03-22 09:37:36 +00:00
Alvaro Herrera	56163004b8	Fix relcache handling of the 'default' partition My commit `4dba331cb3` that moved around CommandCounterIncrement calls in partitioning DDL code unearthed a problem with the relcache handling for the 'default' partition: the construction of a correct relcache entry for the partitioned table was at the mercy of lack of CCI calls in non-trivial amounts of code. This was prone to creating problems later on, as the code develops. This was visible as a test failure in a compile with RELCACHE_FORCE_RELASE (buildfarm member prion). The problem is that after the mentioned commit it was possible to create a relcache entry that had incomplete information regarding the default partition because I introduced a CCI between adding the catalog entries for the default partition (StorePartitionBound) and the update of pg_partitioned_table entry for its parent partitioned table (update_default_partition_oid). It seems the best fix is to move the latter so that it occurs inside the former; the purposeful lack of intervening CCI should be more obvious, and harder to break. I also remove a check in RelationBuildPartitionDesc that returns NULL if the key is not set. I couldn't find any place that needs this hack anymore; probably it was required because of bugs that have since been fixed. Fix a few typos I noticed while reviewing the code involved. Discussion: https://postgr.es/m/20180320182659.nyzn3vqtjbbtfgwq@alvherre.pgsql	2018-03-21 12:03:35 -03:00
Peter Eisentraut	325f2ec555	Handle heap rewrites even better in logical decoding Logical decoding should not publish anything about tables created as part of a heap rewrite during DDL. Those tables don't exist externally, so consumers of logical decoding cannot do anything sensible with that information. In `ab28feae2b`, we worked around this for built-in logical replication, but that was hack. This is a more proper fix: We mark such transient heaps using the new field pg_class.relwrite, linking to the original relation OID. By default, we ignore them in logical decoding before they get to the output plugin. Optionally, a plugin can register their interest in getting such changes, if they handle DDL specially, in which case the new field will help them get information about the actual table. Reviewed-by: Craig Ringer <craig@2ndquadrant.com>	2018-03-21 09:15:04 -04:00
Alvaro Herrera	4dba331cb3	Fix CommandCounterIncrement in partition-related DDL It makes sense to do the CCIs in the places that do catalog updates, rather than before the places that error out because the former ones fail to do it. In particular, it looks like StorePartitionBound() and IndexSetParentIndex() ought to make their own CCIs. Per review comments from Peter Eisentraut for row-level triggers on partitioned tables. Discussion: https://postgr.es/m/20171229225319.ajltgss2ojkfd3kp@alvherre.pgsql	2018-03-20 11:19:41 -03:00
Tom Lane	6497a18e6c	Fix some corner-case issues in REFRESH MATERIALIZED VIEW CONCURRENTLY. refresh_by_match_merge() has some issues in the way it builds a SQL query to construct the "diff" table: 1. It doesn't require the selected unique index(es) to be indimmediate. 2. It doesn't pay attention to the particular equality semantics enforced by a given index, but just assumes that they must be those of the column datatype's default btree opclass. 3. It doesn't check that the indexes are btrees. 4. It's insufficiently careful to ensure that the parser will pick the intended operator when parsing the query. (This would have been a security bug before CVE-2018-1058.) 5. It's not careful about indexes on system columns. The way to fix #4 is to make use of the existing code in ri_triggers.c for generating an arbitrary binary operator clause. I chose to move that to ruleutils.c, since that seems a more reasonable place to be exporting such functionality from than ri_triggers.c. While #1, #3, and #5 are just latent given existing feature restrictions, and #2 doesn't arise in the core system for lack of alternate opclasses with different equality behaviors, #4 seems like an issue worth back-patching. That's the bulk of the change anyway, so just back-patch the whole thing to 9.4 where this code was introduced. Discussion: https://postgr.es/m/13836.1521413227@sss.pgh.pa.us	2018-03-19 18:50:05 -04:00
Tom Lane	6fbd5cce22	Fix performance hazard in REFRESH MATERIALIZED VIEW CONCURRENTLY. Jeff Janes discovered that commit `7ca25b7de` made one of the queries run by REFRESH MATERIALIZED VIEW CONCURRENTLY perform badly. The root cause is bad cardinality estimation for correlated quals, but a principled solution to that problem is some way off, especially since the planner lacks any statistics about whole-row variables. Moreover, in non-error cases this query produces no rows, meaning it must be run to completion; but use of LIMIT 1 encourages the planner to pick a fast-start, slow-completion plan, exactly not what we want. Remove the LIMIT clause, and instead rely on the count parameter we pass to SPI_execute() to prevent excess work if the query does return some rows. While we've heard no field reports of planner misbehavior with this query, it could be that people are having performance issues that haven't reached the level of pain needed to cause a bug report. In any case, that LIMIT clause can't possibly do anything helpful with any existing version of the planner, and it demonstrably can cause bad choices in some cases, so back-patch to 9.4 where the code was introduced. Thomas Munro Discussion: https://postgr.es/m/CAMkU=1z-JoGymHneGHar1cru4F1XDfHqJDzxP_CtK5cL3DOfmg@mail.gmail.com	2018-03-19 17:23:21 -04:00
Alvaro Herrera	6666ee49f4	Fix state reversal after partition tuple routing We make some changes to ModifyTableState and the EState it uses whenever we route tuples to partitions; but we weren't restoring properly in all cases, possibly causing crashes when partitions with different tuple descriptors are targeted by tuples inserted in the same command. Refactor some code, creating ExecPrepareTupleRouting, to encapsulate the needed state changing logic, and have it invoked one level above its current place (ie. put it in ExecModifyTable instead of ExecInsert); this makes it all more readable. Add a test case to exercise this. We don't support having views as partitions; and since only views can have INSTEAD OF triggers, there is no point in testing for INSTEAD OF when processing insertions into a partitioned table. Remove code that appears to support this (but which is actually never relevant.) In passing, fix location of some very confusing comments in ModifyTableState. Reported-by: Amit Langote Author: Etsuro Fujita, Amit Langote Discussion: https://postgr/es/m/0473bf5c-57b1-f1f7-3d58-455c2230bc5f@lab.ntt.co.jp	2018-03-19 17:45:53 -03:00
Andres Freund	7a50bb690b	Add 'unit' parameter to ExplainProperty{Integer,Float}. This allows to deduplicate some existing code, but mainly avoids some duplication in upcoming commits. In passing, fix variable names indicating wrong unit (seconds instead of ms). Author: Andres Freund Discussion: https://postgr.es/m/20180314002740.cah3mdsonz5mxney@alap3.anarazel.de	2018-03-16 23:16:04 -07:00
Andres Freund	f3e4b95edb	Make ExplainPropertyInteger accept 64bit input, remove *Long variant. 'long' is not useful type across platforms, as it's 32bit on 32 bit platforms, and even on some 64bit platforms (e.g. windows) it's still only 32bits wide. As ExplainPropertyInteger should never be performance critical, change it to accept a 64bit argument and remove ExplainPropertyLong. Author: Andres Freund Discussion: https://postgr.es/m/20180314164832.n56wt7zcbpzi6zxe@alap3.anarazel.de	2018-03-16 23:13:12 -07:00
Peter Eisentraut	04700b685f	Rename TransactionChain functions We call this thing a "transaction block" everywhere except in a few functions, where it is mysteriously called a "transaction chain". In the SQL standard, a transaction chain is something different. So rename these functions to match the common terminology. Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org>	2018-03-16 13:18:06 -04:00
Peter Eisentraut	3a4b891964	Fix more format truncation issues Fix the warnings created by the compiler warning options -Wformat-overflow=2 -Wformat-truncation=2, supported since GCC 7. This is a more aggressive variant of the fixes in `6275f5d28a`, which GCC 7 warned about by default. The issues are all harmless, but some dubious coding patterns are cleaned up. One issue that is of external interest is that BGW_MAXLEN is increased from 64 to 96. Apparently, the old value would cause the bgw_name of logical replication workers to be truncated in some circumstances. But this doesn't actually add those warning options. It appears that the warnings depend a bit on compilation and optimization options, so it would be annoying to have to keep up with that. This is more of a once-in-a-while cleanup. Reviewed-by: Michael Paquier <michael@paquier.xyz>	2018-03-15 11:41:42 -04:00
Peter Eisentraut	f66e8bf875	Remove pg_class.relhaspkey It is not used for anything internally, and it cannot be relied on for external uses, so it can just be removed. To correct recommended way to check for a primary key is in pg_index. Discussion: https://www.postgresql.org/message-id/flat/b1a24c6c-6913-f89c-674e-0704f0ed69db@2ndquadrant.com	2018-03-14 15:31:34 -04:00
Peter Eisentraut	33803f67f1	Support INOUT arguments in procedures In a top-level CALL, the values of INOUT arguments will be returned as a result row. In PL/pgSQL, the values are assigned back to the input arguments. In other languages, the same convention as for return a record from a function is used. That does not require any code changes in the PL implementations. Reviewed-by: Pavel Stehule <pavel.stehule@gmail.com>	2018-03-14 12:07:28 -04:00
Tom Lane	d04900de7d	When updating reltuples after ANALYZE, just extrapolate from our sample. The existing logic for updating pg_class.reltuples trusted the sampling results only for the pages ANALYZE actually visited, preferring to believe the previous tuple density estimate for all the unvisited pages. While there's some rationale for doing that for VACUUM (first that VACUUM is likely to visit a very nonrandom subset of pages, and second that we know for sure that the unvisited pages did not change), there's no such rationale for ANALYZE: by assumption, it's looked at an unbiased random sample of the table's pages. Furthermore, in a very large table ANALYZE will have examined only a tiny fraction of the table's pages, meaning it cannot slew the overall density estimate very far at all. In a table that is physically growing, this causes reltuples to increase nearly proportionally to the change in relpages, regardless of what is actually happening in the table. This has been observed to cause reltuples to become so much larger than reality that it effectively shuts off autovacuum, whose threshold for doing anything is a fraction of reltuples. (Getting to the point where that would happen seems to require some additional, not well understood, conditions. But it's undeniable that if reltuples is seriously off in a large table, ANALYZE alone will not fix it in any reasonable number of iterations, especially not if the table is continuing to grow.) Hence, restrict the use of vac_estimate_reltuples() to VACUUM alone, and in ANALYZE, just extrapolate from the sample pages on the assumption that they provide an accurate model of the whole table. If, by very bad luck, they don't, at least another ANALYZE will fix it; in the old logic a single bad estimate could cause problems indefinitely. In HEAD, let's remove vac_estimate_reltuples' is_analyze argument altogether; it was never used for anything and now it's totally pointless. But keep it in the back branches, in case any third-party code is calling this function. Per bug #15005. Back-patch to all supported branches. David Gould, reviewed by Alexander Kuzmenkov, cosmetic changes by me Discussion: https://postgr.es/m/20180117164916.3fdcf2e9@engels	2018-03-13 13:24:27 -04:00
Peter Eisentraut	377b5ac484	Fix CREATE TABLE / LIKE with bigint identity column CREATE TABLE / LIKE with a bigint identity column would fail on platforms where long is 32 bits. Copying the sequence values used makeInteger(), which would truncate the 64-bit sequence data to 32 bits. To fix, use makeFloat() instead, like the parser. (This does not actually make use of floats, but stores the values as strings.) Bug: #15096 Reviewed-by: Michael Paquier <michael@paquier.xyz>	2018-03-13 09:41:30 -04:00
Alvaro Herrera	1f8a3327a9	Avoid having two PKs in a partition If a table containing a primary key is attach as partition to a partitioned table which has a primary key with a different definition, we would happily create a second one in the new partition. Oops. It turns out that this is because an error check in DefineIndex is executed only if you tell it that it's being run by ALTER TABLE, and the original code here wasn't. Change it so that it does. Added a couple of test cases for this, also. A previously working test started to fail in a different way than before patch because the new check is called earlier; change the PK to plain UNIQUE so that the new behavior isn't invoked, so that the test continues to verify what we want it to verify. Reported by: Noriyoshi Shinoda Discussion: https://postgr.es/m/DF4PR8401MB102060EC2615EC9227CC73F7EEDF0@DF4PR8401MB1020.NAMPRD84.PROD.OUTLOOK.COM	2018-03-12 19:42:32 -03:00
Tom Lane	4a4e2442a7	Fix improper uses of canonicalize_qual(). One of the things canonicalize_qual() does is to remove constant-NULL subexpressions of top-level AND/OR clauses. It does that on the assumption that what it's given is a top-level WHERE clause, so that NULL can be treated like FALSE. Although this is documented down inside a subroutine of canonicalize_qual(), it wasn't mentioned in the documentation of that function itself, and some callers hadn't gotten that memo. Notably, commit `d007a9505` caused get_relation_constraints() to apply canonicalize_qual() to CHECK constraints. That allowed constraint exclusion to misoptimize situations in which a CHECK constraint had a provably-NULL subclause, as seen in the regression test case added here, in which a child table that should be scanned is not. (Although this thinko is ancient, the test case doesn't fail before 9.2, for reasons I've not bothered to track down in detail. There may be related cases that do fail before that.) More recently, commit `f0e44751d` added an independent bug by applying canonicalize_qual() to index expressions, which is even sillier since those might not even be boolean. If they are, though, I think this could lead to making incorrect index entries for affected index expressions in v10. I haven't attempted to prove that though. To fix, add an "is_check" parameter to canonicalize_qual() to specify whether it should assume WHERE or CHECK semantics, and make it perform NULL-elimination accordingly. Adjust the callers to apply the right semantics, or remove the call entirely in cases where it's not known that the expression has one or the other semantics. I also removed the call in some cases involving partition expressions, where it should be a no-op because such expressions should be canonical already ... and was a no-op, independently of whether it could in principle have done something, because it was being handed the qual in implicit-AND format which isn't what it expects. In HEAD, add an Assert to catch that type of mistake in future. This represents an API break for external callers of canonicalize_qual(). While that's intentional in HEAD to make such callers think about which case applies to them, it seems like something we probably wouldn't be thanked for in released branches. Hence, in released branches, the extra parameter is added to a new function canonicalize_qual_ext(), and canonicalize_qual() is a wrapper that retains its old behavior. Patch by me with suggestions from Dean Rasheed. Back-patch to all supported branches. Discussion: https://postgr.es/m/24475.1520635069@sss.pgh.pa.us	2018-03-11 18:10:42 -04:00
Tom Lane	5748f3a0aa	Improve predtest.c's internal docs, and enhance its functionality a bit. Commit `b08df9cab` left things rather poorly documented as far as the exact semantics of "clause_is_check" mode went. Also, that mode did not really work correctly for predicate_refuted_by; although given the lack of specification as to what it should do, as well as the lack of any actual use-case, that's perhaps not surprising. Rename "clause_is_check" to "weak" proof mode, and provide specifications for what it should do. I defined weak refutation as meaning "truth of A implies non-truth of B", which makes it possible to use the mode in the part of relation_excluded_by_constraints that checks for mutually contradictory WHERE clauses. Fix up several places that did things wrong for that definition. (As far as I can see, these errors would only lead to failure-to-prove, not incorrect claims of proof, making them not serious bugs even aside from the fact that v10 contains no use of this mode. So there seems no need for back-patching.) In addition, teach predicate_refuted_by_recurse that it can use predicate_implied_by_recurse after all when processing a strong NOT-clause, so long as it asks for the correct proof strength. This is an optimization that could have been included in commit `b08df9cab`, but wasn't. Also, simplify and generalize the logic that checks for whether nullness of the argument of IS [NOT] NULL would force overall nullness of the predicate or clause. (This results in a change in the partition_prune test's output, as it is now able to prune an all-nulls partition that it did not recognize before.) In passing, in PartConstraintImpliedByRelConstraint, remove bogus conversion of the constraint list to explicit-AND form and then right back again; that accomplished nothing except forcing a useless extra level of recursion inside predicate_implied_by. Discussion: https://postgr.es/m/5983.1520487191@sss.pgh.pa.us	2018-03-09 16:58:26 -05:00
Alvaro Herrera	1ffb63a2a1	Fix bogus Name assignment in CreateStatistics Apparently, it doesn't work to use a plain cstring as a Name datum: you may end up having random bytes because of failing to zero the bytes after the terminating \0, as indicated by valgrind. I introduced this bug in `5564c11815`, so backpatch this fix to REL_10_STABLE, like that commit. While at it, fix a slightly misleading comment, pointed out by David Rowley.	2018-03-06 13:20:40 -03:00
Alvaro Herrera	5564c11815	Clone extended stats in CREATE TABLE (LIKE INCLUDING ALL) The LIKE INCLUDING ALL clause to CREATE TABLE intuitively indicates cloning of extended statistics on the source table, but it failed to do so. Patch it up so that it does. Also include an INCLUDING STATISTICS option to the LIKE clause, so that the behavior can be requested individually, or excluded individually. While at it, reorder the INCLUDING options, both in code and in docs, in alphabetical order which makes more sense than feature-implementation order that was previously used. Backpatch this to Postgres 10, where extended statistics were introduced, because this is seen as an oversight in a fresh feature which is better to get consistent from the get-go instead of changing only in pg11. In pg11, comments on statistics objects are cloned too. In pg10 they are not, because I (Álvaro) was too coward to change the parse node as required to support it. Also, in pg10 I chose not to renumber the parser symbols for the various INCLUDING options in LIKE, for the same reason. Any corresponding user-visible changes (docs) are backpatched, though. Reported-by: Stephen Froehlich Author: David Rowley Reviewed-by: Álvaro Herrera, Tomas Vondra Discussion: https://postgr.es/m/CY1PR0601MB1927315B45667A1B679D0FD5E5EF0@CY1PR0601MB1927.namprd06.prod.outlook.com	2018-03-05 19:37:19 -03:00
Peter Eisentraut	fd1a421fe6	Add prokind column, replacing proisagg and proiswindow The new column distinguishes normal functions, procedures, aggregates, and window functions. This replaces the existing columns proisagg and proiswindow, and replaces the convention that procedures are indicated by prorettype == 0. Also change prorettype to be VOIDOID for procedures. Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Michael Paquier <michael@paquier.xyz>	2018-03-02 13:48:33 -05:00
Tom Lane	25b692568f	Prevent dangling-pointer access when update trigger returns old tuple. A before-update row trigger may choose to return the "new" or "old" tuple unmodified. ExecBRUpdateTriggers failed to consider the second possibility, and would proceed to free the "old" tuple even if it was the one returned, leading to subsequent access to already-deallocated memory. In debug builds this reliably leads to an "invalid memory alloc request size" failure; in production builds it might accidentally work, but data corruption is also possible. This is a very old bug. There are probably a couple of reasons it hasn't been noticed up to now. It would be more usual to return NULL if one wanted to suppress the update action; returning "old" is significantly less efficient since the update will occur anyway. Also, none of the standard PLs would ever cause this because they all returned freshly-manufactured tuples even if they were just copying "old". But commit `4b93f5799` changed that for plpgsql, making it possible to see the bug with a plpgsql trigger. Still, this is certainly legal behavior for a trigger function, so it's ExecBRUpdateTriggers's fault not plpgsql's. It seems worth creating a test case that exercises returning "old" directly with a C-language trigger; testing this through plpgsql seems unreliable because its behavior might change again. Report and fix by Rushabh Lathia; regression test case by me. Back-patch to all supported branches. Discussion: https://postgr.es/m/CAGPqQf1P4pjiNPrMof=P_16E-DFjt457j+nH2ex3=nBTew7tXw@mail.gmail.com	2018-02-27 13:28:02 -05:00
Peter Eisentraut	76b6aa41f4	Support parameters in CALL To support parameters in CALL, move the parse analysis of the procedure and arguments into the global transformation phase, so that the parser hooks can be applied. And then at execution time pass the parameters from ProcessUtility on to ExecuteCallStmt.	2018-02-22 21:36:48 -05:00
Robert Haas	edd44738bc	Be lazier about partition tuple routing. It's not necessary to fully initialize the executor data structures for partitions to which no tuples are ever routed. Consider, for example, an INSERT statement that inserts only one row: it only cares about the partition to which that one row is routed. The new function ExecInitPartitionInfo performs the initialization in question only when a particular partition is about to receive a tuple. This includes creating, validating, and saving a pointer to the ResultRelInfo, setting up for speculative insertions, translating WCOs and initializing the resulting expressions, translating returning lists and building the appropriate projection information, and setting up a tuple conversion map. One thing that's not deferred is locking the child partitions; that seems desirable but would need more thought. Still, testing shows that this makes single-row inserts significantly faster on a table with many partitions without harming the bulk-insert case. Amit Langote, reviewed by Etsuro Fujita, with a few changes by me Discussion: http://postgr.es/m/8975331d-d961-cbdd-f862-fdd3d97dc2d0@lab.ntt.co.jp	2018-02-22 10:55:54 -05:00
Peter Eisentraut	c2ff42c6c1	Error message improvement	2018-02-20 17:58:27 -05:00
Alvaro Herrera	eb7ed3f306	Allow UNIQUE indexes on partitioned tables If we restrict unique constraints on partitioned tables so that they must always include the partition key, then our standard approach to unique indexes already works --- each unique key is forced to exist within a single partition, so enforcing the unique restriction in each index individually is enough to have it enforced globally. Therefore we can implement unique indexes on partitions by simply removing a few restrictions (and adding others.) Discussion: https://postgr.es/m/20171222212921.hi6hg6pem2w2t36z@alvherre.pgsql Discussion: https://postgr.es/m/20171229230607.3iib6b62fn3uaf47@alvherre.pgsql Reviewed-by: Simon Riggs, Jesper Pedersen, Peter Eisentraut, Jaime Casanova, Amit Langote	2018-02-19 17:40:00 -03:00
Peter Eisentraut	97a804cb2b	Message style fix	2018-02-18 17:16:11 -05:00
Peter Eisentraut	1a1adb215c	Move function comment to the right place	2018-02-17 20:45:28 -05:00
Andres Freund	ad7dbee368	Allow tupleslots to have a fixed tupledesc, use in executor nodes. The reason for doing so is that it will allow expression evaluation to optimize based on the underlying tupledesc. In particular it will allow to JIT tuple deforming together with the expression itself. For that expression initialization needs to be moved after the relevant slots are initialized - mostly unproblematic, except in the case of nodeWorktablescan.c. After doing so there's no need for ExecAssignResultType() and ExecAssignResultTypeFromTL() anymore, as all former callers have been converted to create a slot with a fixed descriptor. When creating a slot with a fixed descriptor, tts_values/isnull can be allocated together with the main slot, reducing allocation overhead and increasing cache density a bit. Author: Andres Freund Discussion: https://postgr.es/m/20171206093717.vqdxe5icqttpxs3p@alap3.anarazel.de	2018-02-16 21:17:38 -08:00
Tom Lane	d02d4a6d4f	Avoid premature free of pass-by-reference CALL arguments. Prematurely freeing the EState used to evaluate CALL arguments led, in some cases, to passing dangling pointers to the procedure. This was masked in trivial cases because the argument pointers would point to Const nodes in the original expression tree, and in some other cases because the result value would end up in the standalone ExprContext rather than in memory belonging to the EState --- but that wasn't exactly high quality programming either, because the standalone ExprContext was never explicitly freed, breaking assorted API contracts. In addition, using a separate EState for each argument was just silly. So let's use just one EState, and one ExprContext, and make the latter belong to the former rather than be standalone, and clean up the EState (and hence the ExprContext) post-call. While at it, improve the function's commentary a bit. Discussion: https://postgr.es/m/29173.1518282748@sss.pgh.pa.us	2018-02-10 13:37:12 -05:00
Tom Lane	65b1d76785	Fix oversight in CALL argument handling, and do some minor cleanup. CALL statements cannot support sub-SELECTs in the arguments of the called procedure, since they just use ExecEvalExpr to evaluate such arguments. Teach transformSubLink() to reject the case, as it already does for other contexts in which subqueries are not supported. In passing, s/EXPR_KIND_CALL/EXPR_KIND_CALL_ARGUMENT/ to make that enum symbol line up more closely with the phrasing of the error messages it is associated with. And fix someone's weak grasp of English grammar in the preceding EXPR_KIND_PARTITION_EXPRESSION addition. Also update an incorrect comment in resolve_unique_index_expr (possibly it was correct when written, but nowadays transformExpr definitely does reject SRFs here). Per report from Pavel Stehule --- but this resolves only one of the bugs he mentions. Discussion: https://postgr.es/m/CAFj8pRDxOwPPzpA8i+AQeDQFj7bhVw-dR2==rfWZ3zMGkm568Q@mail.gmail.com	2018-02-10 13:05:14 -05:00
Robert Haas	e44dd84325	Avoid listing the same ResultRelInfo in more than one EState list. Doing so causes EXPLAIN ANALYZE to show trigger statistics multiple times. Commit `2f17844104` seems to be to blame for this. Amit Langote, revieed by Amit Khandekar, Etsuro Fujita, and me.	2018-02-08 14:29:05 -05:00
Tom Lane	0a459cec96	Support all SQL:2011 options for window frame clauses. This patch adds the ability to use "RANGE offset PRECEDING/FOLLOWING" frame boundaries in window functions. We'd punted on that back in the original patch to add window functions, because it was not clear how to do it in a reasonably data-type-extensible fashion. That problem is resolved here by adding the ability for btree operator classes to provide an "in_range" support function that defines how to add or subtract the RANGE offset value. Factoring it this way also allows the operator class to avoid overflow problems near the ends of the datatype's range, if it wishes to expend effort on that. (In the committed patch, the integer opclasses handle that issue, but it did not seem worth the trouble to avoid overflow failures for datetime types.) The patch includes in_range support for the integer_ops opfamily (int2/int4/int8) as well as the standard datetime types. Support for other numeric types has been requested, but that seems like suitable material for a follow-on patch. In addition, the patch adds GROUPS mode which counts the offset in ORDER-BY peer groups rather than rows, and it adds the frame_exclusion options specified by SQL:2011. As far as I can see, we are now fully up to spec on window framing options. Existing behaviors remain unchanged, except that I changed the errcode for a couple of existing error reports to meet the SQL spec's expectation that negative "offset" values should be reported as SQLSTATE 22013. Internally and in relevant parts of the documentation, we now consistently use the terminology "offset PRECEDING/FOLLOWING" rather than "value PRECEDING/FOLLOWING", since the term "value" is confusingly vague. Oliver Ford, reviewed and whacked around some by me Discussion: https://postgr.es/m/CAGMVOdu9sivPAxbNN0X+q19Sfv9edEPv=HibOJhB14TJv_RCQg@mail.gmail.com	2018-02-07 00:06:56 -05:00
Peter Eisentraut	533c5d8bdd	Fix application of identity values in some cases Investigation of `2d2d06b7e2` revealed that identity values were not applied in some further cases, including logical replication subscribers, VALUES RTEs, and ALTER TABLE ... ADD COLUMN. To fix all that, apply the identity column expression in build_column_default() instead of repeating the same logic at each call site. For ALTER TABLE ... ADD COLUMN ... IDENTITY, the previous coding completely ignored that existing rows for the new column should have values filled in from the identity sequence. The coding using build_column_default() fails for this because the sequence ownership isn't registered until after ALTER TABLE, and we can't do it before because we don't have the column in the catalog yet. So we specially remember in ColumnDef the sequence name that we decided on and build a custom NextValueExpr using that. Reviewed-by: Michael Paquier <michael.paquier@gmail.com>	2018-02-02 14:39:10 -05:00
Robert Haas	9da0cc3528	Support parallel btree index builds. To make this work, tuplesort.c and logtape.c must also support parallelism, so this patch adds that infrastructure and then applies it to the particular case of parallel btree index builds. Testing to date shows that this can often be 2-3x faster than a serial index build. The model for deciding how many workers to use is fairly primitive at present, but it's better than not having the feature. We can refine it as we get more experience. Peter Geoghegan with some help from Rushabh Lathia. While Heikki Linnakangas is not an author of this patch, he wrote other patches without which this feature would not have been possible, and therefore the release notes should possibly credit him as an author of this feature. Reviewed by Claudio Freire, Heikki Linnakangas, Thomas Munro, Tels, Amit Kapila, me. Discussion: http://postgr.es/m/CAM3SWZQKM=Pzc=CAHzRixKjp2eO5Q0Jg1SoFQqeXFQ647JiwqQ@mail.gmail.com Discussion: http://postgr.es/m/CAH2-Wz=AxWqDoVvGU7dq856S4r6sJAj6DBn7VMtigkB33N5eyg@mail.gmail.com	2018-02-02 13:32:44 -05:00
Peter Eisentraut	07e524d3e9	Silence complaint about dead assignment The preferred place for "placate compiler" assignments is after elog(ERROR), not before it. Otherwise, scan-build complains about a dead assignment.	2018-01-29 20:43:43 -05:00
Tom Lane	fb8697b31a	Avoid unnecessary use of pg_strcasecmp for already-downcased identifiers. We have a lot of code in which option names, which from the user's viewpoint are logically keywords, are passed through the grammar as plain identifiers, and then matched to string literals during command execution. This approach avoids making words into lexer keywords unnecessarily. Some places matched these strings using plain strcmp, some using pg_strcasecmp. But the latter should be unnecessary since identifiers would have been downcased on their way through the parser. Aside from any efficiency concerns (probably not a big factor), the lack of consistency in this area creates a hazard of subtle bugs due to different places coming to different conclusions about whether two option names are the same or different. Hence, standardize on using strcmp() to match any option names that are expected to have been fed through the parser. This does create a user-visible behavioral change, which is that while formerly all of these would work: alter table foo set (fillfactor = 50); alter table foo set (FillFactor = 50); alter table foo set ("fillfactor" = 50); alter table foo set ("FillFactor" = 50); now the last case will fail because that double-quoted identifier is different from the others. However, none of our documentation says that you can use a quoted identifier in such contexts at all, and we should discourage doing so since it would break if we ever decide to parse such constructs as true lexer keywords rather than poor man's substitutes. So this shouldn't create a significant compatibility issue for users. Daniel Gustafsson, reviewed by Michael Paquier, small changes by me Discussion: https://postgr.es/m/29405B24-564E-476B-98C0-677A29805B84@yesql.se	2018-01-26 18:25:14 -05:00
Tom Lane	4971d2a322	Remove the obsolete WITH clause of CREATE FUNCTION. This clause was superseded by SQL-standard syntax back in 7.3. We've kept it around for backwards-compatibility purposes ever since; but 15 years seems like long enough for that, especially seeing that there are undocumented weirdnesses in how it interacts with the SQL-standard syntax for specifying the same options. Michael Paquier, per an observation by Daniel Gustafsson; some small cosmetic adjustments to nearby code by me. Discussion: https://postgr.es/m/20180115022748.GB1724@paquier.xyz	2018-01-26 12:25:44 -05:00
Alvaro Herrera	05fb5d6619	Ignore partitioned indexes where appropriate get_relation_info() was too optimistic about opening indexes in partitioned tables, which would raise errors when any queries were planned on such tables. Fix by ignoring any indexes of the partitioned kind. CLUSTER (and ALTER TABLE CLUSTER ON) had a similar problem. Fix by disallowing these commands in partitioned tables. Fallout from `8b08f7d482`.	2018-01-25 16:12:15 -03:00
Peter Eisentraut	8561e4840c	Transaction control in PL procedures In each of the supplied procedural languages (PL/pgSQL, PL/Perl, PL/Python, PL/Tcl), add language-specific commit and rollback functions/commands to control transactions in procedures in that language. Add similar underlying functions to SPI. Some additional cleanup so that transaction commit or abort doesn't blow away data structures still used by the procedure call. Add execution context tracking to CALL and DO statements so that transaction control commands can only be issued in top-level procedure and block calls, not function calls or other procedure or block calls. - SPI Add a new function SPI_connect_ext() that is like SPI_connect() but allows passing option flags. The only option flag right now is SPI_OPT_NONATOMIC. A nonatomic SPI connection can execute transaction control commands, otherwise it's not allowed. This is meant to be passed down from CALL and DO statements which themselves know in which context they are called. A nonatomic SPI connection uses different memory management. A normal SPI connection allocates its memory in TopTransactionContext. For nonatomic connections we use PortalContext instead. As the comment in SPI_connect_ext() (previously SPI_connect()) indicates, one could potentially use PortalContext in all cases, but it seems safest to leave the existing uses alone, because this stuff is complicated enough already. SPI also gets new functions SPI_start_transaction(), SPI_commit(), and SPI_rollback(), which can be used by PLs to implement their transaction control logic. - portalmem.c Some adjustments were made in the code that cleans up portals at transaction abort. The portal code could already handle a command committing a transaction and continuing (e.g., VACUUM), but it was not quite prepared for a command aborting a transaction and continuing. In AtAbort_Portals(), remove the code that marks an active portal as failed. As the comment there already predicted, this doesn't work if the running command wants to keep running after transaction abort. And it's actually not necessary, because pquery.c is careful to run all portal code in a PG_TRY block and explicitly runs MarkPortalFailed() if there is an exception. So the code in AtAbort_Portals() is never used anyway. In AtAbort_Portals() and AtCleanup_Portals(), we need to be careful not to clean up active portals too much. This mirrors similar code in PreCommit_Portals(). - PL/Perl Gets new functions spi_commit() and spi_rollback() - PL/pgSQL Gets new commands COMMIT and ROLLBACK. Update the PL/SQL porting example in the documentation to reflect that transactions are now possible in procedures. - PL/Python Gets new functions plpy.commit and plpy.rollback. - PL/Tcl Gets new commands commit and rollback. Reviewed-by: Andrew Dunstan <andrew.dunstan@2ndquadrant.com>	2018-01-22 08:43:06 -05:00
Robert Haas	2f17844104	Allow UPDATE to move rows between partitions. When an UPDATE causes a row to no longer match the partition constraint, try to move it to a different partition where it does match the partition constraint. In essence, the UPDATE is split into a DELETE from the old partition and an INSERT into the new one. This can lead to surprising behavior in concurrency scenarios because EvalPlanQual rechecks won't work as they normally did; the known problems are documented. (There is a pending patch to improve the situation further, but it needs more review.) Amit Khandekar, reviewed and tested by Amit Langote, David Rowley, Rajkumar Raghuwanshi, Dilip Kumar, Amul Sul, Thomas Munro, Álvaro Herrera, Amit Kapila, and me. A few final revisions by me. Discussion: http://postgr.es/m/CAJ3gD9do9o2ccQ7j7+tSgiE1REY65XRiMb=yJO3u3QhyP8EEPQ@mail.gmail.com	2018-01-19 15:33:06 -05:00
Peter Eisentraut	8b9e9644dc	Replace AclObjectKind with ObjectType AclObjectKind was basically just another enumeration for object types, and we already have a preferred one for that. It's only used in aclcheck_error. By using ObjectType instead, we can also give some more precise error messages, for example "index" instead of "relation". Reviewed-by: Michael Paquier <michael.paquier@gmail.com>	2018-01-19 14:01:15 -05:00
Peter Eisentraut	2c6f37ed62	Replace GrantObjectType with ObjectType There used to be a lot of different Type and Kind symbol groups to address objects within different commands, most of which have been replaced by ObjectType, starting with `b256f24264`. But this conversion was never done for the ACL commands until now. This change ends up being just a plain replacement of the types and symbols, without any code restructuring needed, except deleting some now redundant code. Reviewed-by: Michael Paquier <michael.paquier@gmail.com> Reviewed-by: Stephen Frost <sfrost@snowman.net>	2018-01-19 14:01:14 -05:00
Alvaro Herrera	8b08f7d482	Local partitioned indexes When CREATE INDEX is run on a partitioned table, create catalog entries for an index on the partitioned table (which is just a placeholder since the table proper has no data of its own), and recurse to create actual indexes on the existing partitions; create them in future partitions also. As a convenience gadget, if the new index definition matches some existing index in partitions, these are picked up and used instead of creating new ones. Whichever way these indexes come about, they become attached to the index on the parent table and are dropped alongside it, and cannot be dropped on isolation unless they are detached first. To support pg_dump'ing these indexes, add commands CREATE INDEX ON ONLY <table> (which creates the index on the parent partitioned table, without recursing) and ALTER INDEX ATTACH PARTITION (which is used after the indexes have been created individually on each partition, to attach them to the parent index). These reconstruct prior database state exactly. Reviewed-by: (in alphabetical order) Peter Eisentraut, Robert Haas, Amit Langote, Jesper Pedersen, Simon Riggs, David Rowley Discussion: https://postgr.es/m/20171113170646.gzweigyrgg6pwsg4@alvherre.pgsql	2018-01-19 11:49:22 -03:00
Alvaro Herrera	1ef61ddce9	Fix StoreCatalogInheritance1 to use 32bit inhseqno For no apparent reason, this function was using a 16bit-wide inhseqno value, rather than the correct 32 bit width which is what is stored in the pg_inherits catalog. This becomes evident if you try to create a table with more than 65535 parents, because this error appears: ERROR: duplicate key value violates unique constraint «pg_inherits_relid_seqno_index» DETAIL: Key (inhrelid, inhseqno)=(329371, 0) already exists. Needless to say, having so many parents is an uncommon situations, which explains why this error has never been reported despite being having been introduced with the Postgres95 1.01 sources in commit `d31084e9d1`: https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/commands/creatinh.c;hb=d31084e9d111#l349 Backpatch all the way back. David Rowley noticed this while reviewing a patch of mine. Discussion: https://postgr.es/m/CAKJS1f8Dn7swSEhOWwzZzssW7747YB=2Hi+T7uGud40dur69-g@mail.gmail.com	2018-01-19 10:17:54 -03:00
Tom Lane	4d41b2e092	Add QueryEnvironment to ExplainOneQuery_hook's parameter list. This should have been done in commit `18ce3a4ab`, which added that parameter to ExplainOneQuery, but it was overlooked. This makes it impossible for a user of the hook to pass the queryEnv down to ExplainOnePlan. It's too late to change this API in v10, I suppose, but fortunately passing NULL to ExplainOnePlan will work in nearly all interesting cases in v10. That might not be true forever, so we'd better fix it. Tatsuro Yamada, reviewed by Thomas Munro Discussion: https://postgr.es/m/890e8dd9-c1c7-a422-6892-874f5eaee048@lab.ntt.co.jp	2018-01-11 12:16:18 -05:00
Peter Eisentraut	a77dd53f30	Remove PortalGetQueryDesc() After having gotten rid of PortalGetHeapMemory(), there seems little reason to keep one Portal access macro around that offers no actual abstraction and isn't consistently used anyway. Reviewed-by: Andrew Dunstan <andrew.dunstan@2ndquadrant.com> Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org>	2018-01-09 13:47:56 -05:00
Peter Eisentraut	0f7c49e855	Update portal-related memory context names and API Rename PortalMemory to TopPortalContext, to avoid confusion with PortalContext and align naming with similar top-level memory contexts. Rename PortalData's "heap" field to portalContext. The "heap" naming seems quite antiquated and confusing. Also get rid of the PortalGetHeapMemory() macro and access the field directly, which we do for other portal fields, so this abstraction doesn't buy anything. Reviewed-by: Andrew Dunstan <andrew.dunstan@2ndquadrant.com> Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org>	2018-01-09 13:47:56 -05:00
Robert Haas	19c47e7c82	Factor error generation out of ExecPartitionCheck. At present, we always raise an ERROR if the partition constraint is violated, but a pending patch for UPDATE tuple routing will consider instead moving the tuple to the correct partition. Refactor to make that simpler. Amit Khandekar, reviewed by Amit Langote, David Rowley, and me. Discussion: http://postgr.es/m/CAJ3gD9cue54GbEzfV-61nyGpijvjZgCcghvLsB0_nL8Nm8HzCA@mail.gmail.com	2018-01-05 15:22:33 -05:00
Robert Haas	ef6087ee5f	Minor preparatory refactoring for UPDATE row movement. Generalize is_partition_attr to has_partition_attrs and make it accessible from outside tablecmds.c. Change map_partition_varattnos to clarify that it can be used for mapping between any two relations in a partitioning hierarchy, not just parent -> child. Amit Khandekar, reviewed by Amit Langote, David Rowley, and me. Some comment changes by me. Discussion: http://postgr.es/m/CAJ3gD9fWfxgKC+PfJZF3hkgAcNOy-LpfPxVYitDEXKHjeieWQQ@mail.gmail.com	2018-01-04 16:25:49 -05:00
Robert Haas	cc6337d2fe	Simplify and encapsulate tuple routing support code. Instead of having ExecSetupPartitionTupleRouting return multiple out parameters, have it return a pointer to a structure containing all of those different things. Also, provide and use a cleanup function, ExecCleanupTupleRouting, instead of cleaning up all of the resources allocated by ExecSetupPartitionTupleRouting individually. Amit Khandekar, reviewed by Amit Langote, David Rowley, and me Discussion: http://postgr.es/m/CAJ3gD9fWfxgKC+PfJZF3hkgAcNOy-LpfPxVYitDEXKHjeieWQQ@mail.gmail.com	2018-01-04 15:48:15 -05:00
Bruce Momjian	9d4649ca49	Update copyright for 2018 Backpatch-through: certain files through 9.3	2018-01-02 23:30:12 -05:00
Alvaro Herrera	54eff5311d	Fix deadlock hazard in CREATE INDEX CONCURRENTLY Multiple sessions doing CREATE INDEX CONCURRENTLY simultaneously are supposed to be able to work in parallel, as evidenced by fixes in commit `c3d09b3bd2` specifically to support this case. In reality, one of the sessions would be aborted by a misterious "deadlock detected" error. Jeff Janes diagnosed that this is because of leftover snapshots used for system catalog scans -- this was broken by `8aa3e47510` keeping track of (registering) the catalog snapshot. To fix the deadlocks, it's enough to de-register that snapshot prior to waiting. Backpatch to 9.4, which introduced MVCC catalog scans. Include an isolationtester spec that 8 out of 10 times reproduces the deadlock with the unpatched code for me (Álvaro). Author: Jeff Janes Diagnosed-by: Jeff Janes Reported-by: Jeremy Finzel Discussion: https://postgr.es/m/CAMa1XUhHjCv8Qkx0WOr1Mpm_R4qxN26EibwCrj0Oor2YBUFUTg%40mail.gmail.com	2018-01-02 19:16:16 -03:00
Andres Freund	93ea78b17c	Fix EXPLAIN ANALYZE output for Parallel Hash. In a race case, EXPLAIN ANALYZE could fail to display correct nbatch and size information. Refactor so that participants report only on batches they worked on rather than trying to report on all of them, and teach explain.c to consider the HashInstrumentation object from all participants instead of picking the first one it can find. This should fix an occasional build farm failure in the "join" regression test. Author: Thomas Munro Reviewed-By: Andres Freund Discussion: https://postgr.es/m/30219.1514428346%40sss.pgh.pa.us	2018-01-01 14:38:23 -08:00
Simon Riggs	2958a672b1	Extend near-wraparound hints to include replication slots Author: Feike Steenbergen Reviewed-by: Michael Paquier	2017-12-29 14:01:25 +00:00
Teodor Sigaev	ad337c76b6	Update relation's stats in pg_class during vacuum full. Hash index depends on estimation of numbers of tuples and pages of relations, incorrect value could be a reason of significantly growing of index. Vacuum full recreates heap and reindex all indexes before renewal stats. The patch fixes that, so indexes will see correct values. Backpatch to v10 only because earlier versions haven't usable hash index and growing of hash index is a single user-visible symptom. Author: Amit Kapila Reviewed-by: Ashutosh Sharma, me Discussion: https://www.postgresql.org/message-id/flat/20171115232922.5tomkxnw3iq6jsg7@inml.weebeastie.net	2017-12-27 18:25:37 +03:00
Tom Lane	6719b238e8	Rearrange execution of PARAM_EXTERN Params for plpgsql's benefit. This patch does three interrelated things: * Create a new expression execution step type EEOP_PARAM_CALLBACK and add the infrastructure needed for add-on modules to generate that. As discussed, the best control mechanism for that seems to be to add another hook function to ParamListInfo, which will be called by ExecInitExpr if it's supplied and a PARAM_EXTERN Param is found. For stand-alone expressions, we add a new entry point to allow the ParamListInfo to be specified directly, since it can't be retrieved from the parent plan node's EState. * Redesign the API for the ParamListInfo paramFetch hook so that the ParamExternData array can be entirely virtual. This also lets us get rid of ParamListInfo.paramMask, instead leaving it to the paramFetch hook to decide which param IDs should be accessible or not. plpgsql_param_fetch was already doing the identical masking check, so having callers do it too seemed redundant. While I was at it, I added a "speculative" flag to paramFetch that the planner can specify as TRUE to avoid unwanted failures. This solves an ancient problem for plpgsql that it couldn't provide values of non-DTYPE_VAR variables to the planner for fear of triggering premature "record not assigned yet" or "field not found" errors during planning. * Rework plpgsql to get rid of the need for "unshared" parameter lists, by dint of turning the single ParamListInfo per estate into a nearly read-only data structure that doesn't instantiate any per-variable data. Instead, the paramFetch hook controls access to per-variable data and can make the right decisions on the fly, replacing the cases that we used to need multiple ParamListInfos for. This might perhaps have been a performance loss on its own, but by using a paramCompile hook we can bypass plpgsql_param_fetch entirely during normal query execution. (It's now only called when, eg, we copy the ParamListInfo into a cursor portal. copyParamList() or SerializeParamList() effectively instantiate the virtual parameter array as a simple physical array without a paramFetch hook, which is what we want in those cases.) This allows reverting most of commit `6c82d8d1f`, though I kept the cosmetic code-consolidation aspects of that (eg the assign_simple_var function). Performance testing shows this to be at worst a break-even change, and it can provide wins ranging up to 20% in test cases involving accesses to fields of "record" variables. The fact that values of such variables can now be exposed to the planner might produce wins in some situations, too, but I've not pursued that angle. In passing, remove the "parent" pointer from the arguments to ExecInitExprRec and related functions, instead storing that pointer in a transient field in ExprState. The ParamListInfo pointer for a stand-alone expression is handled the same way; we'd otherwise have had to add yet another recursively-passed-down argument in expression compilation. Discussion: https://postgr.es/m/32589.1513706441@sss.pgh.pa.us	2017-12-21 12:57:45 -05:00
Andres Freund	699bf7d05c	Perform a lot more sanity checks when freezing tuples. The previous commit has shown that the sanity checks around freezing aren't strong enough. Strengthening them seems especially important because the existance of the bug has caused corruption that we don't want to make even worse during future vacuum cycles. The errors are emitted with ereport rather than elog, despite being "should never happen" messages, so a proper error code is emitted. To avoid superflous translations, mark messages as internal. Author: Andres Freund and Alvaro Herrera Reviewed-By: Alvaro Herrera, Michael Paquier Discussion: https://postgr.es/m/20171102112019.33wb7g5wp4zpjelu@alap3.anarazel.de Backpatch: 9.3-	2017-12-14 18:20:47 -08:00
Tom Lane	9fa6f00b13	Rethink MemoryContext creation to improve performance. This patch makes a number of interrelated changes to reduce the overhead involved in creating/deleting memory contexts. The key ideas are: * Include the AllocSetContext header of an aset.c context in its first malloc request, rather than allocating it separately in TopMemoryContext. This means that we now always create an initial or "keeper" block in an aset, even if it never receives any allocation requests. * Create freelists in which we can save and recycle recently-destroyed asets (this idea is due to Robert Haas). * In the common case where the name of a context is a constant string, just store a pointer to it in the context header, rather than copying the string. The first change eliminates a palloc/pfree cycle per context, and also avoids bloat in TopMemoryContext, at the price that creating a context now involves a malloc/free cycle even if the context never receives any allocations. That would be a loser for some common usage patterns, but recycling short-lived contexts via the freelist eliminates that pain. Avoiding copying constant strings not only saves strlen() and strcpy() overhead, but is an essential part of the freelist optimization because it makes the context header size constant. Currently we make no attempt to use the freelist for contexts with non-constant names. (Perhaps someday we'll need to think harder about that, but in current usage, most contexts with custom names are long-lived anyway.) The freelist management in this initial commit is pretty simplistic, and we might want to refine it later --- but in common workloads that will never matter because the freelists will never get full anyway. To create a context with a non-constant name, one is now required to call AllocSetContextCreateExtended and specify the MEMCONTEXT_COPY_NAME option. AllocSetContextCreate becomes a wrapper macro, and it includes a test that will complain about non-string-literal context name parameters on gcc and similar compilers. An unfortunate side effect of making AllocSetContextCreate a macro is that one is now required to use the size parameter abstraction macros (ALLOCSET_DEFAULT_SIZES and friends) with it; the pre-9.6 habit of writing out individual size parameters no longer works unless you switch to AllocSetContextCreateExtended. Internally to the memory-context-related modules, the context creation APIs are simplified, removing the rather baroque original design whereby a context-type module called mcxt.c which then called back into the context-type module. That saved a bit of code duplication, but not much, and it prevented context-type modules from exercising control over the allocation of context headers. In passing, I converted the test-and-elog validation of aset size parameters into Asserts to save a few more cycles. The original thought was that callers might compute size parameters on the fly, but in practice nobody does that, so it's useless to expend cycles on checking those numbers in production builds. Also, mark the memory context method-pointer structs "const", just for cleanliness. Discussion: https://postgr.es/m/2264.1512870796@sss.pgh.pa.us	2017-12-13 13:55:16 -05:00
Peter Eisentraut	2d2d06b7e2	Apply identity sequence values on COPY A COPY into a table should apply identity sequence values just like it does for ordinary defaults. This was previously forgotten, leading to null values being inserted, which in turn would fail because identity columns have not-null constraints. Author: Michael Paquier <michael.paquier@gmail.com> Reported-by: Steven Winfield <steven.winfield@cantabcapital.com> Bug: #14952	2017-12-08 09:18:18 -05:00
Andres Freund	5bcf389ecf	Fix EXPLAIN ANALYZE of hash join when the leader doesn't participate. If a hash join appears in a parallel query, there may be no hash table available for explain.c to inspect even though a hash table may have been built in other processes. This could happen either because parallel_leader_participation was set to off or because the leader happened to hit the end of the outer relation immediately (even though the complete relation is not empty) and decided not to build the hash table. Commit `bf11e7ee` introduced a way for workers to exchange instrumentation via the DSM segment for Sort nodes even though they are not parallel-aware. This commit does the same for Hash nodes, so that explain.c has a way to find instrumentation data from an arbitrary participant that actually built the hash table. Author: Thomas Munro Reviewed-By: Andres Freund Discussion: https://postgr.es/m/CAEepm%3D3DUQC2-z252N55eOcZBer6DPdM%3DFzrxH9dZc5vYLsjaA%40mail.gmail.com	2017-12-05 10:55:56 -08:00
Robert Haas	ab6eaee884	When VACUUM or ANALYZE skips a concurrently dropped table, log it. Hopefully, the additional logging will help avoid confusion that could otherwise result. Nathan Bossart, reviewed by Michael Paquier, Fabrízio Mello, and me	2017-12-04 15:25:55 -05:00
Robert Haas	87c37e3291	Re-allow INSERT .. ON CONFLICT DO NOTHING on partitioned tables. Commit `8355a011a0` was reverted in `f05230752d`, but this attempt is hopefully better-considered: we now pass the correct value to ExecOpenIndices, which should avoid the crash that we hit before. Amit Langote, reviewed by Simon Riggs and by me. Some final editing by me. Discussion: http://postgr.es/m/7ff1e8ec-dc39-96b1-7f47-ff5965dceeac@lab.ntt.co.jp	2017-12-01 12:53:21 -05:00
Peter Eisentraut	e4128ee767	SQL procedures This adds a new object type "procedure" that is similar to a function but does not have a return type and is invoked by the new CALL statement instead of SELECT or similar. This implementation is aligned with the SQL standard and compatible with or similar to other SQL implementations. This commit adds new commands CALL, CREATE/ALTER/DROP PROCEDURE, as well as ALTER/DROP ROUTINE that can refer to either a function or a procedure (or an aggregate function, as an extension to SQL). There is also support for procedures in various utility commands such as COMMENT and GRANT, as well as support in pg_dump and psql. Support for defining procedures is available in all the languages supplied by the core distribution. While this commit is mainly syntax sugar around existing functionality, future features will rely on having procedures as a separate object type. Reviewed-by: Andrew Dunstan <andrew.dunstan@2ndquadrant.com>	2017-11-30 11:03:20 -05:00
Robert Haas	eaedf0df71	Update typedefs.list and re-run pgindent Discussion: http://postgr.es/m/CA+TgmoaA9=1RWKtBWpDaj+sF3Stgc8sHgf5z=KGtbjwPLQVDMA@mail.gmail.com	2017-11-29 09:24:24 -05:00
Tom Lane	cb03fa33ae	Fix assorted syscache lookup sloppiness in partition-related code. heap_drop_with_catalog and ATExecDetachPartition neglected to check for SearchSysCache failures, as noted in bugs #14927 and #14928 from Pan Bian. Such failures are pretty unlikely, since we should already have some sort of lock on the rel at these points, but it's neither a good idea nor per project style to omit a check for failure. Also, StorePartitionKey contained a syscache lookup that it never did anything with, including never releasing the result. Presumably the reason why we don't see refcount-leak complaints is that the lookup always fails; but in any case it's pretty useless, so remove it. All of these errors were evidently introduced by the relation partitioning feature. Back-patch to v10 where that came in. Amit Langote and Tom Lane Discussion: https://postgr.es/m/20171127090105.1463.3962@wrigleys.postgresql.org Discussion: https://postgr.es/m/20171127091341.1468.72696@wrigleys.postgresql.org	2017-11-27 19:22:08 -05:00
Robert Haas	e89a71fb44	Pass InitPlan values to workers via Gather (Merge). If a PARAM_EXEC parameter is used below a Gather (Merge) but the InitPlan that computes it is attached to or above the Gather (Merge), force the value to be computed before starting parallelism and pass it down to all workers. This allows us to use parallelism in cases where it previously would have had to be rejected as unsafe. We do - in this case - lose the optimization that the value is only computed if it's actually used. An alternative strategy would be to have the first worker that needs the value compute it, but one downside of that approach is that we'd then need to select a parallel-safe path to compute the parameter value; it couldn't for example contain a Gather (Merge) node. At some point in the future, we might want to consider both approaches. Independent of that consideration, there is a great deal more work that could be done to make more kinds of PARAM_EXEC parameters parallel-safe. This infrastructure could be used to allow a Gather (Merge) on the inner side of a nested loop (although that's not a very appealing plan) and cases where the InitPlan is attached below the Gather (Merge) could be addressed as well using various techniques. But this is a good start. Amit Kapila, reviewed and revised by me. Reviewing and testing from Kuntal Ghosh, Haribabu Kommi, and Tushar Ahuja. Discussion: http://postgr.es/m/CAA4eK1LV0Y1AUV4cUCdC+sYOx0Z0-8NAJ2Pd9=UKsbQ5Sr7+JQ@mail.gmail.com	2017-11-16 12:06:14 -05:00
Robert Haas	4e5fe9ad19	Centralize executor-related partitioning code. Some code is moved from partition.c, which has grown very quickly lately; splitting the executor parts out might help to keep it from getting totally out of control. Other code is moved from execMain.c. All is moved to a new file execPartition.c. get_partition_for_tuple now has a new interface that more clearly separates executor concerns from generic concerns. Amit Langote. A slight comment tweak by me. Discussion: http://postgr.es/m/1f0985f8-3b61-8bc4-4350-baa6d804cb6d@lab.ntt.co.jp	2017-11-15 10:26:25 -05:00
Alvaro Herrera	a61f5ab986	Simplify index_[constraint_]create API Instead of passing large swaths of boolean arguments, define some flags that can be used in a bitmask. This makes it easier not only to figure out what each call site is doing, but also to add some new flags. The flags are split in two -- one set for index_create directly and another for constraints. index_create() itself receives both, and then passes down the latter to index_constraint_create(), which can also be called standalone. Discussion: https://postgr.es/m/20171023151251.j75uoe27gajdjmlm@alvherre.pgsql Reviewed-by: Simon Riggs	2017-11-14 15:19:05 +01:00
Peter Eisentraut	0e1539ba0d	Add some const decorations to prototypes Reviewed-by: Fabien COELHO <coelho@cri.ensmp.fr>	2017-11-10 13:38:57 -05:00
Robert Haas	1aba8e651a	Add hash partitioning. Hash partitioning is useful when you want to partition a growing data set evenly. This can be useful to keep table sizes reasonable, which makes maintenance operations such as VACUUM faster, or to enable partition-wise join. At present, we still depend on constraint exclusion for partitioning pruning, and the shape of the partition constraints for hash partitioning is such that that doesn't work. Work is underway to fix that, which should both improve performance and make partitioning pruning work with hash partitioning. Amul Sul, reviewed and tested by Dilip Kumar, Ashutosh Bapat, Yugo Nagata, Rajkumar Raghuwanshi, Jesper Pedersen, and by me. A few final tweaks also by me. Discussion: http://postgr.es/m/CAAJ_b96fhpJAP=ALbETmeLk1Uni_GFZD938zgenhF49qgDTjaQ@mail.gmail.com	2017-11-09 18:07:44 -05:00
Peter Eisentraut	2eb4a831e5	Change TRUE/FALSE to true/false The lower case spellings are C and C++ standard and are used in most parts of the PostgreSQL sources. The upper case spellings are only used in some files/modules. So standardize on the standard spellings. The APIs for ICU, Perl, and Windows define their own TRUE and FALSE, so those are left as is when using those APIs. In code comments, we use the lower-case spelling for the C concepts and keep the upper-case spelling for the SQL concepts. Reviewed-by: Michael Paquier <michael.paquier@gmail.com>	2017-11-08 11:37:28 -05:00
Noah Misch	bab3a714b6	Ignore CatalogSnapshot when checking COPY FREEZE prerequisites. This restores the ability, essentially lost in commit `ffaa44cb55`, to use COPY FREEZE under REPEATABLE READ isolation. Back-patch to 9.4, like that commit. Reviewed by Tom Lane. Discussion: https://postgr.es/m/CA+TgmoahWDm-7fperBxzU9uZ99LPMUmEpSXLTw9TmrOgzwnORw@mail.gmail.com	2017-11-05 09:25:52 -08:00
Alvaro Herrera	c6764eb3ae	Revert bogus fixes of HOT-freezing bug It turns out we misdiagnosed what the real problem was. Revert the previous changes, because they may have worse consequences going forward. A better fix is forthcoming. The simplistic test case is kept, though disabled. Discussion: https://postgr.es/m/20171102112019.33wb7g5wp4zpjelu@alap3.anarazel.de	2017-11-02 15:51:41 +01:00
Tom Lane	af20e2d728	Fix ALTER TABLE code to update domain constraints when needed. It's possible for dropping a column, or altering its type, to require changes in domain CHECK constraint expressions; but the code was previously only expecting to find dependent table CHECK constraints. Make the necessary adjustments. This is a fairly old oversight, but it's a lot easier to encounter the problem in the context of domains over composite types than it was before. Given the lack of field complaints, I'm not going to bother with a back-patch, though I'd be willing to reconsider that decision if someone does complain. Patch by me, reviewed by Michael Paquier Discussion: https://postgr.es/m/30656.1509128130@sss.pgh.pa.us	2017-11-01 13:32:23 -04:00
Tom Lane	37a795a60b	Support domains over composite types. This is the last major omission in our domains feature: you can now make a domain over anything that's not a pseudotype. The major complication from an implementation standpoint is that places that might be creating tuples of a domain type now need to be prepared to apply domain_check(). It seems better that unprepared code fail with an error like "<type> is not composite" than that it silently fail to apply domain constraints. Therefore, relevant infrastructure like get_func_result_type() and lookup_rowtype_tupdesc() has been adjusted to treat domain-over-composite as a distinct case that unprepared code won't recognize, rather than just transparently treating it the same as plain composite. This isn't a 100% solution to the possibility of overlooked domain checks, but it catches most places. In passing, improve typcache.c's support for domains (it can now cache the identity of a domain's base type), and rewrite the argument handling logic in jsonfuncs.c's populate_record[set]_worker to reduce duplicative per-call lookups. I believe this is code-complete so far as the core and contrib code go. The PLs need varying amounts of work, which will be tackled in followup patches. Discussion: https://postgr.es/m/4206.1499798337@sss.pgh.pa.us	2017-10-26 13:47:45 -04:00
Robert Haas	b55509332f	In relevant log messages, indicate whether vacuums are aggressive. Kyotaro Horiguchi, reviewed Masahiko Sawada, David G. Johnston, Álvaro Herrera, and me. Grammar correction to the final posted patch by me. Discussion: http://postgr.es/m/20170329.124649.193656100.horiguchi.kyotaro@lab.ntt.co.jp	2017-10-26 12:38:10 +02:00
Alvaro Herrera	60a1d96ed7	Rework DefineIndex relkind check Simplify coding using a switch rather than nested if tests. Author: Álvaro Reviewed-by: Robert Haas, Amit Langote, Michaël Paquier Discussion: https://postgr.es/m/20171013163820.pai7djcaxrntaxtn@alvherre.pgsql	2017-10-16 12:22:18 +02:00
Tom Lane	4de2d4fba3	Explicitly track whether aggregate final functions modify transition state. Up to now, there's been hard-wired assumptions that normal aggregates' final functions never modify their transition states, while ordered-set aggregates' final functions always do. This has always been a bit limiting, and in particular it's getting in the way of improving the built-in ordered-set aggregates to allow merging of transition states. Therefore, let's introduce catalog and CREATE AGGREGATE infrastructure that lets the finalfn's behavior be declared explicitly. There are now three possibilities for the finalfn behavior: it's purely read-only, it trashes the transition state irrecoverably, or it changes the state in such a way that no more transfn calls are possible but the state can still be passed to other, compatible finalfns. There are no examples of this third case today, but we'll shortly make the built-in OSAs act like that. This change allows user-defined aggregates to explicitly disclaim support for use as window functions, and/or to prevent transition state merging, if their implementations cannot handle that. While it was previously possible to handle the window case with a run-time error check, there was not any way to prevent transition state merging, which in retrospect is something commit `804163bc2` should have provided for. But better late than never. In passing, split out pg_aggregate.c's extern function declarations into a new header file pg_aggregate_fn.h, similarly to what we've done for some other catalog headers, so that pg_aggregate.h itself can be safe for frontend files to include. This lets pg_dump use the symbolic names for relevant constants. Discussion: https://postgr.es/m/4834.1507849699@sss.pgh.pa.us	2017-10-14 15:21:39 -04:00
Robert Haas	60f7c0abef	Use ResultRelInfo ** rather than ResultRelInfo * for tuple routing. The previous convention doesn't lend itself to creating ResultRelInfos lazily, as we already do in ExecGetTriggerResultRel. This patch doesn't make anything lazier than before, but the pending patch for UPDATE tuple routing proposes to do so (and there might be other opportunities as well). Amit Khandekar with some adjustments by me. Discussion: http://postgr.es/m/CA+TgmoYPVP9Lyf6vUFA5DwxS4c--x6LOj2y36BsJaYtp62eXPQ@mail.gmail.com	2017-10-12 16:50:53 -04:00
Andres Freund	31079a4a8e	Replace remaining uses of pq_sendint with pq_sendint{8,16,32}. pq_sendint() remains, so extension code doesn't unnecessarily break. Author: Andres Freund Discussion: https://postgr.es/m/20170914063418.sckdzgjfrsbekae4@alap3.anarazel.de	2017-10-11 21:00:46 -07:00
Tom Lane	5fa6b0d102	Remove unnecessary PG_TRY overhead for CurrentResourceOwner changes. resowner/README contained advice to use a PG_TRY block to restore the old CurrentResourceOwner value anywhere that that variable is transiently changed. That advice was only inconsistently followed, however, and on reflection it seems like unnecessary overhead. We don't bother with such a convention for transient CurrentMemoryContext changes, on the grounds that any (sub)transaction abort will start out by resetting CurrentMemoryContext to what it wants. But the same is true of CurrentResourceOwner, so there seems no need to treat it differently. Hence, remove PG_TRY blocks that exist only to restore CurrentResourceOwner before re-throwing the error. There are a couple of places that restore it along with some other actions, and I left those alone; the restore is probably unnecessary but no noticeable gain will result from removing it. Discussion: https://postgr.es/m/5236.1507583529@sss.pgh.pa.us	2017-10-11 17:44:09 -04:00
Tom Lane	118e99c3d7	Fix low-probability loss of NOTIFY messages due to XID wraparound. Up to now async.c has used TransactionIdIsInProgress() to detect whether a notify message's source transaction is still running. However, that function has a quick-exit path that reports that XIDs before RecentXmin are no longer running. If a listening backend is doing nothing but listening, and not running any queries, there is nothing that will advance its value of RecentXmin. Once 2 billion transactions elapse, the RecentXmin check causes active transactions to be reported as not running. If they aren't committed yet according to CLOG, async.c decides they aborted and discards their messages. The timing for that is a bit tight but it can happen when multiple backends are sending notifies concurrently. The net symptom therefore is that a sufficiently-long-surviving listen-only backend starts to miss some fraction of NOTIFY traffic, but only under heavy load. The only function that updates RecentXmin is GetSnapshotData(). A brute-force fix would therefore be to take a snapshot before processing incoming notify messages. But that would add cycles, as well as contention for the ProcArrayLock. We can be smarter: having taken the snapshot, let's use that to check for running XIDs, and not call TransactionIdIsInProgress() at all. In this way we reduce the number of ProcArrayLock acquisitions from one per message to one per notify interrupt; that's the same under light load but should be a benefit under heavy load. Light testing says that this change is a wash performance-wise for normal loads. I looked around for other callers of TransactionIdIsInProgress() that might be at similar risk, and didn't find any; all of them are inside transactions that presumably have already taken a snapshot. Problem report and diagnosis by Marko Tiikkaja, patch by me. Back-patch to all supported branches, since it's been like this since 9.0. Discussion: https://postgr.es/m/20170926182935.14128.65278@wrigleys.postgresql.org	2017-10-11 14:28:33 -04:00
Robert Haas	14f67a8ee2	On attach, consider skipping validation of subpartitions individually. If the table attached as a partition is itself partitioned, individual partitions might have constraints strong enough to skip scanning the table even if the table actually attached does not. This is pretty cheap to check, and possibly a big win if it works out. Amit Langote, with test case changes by me. Discussion: http://postgr.es/m/1f08b844-0078-aa8d-452e-7af3bf77d05f@lab.ntt.co.jp	2017-10-05 13:06:46 -04:00
Robert Haas	c31e9d4baf	Improve error message when skipping scan of default partition. It seems like a good idea to clearly distinguish between skipping the scan of the new partition itself and skipping the scan of the default partition. Amit Langote Discussion: http://postgr.es/m/1f08b844-0078-aa8d-452e-7af3bf77d05f@lab.ntt.co.jp	2017-10-05 12:19:40 -04:00
Robert Haas	e9baa5e9fa	Allow DML commands that create tables to use parallel query. Haribabu Kommi, reviewed by Dilip Kumar and Rafia Sabih. Various cosmetic changes by me to explain why this appears to be safe but allowing inserts in parallel mode in general wouldn't be. Also, I removed the REFRESH MATERIALIZED VIEW case from Haribabu's patch, since I'm not convinced that case is OK, and hacked on the documentation somewhat. Discussion: http://postgr.es/m/CAJrrPGdo5bak6qnPWe8Kpi8g_jfQEs-G4SYmG9y+OFaw2-dPvA@mail.gmail.com	2017-10-05 11:40:48 -04:00
Tom Lane	4d85c2900b	Improve comments in vacuum_rel() and analyze_rel(). Remove obsolete references to get_rel_oids(). Avoid listing specific relkinds in the comments, since we seem unable to keep such things in sync with the code, and it's not all that helpful anyhow. Noted by Michael Paquier, though I rewrote the comments a bit more. Discussion: https://postgr.es/m/CAB7nPqTWiN9zwKTaOrsnKiGDChqRt7C1+CiiDk4N4OMn92rs6A@mail.gmail.com	2017-10-05 10:47:47 -04:00
Tom Lane	11d8d72c27	Allow multiple tables to be specified in one VACUUM or ANALYZE command. Not much to say about this; does what it says on the tin. However, formerly, if there was a column list then the ANALYZE action was implied; now it must be specified, or you get an error. This is because it would otherwise be a bit unclear what the user meant if some tables have column lists and some don't. Nathan Bossart, reviewed by Michael Paquier and Masahiko Sawada, with some editorialization by me Discussion: https://postgr.es/m/E061A8E3-5E3D-494D-94F0-E8A9B312BBFC@amazon.com	2017-10-03 18:53:44 -04:00
Andres Freund	0ba99c84e8	Replace most usages of ntoh[ls] and hton[sl] with pg_bswap.h. All postgres internal usages are replaced, it's just libpq example usages that haven't been converted. External users of libpq can't generally rely on including postgres internal headers. Note that this includes replacing open-coded byte swapping of 64bit integers (using two 32 bit swaps) with a single 64bit swap. Where it looked applicable, I have removed netinet/in.h and arpa/inet.h usage, which previously provided the relevant functionality. It's perfectly possible that I missed other reasons for including those, the buildfarm will tell. Author: Andres Freund Discussion: https://postgr.es/m/20170927172019.gheidqy6xvlxb325@alap3.anarazel.de	2017-10-01 15:36:14 -07:00
Tom Lane	c12d570fa1	Support arrays over domains. Allowing arrays with a domain type as their element type was left un-done in the original domain patch, but not for any very good reason. This omission leads to such surprising results as array_agg() not working on a domain column, because the parser can't identify a suitable output type for the polymorphic aggregate. In order to fix this, first clean up the APIs of coerce_to_domain() and some internal functions in parse_coerce.c so that we consistently pass around a CoercionContext along with CoercionForm. Previously, we sometimes passed an "isExplicit" boolean flag instead, which is strictly less information; and coerce_to_domain() didn't even get that, but instead had to reverse-engineer isExplicit from CoercionForm. That's contrary to the documentation in primnodes.h that says that CoercionForm only affects display and not semantics. I don't think this change fixes any live bugs, but it makes things more consistent. The main reason for doing it though is that now build_coercion_expression() receives ccontext, which it needs in order to be able to recursively invoke coerce_to_target_type(). Next, reimplement ArrayCoerceExpr so that the node does not directly know any details of what has to be done to the individual array elements while performing the array coercion. Instead, the per-element processing is represented by a sub-expression whose input is a source array element and whose output is a target array element. This simplifies life in parse_coerce.c, because it can build that sub-expression by a recursive invocation of coerce_to_target_type(). The executor now handles the per-element processing as a compiled expression instead of hard-wired code. The main advantage of this is that we can use a single ArrayCoerceExpr to handle as many as three successive steps per element: base type conversion, typmod coercion, and domain constraint checking. The old code used two stacked ArrayCoerceExprs to handle type + typmod coercion, which was pretty inefficient, and adding yet another array deconstruction to do domain constraint checking seemed very unappetizing. In the case where we just need a single, very simple coercion function, doing this straightforwardly leads to a noticeable increase in the per-array-element runtime cost. Hence, add an additional shortcut evalfunc in execExprInterp.c that skips unnecessary overhead for that specific form of expression. The runtime speed of simple cases is within 1% or so of where it was before, while cases that previously required two levels of array processing are significantly faster. Finally, create an implicit array type for every domain type, as we do for base types, enums, etc. Everything except the array-coercion case seems to just work without further effort. Tom Lane, reviewed by Andrew Dunstan Discussion: https://postgr.es/m/9852.1499791473@sss.pgh.pa.us	2017-09-30 13:40:56 -04:00
Tom Lane	19de0ab23c	Fix inadequate locking during get_rel_oids(). get_rel_oids used to not take any relation locks at all, but that stopped being a good idea with commit `3c3bb9933`, which inserted a syscache lookup into the function. A concurrent DROP TABLE could now produce "cache lookup failed", which we don't want to have happen in normal operation. The best solution seems to be to transiently take a lock on the relation named by the RangeVar (which also makes the result of RangeVarGetRelid a lot less spongy). But we shouldn't hold the lock beyond this function, because we don't want VACUUM to lock more than one table at a time. (That would not be a big problem right now, but it will become one after the pending feature patch to allow multiple tables to be named in VACUUM.) In passing, adjust vacuum_rel and analyze_rel to document that we don't trust the passed RangeVar to be accurate, and allow the RangeVar to possibly be NULL --- which it is anyway for a whole-database VACUUM, though we accidentally didn't crash for that case. The passed RangeVar is in fact inaccurate when dealing with a child partition, as of v10, and it has been wrong for a whole long time in the case of vacuum_rel() recursing to a TOAST table. None of these things present visible bugs up to now, because the passed RangeVar is in fact only consulted for autovacuum logging, and in that particular context it's always accurate because autovacuum doesn't let vacuum.c expand partitions nor recurse to toast tables. Still, this seems like trouble waiting to happen, so let's nail the door at least partly shut. (Further cleanup is planned, in HEAD only, as part of the pending feature patch.) Fix some sadly inaccurate/obsolete comments too. Back-patch to v10. Michael Paquier and Tom Lane Discussion: https://postgr.es/m/25023.1506107590@sss.pgh.pa.us	2017-09-29 16:26:31 -04:00
Alvaro Herrera	20b6552242	Fix freezing of a dead HOT-updated tuple Vacuum calls page-level HOT prune to remove dead HOT tuples before doing liveness checks (HeapTupleSatisfiesVacuum) on the remaining tuples. But concurrent transaction commit/abort may turn DEAD some of the HOT tuples that survived the prune, before HeapTupleSatisfiesVacuum tests them. This happens to activate the code that decides to freeze the tuple ... which resuscitates it, duplicating data. (This is especially bad if there's any unique constraints, because those are now internally violated due to the duplicate entries, though you won't know until you try to REINDEX or dump/restore the table.) One possible fix would be to simply skip doing anything to the tuple, and hope that the next HOT prune would remove it. But there is a problem: if the tuple is older than freeze horizon, this would leave an unfrozen XID behind, and if no HOT prune happens to clean it up before the containing pg_clog segment is truncated away, it'd later cause an error when the XID is looked up. Fix the problem by having the tuple freezing routines cope with the situation: don't freeze the tuple (and keep it dead). In the cases that the XID is older than the freeze age, set the HEAP_XMAX_COMMITTED flag so that there is no need to look up the XID in pg_clog later on. An isolation test is included, authored by Michael Paquier, loosely based on Daniel Wood's original reproducer. It only tests one particular scenario, though, not all the possible ways for this problem to surface; it be good to have a more reliable way to test this more fully, but it'd require more work. In message https://postgr.es/m/20170911140103.5akxptyrwgpc25bw@alvherre.pgsql I outlined another test case (more closely matching Dan Wood's) that exposed a few more ways for the problem to occur. Backpatch all the way back to 9.3, where this problem was introduced by multixact juggling. In branches 9.3 and 9.4, this includes a backpatch of commit e5ff9fefcd50 (of 9.5 era), since the original is not correctable without matching the coding pattern in 9.5 up. Reported-by: Daniel Wood Diagnosed-by: Daniel Wood Reviewed-by: Yi Wen Wong, Michaël Paquier Discussion: https://postgr.es/m/E5711E62-8FDF-4DCA-A888-C200BF6B5742@amazon.com	2017-09-28 16:44:01 +02:00
Tom Lane	28e0727076	Revert to 9.6 treatment of ALTER TYPE enumtype ADD VALUE. This reverts commit `15bc038f9`, along with the followon commits `1635e80d3` and `984c92074` that tried to clean up the problems exposed by bug #14825. The result was incomplete because it failed to address parallel-query requirements. With 10.0 release so close upon us, now does not seem like the time to be adding more code to fix that. I hope we can un-revert this code and add the missing parallel query support during the v11 cycle. Back-patch to v10. Discussion: https://postgr.es/m/20170922185904.1448.16585@wrigleys.postgresql.org	2017-09-27 16:14:43 -04:00
Tom Lane	9a50a93c7b	Improve wording of error message added in commit `714805010`. Per suggestions from Peter Eisentraut and David Johnston. Back-patch, like the previous commit. Discussion: https://postgr.es/m/E1dv9jI-0006oT-Fn@gemulon.postgresql.org	2017-09-26 15:25:56 -04:00
Tom Lane	24541ffd78	... and the very same bug in publicationListToArray(). Sigh.	2017-09-23 15:16:48 -04:00
Peter Eisentraut	aa6b7b72d9	Fix saving and restoring umask In two cases, we set a different umask for some piece of code and restore it afterwards. But if the contained code errors out, the umask is not restored. So add TRY/CATCH blocks to fix that.	2017-09-22 17:10:36 -04:00
Robert Haas	6a2fa09c0c	For wal_consistency_checking, mask page checksum as well as page LSN. If the LSN is different, the checksum will be different, too. Ashwin Agrawal, reviewed by Michael Paquier and Kuntal Ghosh Discussion: http://postgr.es/m/CALfoeis5iqrAU-+JAN+ZzXkpPr7+-0OAGv7QUHwFn=-wDy4o4Q@mail.gmail.com	2017-09-22 14:28:22 -04:00
Tom Lane	7148050105	Give a better error for duplicate entries in VACUUM/ANALYZE column list. Previously, the code didn't think about this case and would just try to analyze such a column twice. That would fail at the point of inserting the second version of the pg_statistic row, with obscure error messsages like "duplicate key value violates unique constraint" or "tuple already updated by self", depending on context and PG version. We could allow the case by ignoring duplicate column specifications, but it seems better to reject it explicitly. The bogus error messages seem like arguably a bug, so back-patch to all supported versions. Nathan Bossart, per a report from Michael Paquier, and whacked around a bit by me. Discussion: https://postgr.es/m/E061A8E3-5E3D-494D-94F0-E8A9B312BBFC@amazon.com	2017-09-21 18:13:11 -04:00
Tom Lane	66917bfaa7	Make ExplainOpenGroup and ExplainCloseGroup public. Extensions with custom plan nodes might like to use these in their EXPLAIN output. Hadi Moshayedi Discussion: https://postgr.es/m/CA+_kT_dU-rHCN0u6pjA6bN5CZniMfD=-wVqPY4QLrKUY_uJq5w@mail.gmail.com	2017-09-18 16:01:16 -04:00
Peter Eisentraut	8edacab209	Fix DROP SUBSCRIPTION hang When ALTER SUBSCRIPTION DISABLE is run in the same transaction before DROP SUBSCRIPTION, the latter will hang because workers will still be running, not having seen the DISABLE committed, and DROP SUBSCRIPTION will wait until the workers have vacated the replication origin slots. Previously, DROP SUBSCRIPTION killed the logical replication workers immediately only if it was going to drop the replication slot, otherwise it scheduled the worker killing for the end of the transaction, as a result of `7e174fa793`. This, however, causes the present problem. To fix, kill the workers immediately in all cases. This covers all cases: A subscription that doesn't have a replication slot must be disabled. It was either disabled in the same transaction, or it was already disabled before the current transaction, but then there shouldn't be any workers left and this won't make a difference. Reported-by: Arseny Sher <a.sher@postgrespro.ru> Discussion: https://www.postgresql.org/message-id/flat/87mv6av84w.fsf%40ars-thinkpad	2017-09-17 22:00:23 -04:00
Tom Lane	27c6619e9c	Fix possible dangling pointer dereference in trigger.c. AfterTriggerEndQuery correctly notes that the query_stack could get repalloc'd during a trigger firing, but it nonetheless passes the address of a query_stack entry to afterTriggerInvokeEvents, so that if such a repalloc occurs, afterTriggerInvokeEvents is already working with an obsolete dangling pointer while it scans the rest of the events. Oops. The only code at risk is its "delete_ok" cleanup code, so we can prevent unsafe behavior by passing delete_ok = false instead of true. However, that could have a significant performance penalty, because the point of passing delete_ok = true is to not have to re-scan possibly a large number of dead trigger events on the next time through the loop. There's more than one way to skin that cat, though. What we can do is delete all the "chunks" in the event list except the last one, since we know all events in them must be dead. Deleting the chunks is work we'd have had to do later in AfterTriggerEndQuery anyway, and it ends up saving rescanning of just about the same events we'd have gotten rid of with delete_ok = true. In v10 and HEAD, we also have to be careful to mop up any per-table after_trig_events pointers that would become dangling. This is slightly annoying, but I don't think that normal use-cases will traverse this code path often enough for it to be a performance problem. It's pretty hard to hit this in practice because of the unlikelihood of the query_stack getting resized at just the wrong time. Nonetheless, it's definitely a live bug of ancient standing, so back-patch to all supported branches. Discussion: https://postgr.es/m/2891.1505419542@sss.pgh.pa.us	2017-09-17 14:50:01 -04:00
Tom Lane	fd31f9f033	Ensure that BEFORE STATEMENT triggers fire the right number of times. Commit `0f79440fb` introduced mechanism to keep AFTER STATEMENT triggers from firing more than once per statement, which was formerly possible if more than one FK enforcement action had to be applied to a given table. Add a similar mechanism for BEFORE STATEMENT triggers, so that we don't have the unexpected situation of firing BEFORE STATEMENT triggers more often than AFTER STATEMENT. As with the previous patch, back-patch to v10. Discussion: https://postgr.es/m/22315.1505584992@sss.pgh.pa.us	2017-09-17 12:16:38 -04:00
Tom Lane	0f79440fb0	Fix SQL-spec incompatibilities in new transition table feature. The standard says that all changes of the same kind (insert, update, or delete) caused in one table by a single SQL statement should be reported in a single transition table; and by that, they mean to include foreign key enforcement actions cascading from the statement's direct effects. It's also reasonable to conclude that if the standard had wCTEs, they would say that effects of wCTEs applying to the same table as each other or the outer statement should be merged into one transition table. We weren't doing it like that. Hence, arrange to merge tuples from multiple update actions into a single transition table as much as we can. There is a problem, which is that if the firing of FK enforcement triggers and after-row triggers with transition tables is interspersed, we might need to report more tuples after some triggers have already seen the transition table. It seems like a bad idea for the transition table to be mutable between trigger calls. There's no good way around this without a major redesign of the FK logic, so for now, resolve it by opening a new transition table each time this happens. Also, ensure that AFTER STATEMENT triggers fire just once per statement, or once per transition table when we're forced to make more than one. Previous versions of Postgres have allowed each FK enforcement query to cause an additional firing of the AFTER STATEMENT triggers for the referencing table, but that's certainly not per spec. (We're still doing multiple firings of BEFORE STATEMENT triggers, though; is that something worth changing?) Also, forbid using transition tables with column-specific UPDATE triggers. The spec requires such transition tables to show only the tuples for which the UPDATE trigger would have fired, which means maintaining multiple transition tables or else somehow filtering the contents at readout. Maybe someday we'll bother to support that option, but it looks like a lot of trouble for a marginal feature. The transition tables are now managed by the AfterTriggers data structures, rather than being directly the responsibility of ModifyTable nodes. This removes a subtransaction-lifespan memory leak introduced by my previous band-aid patch `3c4359521`. In passing, refactor the AfterTriggers data structures to reduce the management overhead for them, by using arrays of structs rather than several parallel arrays for per-query-level and per-subtransaction state. I failed to resist the temptation to do some copy-editing on the SGML docs about triggers, above and beyond merely documenting the effects of this patch. Back-patch to v10, because we don't want the semantics of transition tables to change post-release. Patch by me, with help and review from Thomas Munro. Discussion: https://postgr.es/m/20170909064853.25630.12825@wrigleys.postgresql.org	2017-09-16 13:20:36 -04:00
Robert Haas	42651bdd68	Fix inconsistent capitalization. Amit Langote Discussion: http://postgr.es/m/a83a0899-19f5-594c-9aac-3ba0f16989a1@lab.ntt.co.jp	2017-09-14 11:11:12 -04:00
Peter Eisentraut	821fb8cdbf	Message style fixes	2017-09-11 11:21:27 -04:00
Tom Lane	3c43595217	Quick-hack fix for foreign key cascade vs triggers with transition tables. AFTER triggers using transition tables crashed if they were fired due to a foreign key ON CASCADE update. This is because ExecEndModifyTable flushes the transition tables, on the assumption that any trigger that could need them was already fired during ExecutorFinish. Normally that's true, because we don't allow transition-table-using triggers to be deferred. However, foreign key CASCADE updates force any triggers on the referencing table to be deferred to the outer query level, by means of the EXEC_FLAG_SKIP_TRIGGERS flag. I don't recall all the details of why it's like that and am pretty loath to redesign it right now. Instead, just teach ExecEndModifyTable to skip destroying the TransitionCaptureState when that flag is set. This will allow the transition table data to survive until end of the current subtransaction. This isn't a terribly satisfactory solution, because (1) we might be leaking the transition tables for much longer than really necessary, and (2) as things stand, an AFTER STATEMENT trigger will fire once per RI updating query, ie once per row updated or deleted in the referenced table. I suspect that is not per SQL spec. But redesigning this is a research project that we're certainly not going to get done for v10. So let's go with this hackish answer for now. In passing, tweak AfterTriggerSaveEvent to not save the transition_capture pointer into the event record for a deferrable trigger. This is not necessary to fix the current bug, but it avoids letting dangling pointers to long-gone transition tables persist in the trigger event queue. That's at least a safety feature. It might also allow merging shared trigger states in more cases than before. I added a regression test that demonstrates the crash on unpatched code, and also exposes the behavior of firing the AFTER STATEMENT triggers once per row update. Per bug #14808 from Philippe Beaudoin. Back-patch to v10. Discussion: https://postgr.es/m/20170909064853.25630.12825@wrigleys.postgresql.org	2017-09-10 14:59:56 -04:00
Robert Haas	6f6b99d133	Allow a partitioned table to have a default partition. Any tuples that don't route to any other partition will route to the default partition. Jeevan Ladhe, Beena Emerson, Ashutosh Bapat, Rahila Syed, and Robert Haas, with review and testing at various stages by (at least) Rushabh Lathia, Keith Fiske, Amit Langote, Amul Sul, Rajkumar Raghuanshi, Sven Kunze, Kyotaro Horiguchi, Thom Brown, Rafia Sabih, and Dilip Kumar. Discussion: http://postgr.es/m/CAH2L28tbN4SYyhS7YV1YBWcitkqbhSWfQCy0G=apRcC_PEO-bg@mail.gmail.com Discussion: http://postgr.es/m/CAOG9ApEYj34fWMcvBMBQ-YtqR9fTdXhdN82QEKG0SVZ6zeL1xg@mail.gmail.com	2017-09-08 17:28:04 -04:00
Peter Eisentraut	1356f78ea9	Reduce excessive dereferencing of function pointers It is equivalent in ANSI C to write (*funcptr) () and funcptr(). These two styles have been applied inconsistently. After discussion, we'll use the more verbose style for plain function pointer variables, to make it clear that it's a variable, and the shorter style when the function pointer is in a struct (s.func() or s->func()), because then it's clear that it's not a plain function name, and otherwise the excessive punctuation makes some of those invocations hard to read. Discussion: https://www.postgresql.org/message-id/f52c16db-14ed-757d-4b48-7ef360b1631d@2ndquadrant.com	2017-09-07 13:56:09 -04:00
Simon Riggs	5b6d13eec7	Allow SET STATISTICS on expression indexes Index columns are referenced by ordinal number rather than name, e.g. CREATE INDEX coord_idx ON measured (x, y, (z + t)); ALTER INDEX coord_idx ALTER COLUMN 3 SET STATISTICS 1000; Incompatibility note for release notes: \d+ for indexes now also displays Stats Target Authors: Alexander Korotkov, with contribution by Adrien NAYRAT Review: Adrien NAYRAT, Simon Riggs Wordsmith: Simon Riggs	2017-09-06 13:46:01 -07:00
Alvaro Herrera	be7161566d	Add a WAIT option to DROP_REPLICATION_SLOT Commit `9915de6c1c` changed the default behavior of DROP_REPLICATION_SLOT so that it would wait until any session holding the slot active would release it, instead of raising an error. But users are already depending on the original behavior, so revert to it by default and add a WAIT option to invoke the new behavior. Per complaint from Simone Gotti, in Discussion: https://postgr.es/m/CAEvsy6Wgdf90O6pUvg2wSVXL2omH5OPC-38OD4Zzgk-FXavj3Q@mail.gmail.com	2017-09-01 13:44:14 +02:00
Robert Haas	81c5e46c49	Introduce 64-bit hash functions with a 64-bit seed. This will be useful for hash partitioning, which needs a way to seed the hash functions to avoid problems such as a hash index on a hash partitioned table clumping all values into a small portion of the bucket space; it's also useful for anything that wants a 64-bit hash value rather than a 32-bit hash value. Just in case somebody wants a 64-bit hash value that is compatible with the existing 32-bit hash values, make the low 32-bits of the 64-bit hash value match the 32-bit hash value when the seed is 0. Robert Haas and Amul Sul Discussion: http://postgr.es/m/CA+Tgmoafx2yoJuhCQQOL5CocEi-w_uG4S2xT0EtgiJnPGcHW3g@mail.gmail.com	2017-08-31 22:21:21 -04:00
Robert Haas	bf11e7ee2e	Propagate sort instrumentation from workers back to leader. Up until now, when parallel query was used, no details about the sort method or space used by the workers were available; details were shown only for any sorting done by the leader. Fix that. Commit `1177ab1dab` forced the test case added by commit `1f6d515a67` to run without parallelism; now that we have this infrastructure, allow that again, with a little tweaking to make it pass with and without force_parallel_mode. Robert Haas and Tom Lane Discussion: http://postgr.es/m/CA+Tgmoa2VBZW6S8AAXfhpHczb=Rf6RqQ2br+zJvEgwJ0uoD_tQ@mail.gmail.com	2017-08-29 13:26:33 -04:00
Peter Eisentraut	580ddcec39	Fix translation marker This was erroneously removed in `55a70a023c`.	2017-08-23 09:56:38 -04:00
Peter Eisentraut	2bfd1b1ee5	Don't install ICU collation keyword variants Users can still create them themselves. Instead, document Unicode TR 35 collation options for ICU, so users can create all this themselves. Reviewed-by: Peter Geoghegan <pg@bowt.ie>	2017-08-21 19:21:07 -04:00
Peter Eisentraut	51e225da30	Expand set of predefined ICU locales Install language+region combinations even if they are not distinct from the language's base locale. This gives better long-term stability of the set of predefined locales and makes the predefined locales less implementation-dependent and more practical for users. Reviewed-by: Peter Geoghegan <pg@bowt.ie>	2017-08-21 19:21:07 -04:00
Andres Freund	2cd7084524	Change tupledesc->attrs[n] to TupleDescAttr(tupledesc, n). This is a mechanical change in preparation for a later commit that will change the layout of TupleDesc. Introducing a macro to abstract the details of where attributes are stored will allow us to change that in separate step and revise it in future. Author: Thomas Munro, editorialized by Andres Freund Reviewed-By: Andres Freund Discussion: https://postgr.es/m/CAEepm=0ZtQ-SpsgCyzzYpsXS6e=kZWqk3g5Ygn3MDV7A8dabUA@mail.gmail.com	2017-08-20 11:19:07 -07:00
Peter Eisentraut	24620fc52b	Fix creation of ICU comments for keyword variants It would create the comment referring to the keyword-less parent locale. This was broken in `ddb5fdc068`.	2017-08-18 23:02:28 -04:00
Robert Haas	c4b841ba6a	Fix interaction of triggers, partitioning, and EXPLAIN ANALYZE. Add a new EState member es_leaf_result_relations, so that the trigger code knows about ResultRelInfos created by tuple routing. Also make sure ExplainPrintTriggers knows about partition-related ResultRelInfos. Etsuro Fujita, reviewed by Amit Langote Discussion: http://postgr.es/m/57163e18-8e56-da83-337a-22f2c0008051@lab.ntt.co.jp	2017-08-18 13:01:05 -04:00
Robert Haas	ecfe59e50f	Refactor validation of new partitions a little bit. Move some logic that is currently in ATExecAttachPartition to separate functions to facilitate future code reuse. Ashutosh Bapat and Jeevan Ladhe Discussion: http://postgr.es/m/CA+Tgmobbnamyvii0pRdg9pp_jLHSUvq7u5SiRrVV0tEFFU58Tg@mail.gmail.com	2017-08-17 14:49:45 -04:00
Peter Eisentraut	77d05706be	Fix up some misusage of appendStringInfo() and friends Change to appendStringInfoChar() or appendStringInfoString() where those can be used. Author: David Rowley <david.rowley@2ndquadrant.com> Reviewed-by: Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>	2017-08-15 23:34:39 -04:00
Tom Lane	21d304dfed	Final pgindent + perltidy run for v10.	2017-08-14 17:29:33 -04:00
Peter Eisentraut	7f1bb1d734	Fix typo Author: Masahiko Sawada <sawada.mshk@gmail.com>	2017-08-14 13:53:05 -04:00
Peter Eisentraut	a1ef920e27	Remove uses of "slave" in replication contexts This affects mostly code comments, some documentation, and tests. Official APIs already used "standby".	2017-08-10 22:55:41 -04:00
Tom Lane	749c7c4170	Fix handling of container types in find_composite_type_dependencies. find_composite_type_dependencies correctly found columns that are of the specified type, and columns that are of arrays of that type, but not columns that are domains or ranges over the given type, its array type, etc. The most general way to handle this seems to be to assume that any type that is directly dependent on the specified type can be treated as a container type, and processed recursively (allowing us to handle nested cases such as ranges over domains over arrays ...). Since a type's array type already has such a dependency, we can drop the existing special case for the array type. The very similar logic in get_rels_with_domain was likewise a few bricks shy of a load, as it supposed that a directly dependent type could only be a sub-domain. This is already wrong for ranges over domains, and it'll someday be wrong for arrays over domains. Add test cases illustrating the problems, and back-patch to all supported branches. Discussion: https://postgr.es/m/15268.1502309024@sss.pgh.pa.us	2017-08-09 17:03:09 -04:00
Alvaro Herrera	b2c95a3798	Fix replication origin-related race conditions Similar to what was fixed in commit `9915de6c1c` for replication slots, but this time it's related to replication origins: DROP SUBSCRIPTION attempts to drop the replication origin, but that fails if the replication worker process hasn't yet marked it unused. This causes failures in the buildfarm: ERROR: could not drop replication origin with OID 1, in use by PID 34069 Like the aforementioned commit, fix by having the process running DROP SUBSCRIPTION sleep until the worker marks the the replication origin struct as free. This uses a condition variable on each replication origin shmem state struct, so that the session trying to drop can sleep and expect to be awakened by the process keeping the origin open. Also fix a SGML markup in the previous commit. Discussion: https://postgr.es/m/20170808001433.rozlseaf4m2wkw3n@alvherre.pgsql	2017-08-08 16:07:46 -04:00
Heikki Linnakangas	bf6b9e9444	Don't allow logging in with empty password. Some authentication methods allowed it, others did not. In the client-side, libpq does not even try to authenticate with an empty password, which makes using empty passwords hazardous: an administrator might think that an account with an empty password cannot be used to log in, because psql doesn't allow it, and not realize that a different client would in fact allow it. To clear that confusion and to be be consistent, disallow empty passwords in all authentication methods. All the authentication methods that used plaintext authentication over the wire, except for BSD authentication, already checked that the password received from the user was not empty. To avoid forgetting it in the future again, move the check to the recv_password_packet function. That only forbids using an empty password with plaintext authentication, however. MD5 and SCRAM need a different fix: * In stable branches, check that the MD5 hash stored for the user does not not correspond to an empty string. This adds some overhead to MD5 authentication, because the server needs to compute an extra MD5 hash, but it is not noticeable in practice. * In HEAD, modify CREATE and ALTER ROLE to clear the password if an empty string, or a password hash that corresponds to an empty string, is specified. The user-visible behavior is the same as in the stable branches, the user cannot log in, but it seems better to stop the empty password from entering the system in the first place. Secondly, it is fairly expensive to check that a SCRAM hash doesn't correspond to an empty string, because computing a SCRAM hash is much more expensive than an MD5 hash by design, so better avoid doing that on every authentication. We could clear the password on CREATE/ALTER ROLE also in stable branches, but we would still need to check at authentication time, because even if we prevent empty passwords from being stored in pg_authid, there might be existing ones there already. Reported by Jeroen van der Ham, Ben de Graaff and Jelte Fennema. Security: CVE-2017-7546	2017-08-07 17:03:42 +03:00
Peter Eisentraut	86524f0387	Fix function name in code comment Reported-by: Peter Geoghegan <pg@bowt.ie>	2017-08-07 09:49:55 -04:00
Peter Eisentraut	ad2ca3cba6	Improve wording of subscription refresh debug messages Reported-by: Yugo Nagata <nagata@sraoss.co.jp>	2017-08-07 09:40:12 -04:00
Peter Eisentraut	6f81306e4d	Downgrade subscription refresh messages to DEBUG1 The NOTICE messages about tables being added or removed during subscription refresh would be incorrect and possibly confusing if the transaction rolls back, so silence them but keep them available for debugging. Discussion: https://www.postgresql.org/message-id/CAD21AoAvaXizc2h7aiNyK_i0FQSa-tmhpdOGwbhh7Jy544Ad4Q%40mail.gmail.com	2017-08-07 09:16:03 -04:00
Tom Lane	e9f4ac1389	Suppress unused-variable warnings when building with ICU 4.2. Tidy-up for commit `eccead9ed`.	2017-08-05 11:48:43 -04:00
Peter Eisentraut	eccead9ed4	Add support for ICU 4.2 Supporting ICU 4.2 seems useful because it ships with CentOS 6. Versions before ICU 4.6 don't support pkg-config, so document an installation method without using pkg-config. In ICU 4.2, ucol_getKeywordsForLocale() sometimes returns values that will not be accepted by uloc_toLanguageTag(). Skip loading keyword variants in that version. Reported-by: Victor Wagner <vitus@wagner.pp.ru>	2017-08-05 09:32:42 -04:00
Robert Haas	f85f88bcc2	Fix bug in deciding whether to scan newly-attached partition. If the table being attached had different attribute numbers than the parent, the old code could incorrectly decide it needed to be scanned. Amit Langote, reviewed by Ashutosh Bapat Discussion: http://postgr.es/m/CA+TgmobexgbBr2+Utw-pOMw9uxaBRKRjMW_-mmzKKx9PejPLMg@mail.gmail.com	2017-08-04 22:01:37 -04:00
Peter Eisentraut	7e174fa793	Only kill sync workers at commit time in subscription DDL This allows a transaction abort to avoid killing those workers. Author: Petr Jelinek <petr.jelinek@2ndquadrant.com>	2017-08-04 21:17:47 -04:00
Robert Haas	972b6ec20b	Fix lock upgrade hazard in ATExecAttachPartition. Amit Langote Discussion: http://postgr.es/m/CAFjFpReT_kq_uwU_B8aWDxR7jNGE=P0iELycdq5oupi=xSQTOw@mail.gmail.com	2017-08-03 14:21:00 -04:00
Robert Haas	583df3b5c5	Code beautification for ATExecAttachPartition. Amit Langote Discussion: http://postgr.es/m/CAFjFpReT_kq_uwU_B8aWDxR7jNGE=P0iELycdq5oupi=xSQTOw@mail.gmail.com	2017-08-03 14:19:59 -04:00
Robert Haas	610e8ebb0f	Teach map_partition_varattnos to handle whole-row expressions. Otherwise, partitioned tables with RETURNING expressions or subject to a WITH CHECK OPTION do not work properly. Amit Langote, reviewed by Amit Khandekar and Etsuro Fujita. A few comment changes by me. Discussion: http://postgr.es/m/9a39df80-871e-6212-0684-f93c83be4097@lab.ntt.co.jp	2017-08-03 11:21:29 -04:00
Tom Lane	f97256570f	Allow creation of C/POSIX collations without depending on libc behavior. Most of our collations code has special handling for the locale names "C" and "POSIX", allowing those collations to be used whether or not the system libraries think those locale names are valid, or indeed whether said libraries even have any locale support. But we missed handling things that way in CREATE COLLATION. This meant you couldn't clone the C/POSIX collations, nor explicitly define a new collation using those locale names, unless the libraries allow it. That's pretty pointless, as well as being a violation of pg_newlocale_from_collation's API specification. The practical effect of this change is quite limited: it allows creating such collations even on platforms that don't HAVE_LOCALE_T, and it allows making "POSIX" collation objects on Windows, which before this would only let you make "C" collation objects. Hence, even though this is a bug fix IMO, it doesn't seem worth the trouble to back-patch. In passing, suppress the DROP CASCADE detail messages at the end of the collation regression test. I'm surprised we've never been bit by message ordering issues there. Per report from Murtuza Zabuawala. Discussion: https://postgr.es/m/CAKKotZS-wcDcofXDCH=sidiuajE+nqHn2CGjLLX78anyDmi3gQ@mail.gmail.com	2017-08-01 13:51:05 -04:00
Alvaro Herrera	de38489b92	Fix typo in comment Commit `fd31cd2651` renamed the variable to skipping_blocks, but forgot to update this comment. Noticed while inspecting code.	2017-07-21 20:08:53 -04:00
Robert Haas	f81a91db4d	Use a real RT index when setting up partition tuple routing. Before, we always used a dummy value of 1, but that's not right when the partitioned table being modified is inside of a WITH clause rather than part of the main query. Amit Langote, reported and reviewd by Etsuro Fujita, with a comment change by me. Discussion: http://postgr.es/m/ee12f648-8907-77b5-afc0-2980bcb0aa37@lab.ntt.co.jp	2017-07-17 21:29:45 -04:00
Andrew Gierth	1add0b15f1	Fix COPY's handling of transition tables with indexes. Commit `c46c0e5202` failed to pass the TransitionCaptureState object to ExecARInsertTriggers() in the case where it's using heap_multi_insert and there are indexes. Repair. Thomas Munro, from a report by David Fetter Discussion: https://postgr.es/m/20170708084213.GA14720%40fetter.org	2017-07-10 11:40:08 +01:00
Tom Lane	ec4073f641	Avoid unreferenced-function warning on low-functionality platforms. On platforms lacking both locale_t and ICU, collationcmds.c failed to make any use of its static function is_all_ascii(), thus probably drawing a compiler warning. Oversight in my commit `ddb5fdc06`. Per buildfarm member gaur.	2017-07-08 12:42:25 -04:00
Alvaro Herrera	46e9151942	Fix typo Noticed while reviewing code.	2017-07-07 17:10:36 -04:00
Peter Eisentraut	cb9079cd51	Improve subscription locking This avoids "tuple concurrently updated" errors when a ALTER or DROP SUBSCRIPTION writes to pg_subscription_rel at the same time as a worker. Author: Petr Jelinek <petr.jelinek@2ndquadrant.com>	2017-07-03 22:47:06 -04:00
Peter Eisentraut	54baa48139	Copy collencoding in CREATE COLLATION / FROM This command used to compute the collencoding entry like when a completely new collation is created. But for example when copying the "C" collation, this would then result in a collation that has a collencoding entry for the current database encoding rather than -1, thus not making an exact copy. This has probably no practical impact, but making this change keeps the catalog contents neat. Reported-by: Tom Lane <tgl@sss.pgh.pa.us>	2017-06-30 08:50:39 -04:00
Andrew Gierth	8c55244ae3	Fix transition tables for ON CONFLICT. We now disallow having triggers with both transition tables and ON INSERT OR UPDATE (which was a PG extension to the spec anyway), because in this case it's not at all clear how the transition tables should work for an INSERT ... ON CONFLICT query. Separate ON INSERT and ON UPDATE triggers with transition tables are allowed, and the transition tables for these reflect only the inserted and only the updated tuples respectively. Patch by Thomas Munro Discussion: https://postgr.es/m/CAEepm%3D11KHQ0JmETJQihSvhZB5mUZL2xrqHeXbCeLhDiqQ39%3Dw%40mail.gmail.com	2017-06-28 19:00:55 +01:00
Andrew Gierth	c46c0e5202	Fix transition tables for wCTEs. The original coding didn't handle this case properly; each separate DML substatement needs its own set of transitions. Patch by Thomas Munro Discussion: https://postgr.es/m/CAL9smLCDQ%3D2o024rBgtD4WihzX8B3C6u_oSQ2K3%2BR5grJrV0bg%40mail.gmail.com	2017-06-28 18:59:01 +01:00
Andrew Gierth	501ed02cf6	Fix transition tables for partition/inheritance. We disallow row-level triggers with transition tables on child tables. Transition tables for triggers on the parent table contain only those columns present in the parent. (We can't mix tuple formats in a single transition table.) Patch by Thomas Munro Discussion: https://postgr.es/m/CA%2BTgmoZzTBBAsEUh4MazAN7ga%3D8SsMC-Knp-6cetts9yNZUCcg%40mail.gmail.com	2017-06-28 18:55:03 +01:00
Tom Lane	ddb5fdc068	Further hacking on ICU collation creation and usage. pg_import_system_collations() refused to create any ICU collations if the current database's encoding didn't support ICU. This is wrongheaded: initdb must initialize pg_collation in an encoding-independent way since it might be used in other databases with different encodings. The reason for the restriction seems to be that get_icu_locale_comment() used icu_from_uchar() to convert the UChar-format display name, and that unsurprisingly doesn't know what to do in unsupported encodings. But by the same token that the initial catalog contents must be encoding-independent, we can't allow non-ASCII characters in the comment strings. So we don't really need icu_from_uchar() here: just check for Unicode codes outside the ASCII range, and if there are none, the format conversion is trivial. If there are some, we can simply not install the comment. (In my testing, this affects only Norwegian Bokmål, which has given us trouble before.) For paranoia's sake, also check for non-ASCII characters in ICU locale names, and skip such locales, as we do for libc locales. I don't currently have a reason to believe that this will ever reject anything, but then again the libc maintainers should have known better too. With just the import changes, ICU collations can be found in pg_collation in databases with unsupported encodings. This resulted in more or less clean failures at runtime, but that's not how things act for unsupported encodings with libc collations. Make it work the same as our traditional behavior for libc collations by having collation lookup take into account whether is_encoding_supported_by_icu(). Adjust documentation to match. Also, expand Table 23.1 to show which encodings are supported by ICU. catversion bump because of likely change in pg_collation/pg_description initial contents in ICU-enabled builds. Discussion: https://postgr.es/m/20c74bc3-d6ca-243d-1bbc-12f17fa4fe9a@gmail.com	2017-06-24 13:54:23 -04:00
Tom Lane	d1fcc62298	Fix incorrect buffer-length argument to uloc_getDisplayName(). The maxResultSize argument of uloc_getDisplayName is the number of UChars in the output buffer, not the number of bytes. In principle this could result in a stack smash, although at least in my Fedora 25 install there are no ICU locales with display names long enough to overrun the buffer. But it's easily proven to be wrong by reducing the length of displayname to around 20, whereupon a stack smash does happen. (This is a rather scary bug, because the same mistake could easily have been made in other places; but in a quick code search looking at uses of UChar I could not find any other instances.)	2017-06-23 16:00:55 -04:00
Tom Lane	0b13b2a771	Rethink behavior of pg_import_system_collations(). Marco Atzeri reported that initdb would fail if "locale -a" reported the same locale name more than once. All previous versions of Postgres implicitly de-duplicated the results of "locale -a", but the rewrite to move the collation import logic into C had lost that property. It had also lost the property that locale names matching built-in collation names were silently ignored. The simplest way to fix this is to make initdb run the function in if-not-exists mode, which means that there's no real use-case for non if-not-exists mode; we might as well just drop the boolean argument and simplify the function's definition to be "add any collations not already known". This change also gets rid of some odd corner cases caused by the fact that aliases were added in if-not-exists mode even if the function argument said otherwise. While at it, adjust the behavior so that pg_import_system_collations() doesn't spew "collation foo already exists, skipping" messages during a re-run; that's completely unhelpful, especially since there are often hundreds of them. And make it return a count of the number of collations it did add, which seems like it might be helpful. Also, re-integrate the previous coding's property that it would make a deterministic selection of which alias to use if there were conflicting possibilities. This would only come into play if "locale -a" reports multiple equivalent locale names, say "de_DE.utf8" and "de_DE.UTF-8", but that hardly seems out of the question. In passing, fix incorrect behavior in pg_import_system_collations()'s ICU code path: it neglected CommandCounterIncrement, which would result in failures if ICU returns duplicate names, and it would try to create comments even if a new collation hadn't been created. Also, reorder operations in initdb so that the 'ucs_basic' collation is created before calling pg_import_system_collations() not after. This prevents a failure if "locale -a" were to report a locale named that. There's no reason to think that that ever happens in the wild, but the old coding would have survived it, so let's be equally robust. Discussion: https://postgr.es/m/20c74bc3-d6ca-243d-1bbc-12f17fa4fe9a@gmail.com	2017-06-23 14:19:58 -04:00
Tom Lane	b6159202c9	Fix memory leakage in ICU encoding conversion, and other code review. Callers of icu_to_uchar() neglected to pfree the result string when done with it. This results in catastrophic memory leaks in varstr_cmp(), because of our prevailing assumption that btree comparison functions don't leak memory. For safety, make all the call sites clean up leaks, though I suspect that we could get away without it in formatting.c. I audited callers of icu_from_uchar() as well, but found no places that seemed to have a comparable issue. Add function API specifications for icu_to_uchar() and icu_from_uchar(); the lack of any thought-through specification is perhaps not unrelated to the existence of this bug in the first place. Fix icu_to_uchar() to guarantee a nul-terminated result; although no existing caller appears to care, the fact that it would have been nul-terminated except in extreme corner cases seems ideally designed to bite someone on the rear someday. Fix ucnv_fromUChars() destCapacity argument --- in the worst case, that could perhaps have led to a non-nul-terminated result, too. Fix icu_from_uchar() to have a more reasonable definition of the function result --- no callers are actually paying attention, so this isn't a live bug, but it's certainly sloppily designed. Const-ify icu_from_uchar()'s input string for consistency. That is not the end of what needs to be done to these functions, but it's as much as I have the patience for right now. Discussion: https://postgr.es/m/1955.1498181798@sss.pgh.pa.us	2017-06-23 12:22:06 -04:00
Robert Haas	2a6db5eba6	Update out-of-date comment in vacuumlazy.c Commit `15c121b3ed` seems to have overlooked the need to trim this part of the comment. Pavan Deolasee Discussion: http://postgr.es/m/CABOikdPq_9+cWRNZ0RLKTwuZyj=uL85X=Usifa-CbPee1ZCM5A@mail.gmail.com	2017-06-22 13:38:53 -04:00
Tom Lane	382ceffdf7	Phase 3 of pgindent updates. Don't move parenthesized lines to the left, even if that means they flow past the right margin. By default, BSD indent lines up statement continuation lines that are within parentheses so that they start just to the right of the preceding left parenthesis. However, traditionally, if that resulted in the continuation line extending to the right of the desired right margin, then indent would push it left just far enough to not overrun the margin, if it could do so without making the continuation line start to the left of the current statement indent. That makes for a weird mix of indentations unless one has been completely rigid about never violating the 80-column limit. This behavior has been pretty universally panned by Postgres developers. Hence, disable it with indent's new -lpl switch, so that parenthesized lines are always lined up with the preceding left paren. This patch is much less interesting than the first round of indent changes, but also bulkier, so I thought it best to separate the effects. Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us	2017-06-21 15:35:54 -04:00
Tom Lane	c7b8998ebb	Phase 2 of pgindent updates. Change pg_bsd_indent to follow upstream rules for placement of comments to the right of code, and remove pgindent hack that caused comments following #endif to not obey the general rule. Commit `e3860ffa4d` wasn't actually using the published version of pg_bsd_indent, but a hacked-up version that tried to minimize the amount of movement of comments to the right of code. The situation of interest is where such a comment has to be moved to the right of its default placement at column 33 because there's code there. BSD indent has always moved right in units of tab stops in such cases --- but in the previous incarnation, indent was working in 8-space tab stops, while now it knows we use 4-space tabs. So the net result is that in about half the cases, such comments are placed one tab stop left of before. This is better all around: it leaves more room on the line for comment text, and it means that in such cases the comment uniformly starts at the next 4-space tab stop after the code, rather than sometimes one and sometimes two tabs after. Also, ensure that comments following #endif are indented the same as comments following other preprocessor commands such as #else. That inconsistency turns out to have been self-inflicted damage from a poorly-thought-through post-indent "fixup" in pgindent. This patch is much less interesting than the first round of indent changes, but also bulkier, so I thought it best to separate the effects. Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us	2017-06-21 15:19:25 -04:00
Tom Lane	e3860ffa4d	Initial pgindent run with pg_bsd_indent version 2.0. The new indent version includes numerous fixes thanks to Piotr Stefaniak. The main changes visible in this commit are: * Nicer formatting of function-pointer declarations. * No longer unexpectedly removes spaces in expressions using casts, sizeof, or offsetof. * No longer wants to add a space in "struct structname varname", as well as some similar cases for const- or volatile-qualified pointers. Declarations using PG_USED_FOR_ASSERTS_ONLY are formatted more nicely. * Fixes bug where comments following declarations were sometimes placed with no space separating them from the code. * Fixes some odd decisions for comments following case labels. * Fixes some cases where comments following code were indented to less than the expected column 33. On the less good side, it now tends to put more whitespace around typedef names that are not listed in typedefs.list. This might encourage us to put more effort into typedef name collection; it's not really a bug in indent itself. There are more changes coming after this round, having to do with comment indentation and alignment of lines appearing within parentheses. I wanted to limit the size of the diffs to something that could be reviewed without one's eyes completely glazing over, so it seemed better to split up the changes as much as practical. Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us	2017-06-21 14:39:04 -04:00
Peter Eisentraut	94da2a6a9a	Use RangeVarGetRelidExtended() in AlterSequence() This allows us to combine the opening and the ownership check. Reported-by: Robert Haas <robertmhaas@gmail.com>	2017-06-16 10:24:50 -04:00
Heikki Linnakangas	30681c830d	Fix dependency, when changing a function's argument/return type. When a new base type is created using the old-style procedure of first creating the input/output functions with "opaque" in place of the base type, the "opaque" argument/return type is changed to the final base type, on CREATE TYPE. However, we did not create a pg_depend record when doing that, so the functions were left not depending on the type. Fixes bug #14706, reported by Karen Huddleston. Discussion: https://www.postgresql.org/message-id/20170614232259.1424.82774@wrigleys.postgresql.org	2017-06-16 11:33:12 +03:00
Robert Haas	b08df9cab7	Teach predtest.c about CHECK clauses to fix partitioning bugs. In a CHECK clause, a null result means true, whereas in a WHERE clause it means false. predtest.c provided different functions depending on which set of semantics applied to the predicate being proved, but had no option to control what a null meant in the clauses provided as axioms. Add one. Use that in the partitioning code when figuring out whether the validation scan on a new partition can be skipped. Rip out the old logic that attempted (not very successfully) to compensate for the absence of the necessary support in predtest.c. Ashutosh Bapat and Robert Haas, reviewed by Amit Langote and incorporating feedback from Tom Lane. Discussion: http://postgr.es/m/CAFjFpReT_kq_uwU_B8aWDxR7jNGE=P0iELycdq5oupi=xSQTOw@mail.gmail.com	2017-06-14 13:13:11 -04:00
Tom Lane	a571c7f661	Fix violations of CatalogTupleInsert/Update/Delete abstraction. In commits `2f5c9d9c9` and `ab0289651` we invented an abstraction layer to insulate catalog manipulations from direct heap update calls. But evidently some patches that hadn't landed in-tree at that point didn't get the memo completely. Fix a couple of direct calls to simple_heap_delete to use CatalogTupleDelete instead; these appear to have been added in commits `7c4f52409` and `7b504eb28`. This change is purely cosmetic ATM, but there's no point in having an abstraction layer if we allow random code to break it. Masahiko Sawada and Tom Lane Discussion: https://postgr.es/m/CAD21AoDOPRSVcwbnCN3Y1n_68ATyTspsU6=ygtHz_uY0VcdZ8A@mail.gmail.com	2017-06-14 10:26:46 -04:00
Dean Rasheed	f356ec5744	Teach RemoveRoleFromObjectPolicy() about partitioned tables. Table partitioning, introduced in commit `f0e44751d7`, added a new relkind - RELKIND_PARTITIONED_TABLE. Update RemoveRoleFromObjectPolicy() to handle it, otherwise DROP OWNED BY will fail if the role has any RLS policies referring to partitioned tables. Dean Rasheed, reviewed by Amit Langote. Discussion: https://postgr.es/m/CAEZATCUnNOKN8sLML9jUzxecALWpEXK3a3W7y0PgFR4%2Buhgc%3Dg%40mail.gmail.com	2017-06-14 08:43:40 +01:00
Tom Lane	651902deb1	Re-run pgindent. This is just to have a clean base state for testing of Piotr Stefaniak's latest version of FreeBSD indent. I fixed up a couple of places where pgindent would have changed format not-nicely. perltidy not included. Discussion: https://postgr.es/m/VI1PR03MB119959F4B65F000CA7CD9F6BF2CC0@VI1PR03MB1199.eurprd03.prod.outlook.com	2017-06-13 13:05:59 -04:00
Robert Haas	ee252f074b	Fix failure to remove dependencies when a partition is detached. Otherwise, dropping the partitioned table will automatically drop any previously-detached children, which would be unfortunate. Ashutosh Bapat and Rahila Syed, reviewed by Amit Langote and by me. Discussion: http://postgr.es/m/CAFjFpRdOwHuGj45i25iLQ4QituA0uH6RuLX1h5deD4KBZJ25yg@mail.gmail.com	2017-06-13 11:51:42 -04:00
Peter Eisentraut	17082a88ea	Prevent copying default collation This will not have the desired effect and might lead to crashes when the copied collation is used. Reported-by: Tom Lane <tgl@sss.pgh.pa.us>	2017-06-13 08:49:41 -04:00
Tom Lane	a475e46634	Fix ALTER SEQUENCE OWNED BY to not rewrite the sequence relation. It's not necessary for it to do that, since OWNED BY requires only ordinary catalog updates and doesn't affect future sequence values. And pg_upgrade needs to use OWNED BY without having it change the sequence's relfilenode. Commit `3d79013b9` broke this by making all forms of ALTER SEQUENCE change the relfilenode; that seems to be the explanation for the hard-to-reproduce buildfarm failures we've been seeing since then. Discussion: https://postgr.es/m/19785.1497215827@sss.pgh.pa.us	2017-06-12 16:57:31 -04:00
Peter Eisentraut	7f28a7946a	Remove "synchronized table states" notice message It appears to be more confusing than useful. Reported-by: Jeff Janes <jeff.janes@gmail.com>	2017-06-12 11:42:06 -04:00
Peter Eisentraut	253504fb9f	Fix build of ICU support in Windows and also any platform that does not have locale_t but enabled ICU. Author: Ashutosh Sharma <ashu.coek88@gmail.com>	2017-06-12 10:28:37 -04:00
Peter Eisentraut	ddd7b22b22	Stop table sync workers when subscription relation entry is removed When a table sync worker is in waiting state and the subscription table entry is removed because of a concurrent subscription refresh, the worker could be left orphaned. To avoid that, explicitly stop the worker when the pg_subscription_rel entry is removed. Reported-by: Masahiko Sawada <sawada.mshk@gmail.com>	2017-06-12 08:53:37 -04:00
Peter Eisentraut	644ea35fc1	Fix updating of pg_subscription_rel from workers A logical replication worker should not insert new rows into pg_subscription_rel, only update existing rows, so that there are no races if a concurrent refresh removes rows. Adjust the API to be able to choose that behavior. Author: Masahiko Sawada <sawada.mshk@gmail.com> Reported-by: tushar <tushar.ahuja@enterprisedb.com>	2017-06-07 13:49:14 -04:00
Robert Haas	15ce775faa	Prevent BEFORE triggers from violating partitioning constraints. Since tuple-routing implicitly checks the partitioning constraints at least for the levels of the partitioning hierarchy it traverses, there's normally no need to revalidate the partitioning constraint after performing tuple routing. However, if there's a BEFORE trigger on the target partition, it could modify the tuple, causing the partitioning constraint to be violated. Catch that case. Also, instead of checking the root table's partition constraint after tuple-routing, check it beforehand. Otherwise, the rules for when the partitioning constraint gets checked get too complicated, because you sometimes have to check part of the constraint but not all of it. This effectively reverts commit `39162b2030` in favor of a different approach altogether. Report by me. Initial debugging by Jeevan Ladhe. Patch by Amit Langote, reviewed by me. Discussion: http://postgr.es/m/CA+Tgmoa9DTgeVOqopieV8d1QRpddmP65aCdxyjdYDoEO5pS5KA@mail.gmail.com	2017-06-07 12:50:45 -04:00
Peter Eisentraut	9907b55ceb	Fix ALTER SUBSCRIPTION grammar ambiguity There was a grammar ambiguity between SET PUBLICATION name REFRESH and SET PUBLICATION SKIP REFRESH, because SKIP is not a reserved word. To resolve that, fold the refresh choice into the WITH options. Refreshing is the default now. Reported-by: tushar <tushar.ahuja@enterprisedb.com>	2017-06-05 21:43:25 -04:00
Tom Lane	e7941a9766	Replace over-optimistic Assert in partitioning code with a runtime test. get_partition_parent felt that it could simply Assert that systable_getnext found a tuple. This is unlike any other caller of that function, and it's unsafe IMO --- in fact, the reason I noticed it was that the Assert failed. (OK, I was working with known-inconsistent catalog contents, but I wasn't expecting the DB to fall over quite that violently. The behavior in a non-assert-enabled build wouldn't be very nice, either.) Fix it to do what other callers do, namely an actual runtime-test-and-elog. Also, standardize the wording of elog messages that are complaining about unexpected failure of systable_getnext. 90% of them say "could not find tuple for <object>", so make the remainder do likewise. Many of the holdouts were using the phrasing "cache lookup failed", which is outright misleading since no catcache search is involved.	2017-06-04 16:20:03 -04:00
Tom Lane	0d18852666	Disallow CREATE INDEX if table is already in use in current session. If we allow this, whatever outer command has the table open will not know about the new index and may fail to update it as needed, as shown in a report from Laurenz Albe. We already had such a prohibition in place for ALTER TABLE, but the CREATE INDEX syntax missed the check. Fixing it requires an API change for DefineIndex(), which conceivably would break third-party extensions if we were to back-patch it. Given how long this problem has existed without being noticed, fixing it in the back branches doesn't seem worth that risk. Discussion: https://postgr.es/m/A737B7A37273E048B164557ADEF4A58B53A4DC9A@ntex2010i.host.magwien.gv.at	2017-06-04 12:02:41 -04:00
Alvaro Herrera	55a70a023c	Assorted translatable string fixes Mark our rusage reportage string translatable; remove quotes from type names; unify formatting of very similar messages.	2017-06-04 11:41:16 -04:00
Andres Freund	34aebcf42a	Allow parallelism in COPY (query) TO ...; Previously this was not allowed, as copy.c didn't set the CURSOR_OPT_PARALLEL_OK flag when planning the query. Set it. While the lack of parallel query for COPY isn't strictly speaking a bug, it does prevent parallelism from being used in a facility commonly used to run long running queries. Thus backpatch to 9.6. Author: Andres Freund Discussion: https://postgr.es/m/20170531231958.ihanapplorptykzm@alap3.anarazel.de Backpatch: 9.6, where parallelism was introduced.	2017-06-02 19:11:15 -07:00
Andres Freund	665104557f	Modify sequence catalog tuple before invoking post alter hook. This seems to have been broken in the commit (`1753b1b027`) that moved the sequence definition into pg_sequence. Author: Andres Freund Discussion: https://postgr.es/m/20170601000716.qxg7c46ukkiljjb3@alap3.anarazel.de Backpatch: Bug is in master/v10 only	2017-06-01 14:19:33 -07:00
Andres Freund	3d79013b97	Make ALTER SEQUENCE, including RESTART, fully transactional. Previously the changes to the "data" part of the sequence, i.e. the one containing the current value, were not transactional, whereas the definition, including minimum and maximum value were. That leads to odd behaviour if a schema change is rolled back, with the potential that out-of-bound sequence values can be returned. To avoid the issue create a new relfilenode fork whenever ALTER SEQUENCE is executed, similar to how TRUNCATE ... RESTART IDENTITY already is already handled. This commit also makes ALTER SEQUENCE RESTART transactional, as it seems to be too confusing to have some forms of ALTER SEQUENCE behave transactionally, some forms not. This way setval() and nextval() are not transactional, but DDL is, which seems to make sense. This commit also rolls back parts of the changes made in `3d092fe540` and `f8dc1985f` as they're now not needed anymore. Author: Andres Freund Discussion: https://postgr.es/m/20170522154227.nvafbsm62sjpbxvd@alap3.anarazel.de Backpatch: Bug is in master/v10 only	2017-06-01 14:19:33 -07:00
Tom Lane	d5cb3bab56	Fix improper quoting of format_type_be() output. Per our message style guidelines, error messages incorporating the results of format_type_be() and its siblings should not add quotes around those results, because those functions already add quotes at need. Fix a few places that hadn't gotten that memo.	2017-05-29 21:48:26 -04:00
Tom Lane	76a3df6e5e	Code review focused on new node types added by partitioning support. Fix failure to check that we got a plain Const from const-simplification of a coercion request. This is the cause of bug #14666 from Tian Bing: there is an int4 to money cast, but it's only stable not immutable (because of dependence on lc_monetary), resulting in a FuncExpr that the code was miserably unequipped to deal with, or indeed even to notice that it was failing to deal with. Add test cases around this coercion behavior. In view of the above, sprinkle the code liberally with castNode() macros, in hope of catching the next such bug a bit sooner. Also, change some functions that were randomly declared to take Node* to take more specific pointer types. And change some struct fields that were declared Node* but could be given more specific types, allowing removal of assorted explicit casts. Place PARTITION_MAX_KEYS check a bit closer to the code it's protecting. Likewise check only-one-key-for-list-partitioning restriction in a less random place. Avoid not-per-project-style usages like !strcmp(...). Fix assorted failures to avoid scribbling on the input of parse transformation. I'm not sure how necessary this is, but it's entirely silly for these functions to be expending cycles to avoid that and not getting it right. Add guards against partitioning on system columns. Put backend/nodes/ support code into an order that matches handling of these node types elsewhere. Annotate the fact that somebody added location fields to PartitionBoundSpec and PartitionRangeDatum but forgot to handle them in outfuncs.c/readfuncs.c. This is fairly harmless for production purposes (since readfuncs.c would just substitute -1 anyway) but it's still bogus. It's not worth forcing a post-beta1 initdb just to fix this, but if we have another reason to force initdb before 10.0, we should go back and clean this up. Contrariwise, somebody added location fields to PartitionElem and PartitionSpec but forgot to teach exprLocation() about them. Consolidate duplicative code in transformPartitionBound(). Improve a couple of error messages. Improve assorted commentary. Re-pgindent the files touched by this patch; this affects a few comment blocks that must have been added quite recently. Report: https://postgr.es/m/20170524024550.29935.14396@wrigleys.postgresql.org	2017-05-28 23:20:28 -04:00
Peter Eisentraut	e807d8b163	Fix mistake in error message Reported-by: tushar <tushar.ahuja@enterprisedb.com> Author: Dilip Kumar <dilipbalaut@gmail.com>	2017-05-19 16:30:02 -04:00
Robert Haas	ac8d7e1b83	Fix corruption of tableElts list by MergeAttributes(). Since commit `e7b3349a8a`, MergeAttributes destructively modifies the input List, to which the caller's CreateStmt still points. One may wonder whether this was already a bug, but commit `f0e44751d7` made things noticeably worse by adding additional destructive modifications so that the caller's List might, in the case of creation a partitioned table, no longer even be structurally valid. Restore the status quo ante by assigning the return value of MergeAttributes back to stmt->tableElts in the caller. In most of the places where DefineRelation is called, it doesn't matter what stmt->tableElts points to here or whether it's valid or not, because the caller doesn't use the statement for anything after DefineRelation returns anyway. However, ProcessUtilitySlow passes it to EventTriggerCollectSimpleCommand, and that function tries to invoke copyObject on it. If any of the CreateStmt's substructure is invalid at that point, undefined behavior will result. One might wonder whether this whole area needs further revision - perhaps DefineRelation() ought not to be destructively modifying the caller-provided CreateStmt at all. However, that would be a behavior change for any event triggers using C code to inspect the CreateStmt, so for now, just fix the crash. Report by Amit Langote, who provided a somewhat different patch for it. Discussion: http://postgr.es/m/bf6a39a7-100a-74bd-1156-3c16a1429d88@lab.ntt.co.jp	2017-05-19 15:02:16 -04:00
Robert Haas	3ec76ff1f2	Don't explicitly mark range partitioning columns NOT NULL. This seemed like a good idea originally because there's no way to mark a range partition as accepting NULL, but that now seems more like a current limitation than something we want to lock down for all time. For example, there's a proposal to add the notion of a default partition which accepts all rows not otherwise routed, which directly conflicts with the idea that a range-partitioned table should never allow nulls anywhere. So let's change this while we still can, by putting the NOT NULL test into the partition constraint instead of changing the column properties. Amit Langote and Robert Haas, reviewed by Amit Kapila Discussion: http://postgr.es/m/8e2dd63d-c6fb-bb74-3c2b-ed6d63629c9d@lab.ntt.co.jp	2017-05-18 13:49:31 -04:00
Peter Eisentraut	6234569851	Improve CREATE SUBSCRIPTION option parsing When creating a subscription with slot_name = NONE, we failed to check that also create_slot = false and enabled = false were set. This created an invalid subscription and could later lead to a crash if a NULL slot name was accessed. Add more checks around that for robustness. Reported-by: tushar <tushar.ahuja@enterprisedb.com>	2017-05-17 20:47:37 -04:00
Bruce Momjian	a6fd7b7a5f	Post-PG 10 beta1 pgindent run perltidy run not included.	2017-05-17 16:31:56 -04:00
Peter Eisentraut	3db22794b7	Add more tests for CREATE SUBSCRIPTION Add some tests for parsing different option combinations. Fix some of the resulting error messages for recent changes in option naming. Author: Masahiko Sawada <sawada.mshk@gmail.com>	2017-05-17 12:24:48 -04:00
Peter Eisentraut	944dc0f9ce	Check relkind of tables in CREATE/ALTER SUBSCRIPTION We used to only check for a supported relkind on the subscriber during replication, which is needed to ensure that the setup is valid and we don't crash. But it's also useful to tell the user immediately when CREATE or ALTER SUBSCRIPTION is executed that the relation being added to the subscription is not of a supported relkind. Author: Petr Jelinek <petr.jelinek@2ndquadrant.com> Reported-by: tushar <tushar.ahuja@enterprisedb.com>	2017-05-16 22:57:16 -04:00
Tom Lane	c079673dcb	Preventive maintenance in advance of pgindent run. Reformat various places in which pgindent will make a mess, and fix a few small violations of coding style that I happened to notice while perusing the diffs from a pgindent dry run. There is one actual bug fix here: the need-to-enlarge-the-buffer code path in icu_convert_case was obviously broken. Perhaps it's unreachable in our usage? Or maybe this is just sadly undertested.	2017-05-16 20:36:35 -04:00
Robert Haas	59f40566ca	Fix relcache leak when row triggers on partitions are fired by COPY. Thomas Munro, reviewed by Amit Langote Discussion: http://postgr.es/m/CAEepm=15Jss-yhFApuKzxcoCuFnb8TR8iQiWMjG=CLYPx48QLw@mail.gmail.com	2017-05-16 12:46:32 -04:00
Peter Eisentraut	b1ff33fd9b	Add assertion to quiet Coverity	2017-05-15 13:59:58 -04:00
Tom Lane	12590c5d33	Fix unsafe reference into relcache in constructed CommentStmt. The CommentStmt made by RebuildConstraintComment() has to pstrdup the relation name, else it will contain a dangling pointer after that relcache entry is flushed. (I'm less sure that pstrdup'ing conname is necessary, but let's be safe.) Failure to do this leads to weird errors or crashes, as reported by Marko Elezovic. Bug introduced by commit `e42375fc8`, so back-patch to 9.5 as that was. Fix by David Rowley, regression test by Michael Paquier Discussion: https://postgr.es/m/DB6PR03MB30775D58E732D4EB0C13725B9AE00@DB6PR03MB3077.eurprd03.prod.outlook.com	2017-05-15 11:33:44 -04:00
Peter Eisentraut	f8dc1985fd	Fix ALTER SEQUENCE locking In `1753b1b027`, the pg_sequence system catalog was introduced. This made sequence metadata changes transactional, while the actual sequence values are still behaving nontransactionally. This requires some refinement in how ALTER SEQUENCE, which operates on both, locks the sequence and the catalog. The main problems were: - Concurrent ALTER SEQUENCE causes "tuple concurrently updated" error, caused by updates to pg_sequence catalog. - Sequence WAL writes and catalog updates are not protected by same lock, which could lead to inconsistent recovery order. - nextval() disregarding uncommitted ALTER SEQUENCE changes. To fix, nextval() and friends now lock the sequence using RowExclusiveLock instead of AccessShareLock. ALTER SEQUENCE locks the sequence using ShareRowExclusiveLock. This means that nextval() and ALTER SEQUENCE block each other, and ALTER SEQUENCE on the same sequence blocks itself. (This was already the case previously for the OWNER TO, RENAME, and SET SCHEMA variants.) Also, rearrange some code so that the entire AlterSequence is protected by the lock on the sequence. As an exception, use reduced locking for ALTER SEQUENCE ... RESTART. Since that is basically a setval(), it does not require the full locking of other ALTER SEQUENCE actions. So check whether we are only running a RESTART and run with less locking if so. Reviewed-by: Michael Paquier <michael.paquier@gmail.com> Reported-by: Jason Petersen <jason@citusdata.com> Reported-by: Andres Freund <andres@anarazel.de>	2017-05-15 10:19:57 -04:00
Tom Lane	e84c019598	Fix maintenance hazards caused by ill-considered use of default: cases. Remove default cases from assorted switches over ObjectClass and some related enum types, so that we'll get compiler warnings when someone adds a new enum value without accounting for it in all these places. In passing, re-order some switch cases as needed to match the declaration of enum ObjectClass. OK, that's just neatnik-ism, but I dislike code that looks like it was assembled with the help of a dartboard. Discussion: https://postgr.es/m/20170512221010.nglatgt5azzdxjlj@alvherre.pgsql	2017-05-14 13:32:59 -04:00
Tom Lane	b5b0db19b8	Fix handling of extended statistics during ALTER COLUMN TYPE. ALTER COLUMN TYPE on a column used by a statistics object fails since commit `928c4de30`, because the relevant switch in ATExecAlterColumnType is unprepared for columns to have dependencies from OCLASS_STATISTIC_EXT objects. Although the existing types of extended statistics don't actually need us to do any work for a column type change, it seems completely indefensible that that assumption is hidden behind the failure of an unrelated module to contain any code for the case. Hence, create and call an API function in statscmds.c where the assumption can be explained, and where we could add code to deal with the problem when it inevitably becomes real. Also, the reason this wasn't handled before, neither for extended stats nor for the last half-dozen new OCLASS kinds :-(, is that the default: in that switch suppresses compiler warnings, allowing people to miss the need to consider it when adding an OCLASS. We don't really need a default because surely getObjectClass should only return valid values of the enum; so remove it, and add the missed OCLASS entries where they should be. Discussion: https://postgr.es/m/20170512221010.nglatgt5azzdxjlj@alvherre.pgsql	2017-05-14 12:22:25 -04:00
Tom Lane	f04c9a6146	Standardize terminology for pg_statistic_ext entries. Consistently refer to such an entry as a "statistics object", not just "statistics" or "extended statistics". Previously we had a mismash of terms, accompanied by utter confusion as to whether the term was singular or plural. That's not only grating (at least to the ear of a native English speaker) but could be outright misleading, eg in error messages that seemed to be referring to multiple objects where only one could be meant. This commit fixes the code and a lot of comments (though I may have missed a few). I also renamed two new SQL functions, pg_get_statisticsextdef -> pg_get_statisticsobjdef pg_statistic_ext_is_visible -> pg_statistics_obj_is_visible to conform better with this terminology. I have not touched the SGML docs other than fixing those function names; the docs certainly need work but it seems like a separable task. Discussion: https://postgr.es/m/22676.1494557205@sss.pgh.pa.us	2017-05-14 10:55:01 -04:00
Tom Lane	928c4de309	Fix dependencies for extended statistics objects. A stats object ought to have a dependency on each individual column it reads, not the entire table. Doing this honestly lets us get rid of the hard-wired logic in RemoveStatisticsExt, which seems to have been misguidedly modeled on RemoveStatistics; and it will be far easier to extend to multiple tables later. Also, add overlooked dependency on owner, and make the dependency on schema be NORMAL like every other such dependency. There remains some unfinished work here, which is to allow statistics objects to be extension members. That takes more effort than just adding the dependency call, though, so I left it out for now. initdb forced because this changes the set of pg_depend records that should exist for a statistics object. Discussion: https://postgr.es/m/22676.1494557205@sss.pgh.pa.us	2017-05-12 16:26:31 -04:00
Alvaro Herrera	bc085205c8	Change CREATE STATISTICS syntax Previously, we had the WITH clause in the middle of the command, where you'd specify both generic options as well as statistic types. Few people liked this, so this commit changes it to remove the WITH keyword from that clause and makes it accept statistic types only. (We currently don't have any generic options, but if we invent in the future, we will gain a new WITH clause, probably at the end of the command). Also, the column list is now specified without parens, which makes the whole command look more similar to a SELECT command. This change will let us expand the command to supporting expressions (not just columns names) as well as multiple tables and their join conditions. Tom added lots of code comments and fixed some parts of the CREATE STATISTICS reference page, too; more changes in this area are forthcoming. He also fixed a potential problem in the alter_generic regression test, reducing verbosity on a cascaded drop to avoid dependency on message ordering, as we do in other tests. Tom also closed a security bug: we documented that table ownership was required in order to create a statistics object on it, but didn't actually implement it. Implement tab-completion for statistics objects. This can stand some more improvement. Authors: Alvaro Herrera, with lots of cleanup by Tom Lane Discussion: https://postgr.es/m/20170420212426.ltvgyhnefvhixm6i@alvherre.pgsql	2017-05-12 14:59:35 -03:00
Peter Eisentraut	b807f59828	Rework the options syntax for logical replication commands For CREATE/ALTER PUBLICATION/SUBSCRIPTION, use similar option style as other statements that use a WITH clause for options. Author: Petr Jelinek <petr.jelinek@2ndquadrant.com>	2017-05-12 08:57:49 -04:00
Alvaro Herrera	b66adb7b0c	Revert "Permit dump/reload of not-too-large >1GB tuples" This reverts commits `fa2fa99552` and `42f50cb8fa`. While the functionality that was intended to be provided by these commits is desired, the patch didn't actually solve as many of the problematic situations as we hoped, and it created a bunch of its own problems. Since we're going to require more extensive changes soon for other reasons and users have been working around these problems for a long time already, there is no point in spending effort in fixing this halfway measure. Per complaint from Tom Lane. Discussion: https://postgr.es/m/21407.1484606922@sss.pgh.pa.us (Commit `fa2fa99552` had already been reverted in branches 9.5 as `f858524ee4` and 9.6 as `e9e44a0953`, so this touches master only. Commit `42f50cb8fa` was not present in the older branches.)	2017-05-10 18:41:27 -03:00
Robert Haas	9e6104c667	Prohibit transition tables on views and foreign tables. Thomas Munro, per off-list report from Prabhat Sabu. Changes to the message wording for consistency with the existing relkind check for partitioned tables by me. Discussion: http://postgr.es/m/CAEepm=2xJFFpGM+N=gpWx-9Nft2q1oaFZX07_y23AHCrJQLt0g@mail.gmail.com	2017-05-09 23:34:02 -04:00
Robert Haas	29fd3d9da0	Don't permit transition tables with TRUNCATE triggers. Prior to this prohibition, such a trigger caused a crash. Thomas Munro, per a report from Neha Sharma. I added a regression test. Discussion: http://postgr.es/m/CAEepm=0VR5W-N38eTkO_FqJbGqQ_ykbBRmzmvHyxDhy1p=0Csw@mail.gmail.com	2017-05-09 23:24:23 -04:00
Peter Eisentraut	013c1178fd	Remove the NODROP SLOT option from DROP SUBSCRIPTION It turned out this approach had problems, because a DROP command should not have any options other than CASCADE and RESTRICT. Instead, always attempt to drop the slot if there is one configured, but also add an ALTER SUBSCRIPTION action to set the slot to NONE. Author: Petr Jelinek <petr.jelinek@2ndquadrant.com> Reported-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://www.postgresql.org/message-id/29431.1493730652@sss.pgh.pa.us	2017-05-09 10:20:42 -04:00
Peter Eisentraut	fe974cc5a6	Check connection info string in ALTER SUBSCRIPTION Previously it would allow an invalid connection string to be set. Author: Petr Jelinek <petr.jelinek@2ndquadrant.com> Reported-by: tushar <tushar.ahuja@enterprisedb.com>	2017-05-08 14:01:00 -04:00
Heikki Linnakangas	eb61136dc7	Remove support for password_encryption='off' / 'plain'. Storing passwords in plaintext hasn't been a good idea for a very long time, if ever. Now seems like a good time to finally forbid it, since we're messing with this in PostgreSQL 10 anyway. Remove the CREATE/ALTER USER UNENCRYPTED PASSSWORD 'foo' syntax, since storing passwords unencrypted is no longer supported. ENCRYPTED PASSWORD 'foo' is still accepted, but ENCRYPTED is now just a noise-word, it does the same as just PASSWORD 'foo'. Likewise, remove the --unencrypted option from createuser, but accept --encrypted as a no-op for backward compatibility. AFAICS, --encrypted was a no-op even before this patch, because createuser encrypted the password before sending it to the server even if --encrypted was not specified. It added the ENCRYPTED keyword to the SQL command, but since the password was already in encrypted form, it didn't make any difference. The documentation was not clear on whether that was intended or not, but it's moot now. Also, while password_encryption='on' is still accepted as an alias for 'md5', it is now marked as hidden, so that it is not listed as an accepted value in error hints, for example. That's not directly related to removing 'plain', but it seems better this way. Reviewed by Michael Paquier Discussion: https://www.postgresql.org/message-id/16e9b768-fd78-0b12-cfc1-7b6b7f238fde@iki.fi	2017-05-08 11:26:07 +03:00
Tom Lane	9209e07605	Ensure commands in extension scripts see the results of preceding DDL. Due to a missing CommandCounterIncrement() call, parsing of a non-utility command in an extension script would not see the effects of the immediately preceding DDL command, unless that command's execution ends with CommandCounterIncrement() internally ... which some do but many don't. Report by Philippe Beaudoin, diagnosis by Julien Rouhaud. Rather remarkably, this bug has evaded detection since extensions were invented, so back-patch to all supported branches. Discussion: https://postgr.es/m/2cf7941e-4e41-7714-3de8-37b1a8f74dff@free.fr	2017-05-02 18:06:09 -04:00
Peter Eisentraut	3d092fe540	Avoid unnecessary catalog updates in ALTER SEQUENCE ALTER SEQUENCE can do nontransactional changes to the sequence (RESTART clause) and transactional updates to the pg_sequence catalog (most other clauses). When just calling RESTART, the code would still needlessly do a catalog update without any changes. This would entangle that operation in the concurrency issues of a catalog update (causing either locking or concurrency errors, depending on how that issue is to be resolved). Fix by keeping track during options parsing whether a catalog update is needed, and skip it if not. Reported-by: Jason Petersen <jason@citusdata.com>	2017-05-02 10:41:48 -04:00
Peter Eisentraut	a99448ab45	Don't wake up logical replication launcher unnecessarily In CREATE SUBSCRIPTION, only wake up the launcher when the subscription is enabled. Author: Fujii Masao <masao.fujii@gmail.com>	2017-05-01 22:50:32 -04:00
Robert Haas	6a4dda44e0	Fix VALIDATE CONSTRAINT to consider NO INHERIT attribute. Currently, trying to validate a NO INHERIT constraint on the parent will search for the constraint in child tables (where it is not supposed to exist), wrongly causing a "constraint does not exist" error. Amit Langote, per a report from Hans Buschmann. Discussion: http://postgr.es/m/20170421184012.24362.19@wrigleys.postgresql.org	2017-04-28 14:48:38 -04:00
Robert Haas	504c2205ab	Fix crash when partitioned column specified twice. Amit Langote, reviewed by Beena Emerson Discussion: http://postgr.es/m/6ed23d3d-c09d-4cbc-3628-0a8a32f750f4@lab.ntt.co.jp	2017-04-28 13:52:17 -04:00
Fujii Masao	1f8b060121	Fix typo in comment. Author: Masahiko Sawada	2017-04-27 00:03:07 +09:00
Stephen Frost	9139aa1942	Allow ALTER TABLE ONLY on partitioned tables There is no need to forbid ALTER TABLE ONLY on partitioned tables, when no partitions exist yet. This can be handy for users who are building up their partitioned table independently and will create actual partitions later. In addition, this is how pg_dump likes to operate in certain instances. Author: Amit Langote, with some error message word-smithing by me	2017-04-25 16:57:43 -04:00
Peter Eisentraut	a3f17b9c31	Wake up launcher when enabling a subscription Otherwise one would have to wait up to DEFAULT_NAPTIME_PER_CYCLE until the subscription worker is considered for starting. There is a small race condition: If one enables a subscription right after disabling it, the launcher might not have registered the stopping when receiving the wakeup signal for the re-enabling. The start will then not happen right away but after the full cycle time. Author: Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>	2017-04-25 14:40:33 -04:00
Tom Lane	4b34624daa	Code review for commands/statscmds.c. Fix machine-dependent sorting of column numbers. (Odd behavior would only materialize for column numbers above 255, but that's certainly legal.) Fix poor choice of SQLSTATE for some errors, and improve error message wording. (Notably, "is not a scalar type" is a totally misleading way to explain "does not have a default btree opclass".) Avoid taking AccessExclusiveLock on the associated relation during DROP STATISTICS. That's neither necessary nor desirable, and it could easily have put us into situations where DROP fails (compare commit `68ea2b7f9`). Adjust/improve comments. David Rowley and Tom Lane Discussion: https://postgr.es/m/CAKJS1f-GmCfPvBbAEaM5xoVOaYdVgVN1gicALSoYQ77z-+vLbw@mail.gmail.com	2017-04-24 11:15:15 -04:00
Peter Eisentraut	dcb39c37c1	Synchronize table list before creating slot in CREATE SUBSCRIPTION This way a failure to synchronize the table list will not leave an unused slot on the publisher. Author: Petr Jelinek <petr.jelinek@2ndquadrant.com>	2017-04-21 08:37:03 -04:00
Heikki Linnakangas	c727f120ff	Rename "scram" to "scram-sha-256" in pg_hba.conf and password_encryption. Per discussion, plain "scram" is confusing because we actually implement SCRAM-SHA-256 rather than the original SCRAM that uses SHA-1 as the hash algorithm. If we add support for SCRAM-SHA-512 or some other mechanism in the SCRAM family in the future, that would become even more confusing. Most of the internal files and functions still use just "scram" as a shorthand for SCRMA-SHA-256, but I did change PASSWORD_TYPE_SCRAM to PASSWORD_TYPE_SCRAM_SHA_256, as that could potentially be used by 3rd party extensions that hook into the password-check hook. Michael Paquier did this in an earlier version of the SCRAM patch set already, but I didn't include that in the version that was committed. Discussion: https://www.postgresql.org/message-id/fde71ff1-5858-90c8-99a9-1c2427e7bafb@iki.fi	2017-04-18 14:50:50 +03:00
Simon Riggs	51175f3638	Allow COMMENT ON COLUMN with partitioned tables Amit Langote	2017-04-18 10:42:10 +01:00
Peter Eisentraut	e6242c18a5	Set range table for CopyFrom() in tablesync CopyFrom() needs a range table for formatting certain errors for constraint violations. This changes the mechanism of how the range table is passed to the CopyFrom() executor state. We used to generate the range table and one entry for the relation manually inside DoCopy(). Now we use addRangeTableEntryForRelation() to setup the range table and relation entry for the ParseState, which is then passed down by BeginCopyFrom(). Author: Petr Jelinek <petr.jelinek@2ndquadrant.com> Reported-by: Euler Taveira <euler@timbira.com.br>	2017-04-17 23:23:49 -04:00
Alvaro Herrera	ee6922112e	Rename columns in new pg_statistic_ext catalog The new catalog reused a column prefix "sta" from pg_statistic, but this is undesirable, so change the catalog to use prefix "stx" instead. Also, rename the column that lists enabled statistic kinds as "stxkind" rather than "enabled". Discussion: https://postgr.es/m/CAKJS1f_2t5jhSN7huYRFH3w3rrHfG2QU7hiUHsu-Vdjd1rYT3w@mail.gmail.com	2017-04-17 18:34:29 -03:00
Alvaro Herrera	8c5cdb7f4f	Tighten up relation kind checks for extended statistics We were accepting creation of extended statistics only for regular tables, but they can usefully be created for foreign tables, partitioned tables, and materialized views, too. Allow those cases. While at it, make sure all the rejected cases throw a consistent error message, and add regression tests for the whole thing. Author: David Rowley, Álvaro Herrera Discussion: https://postgr.es/m/CAKJS1f-BmGo410bh5RSPZUvOO0LhmHL2NYmdrC_Jm8pk_FfyCA@mail.gmail.com	2017-04-17 17:55:55 -03:00
Fujii Masao	9e0e5550c5	Fix typos in comment and log message.	2017-04-18 03:19:39 +09:00
Peter Eisentraut	887227a1cc	Add option to modify sync commit per subscription This also changes default behaviour of subscription workers to synchronous_commit = off. Author: Petr Jelinek <petr.jelinek@2ndquadrant.com>	2017-04-14 13:58:46 -04:00
Peter Eisentraut	25371a72b9	Remove pstrdup of TextDatumGetCString The result of TextDatumGetCString is already palloc'ed.	2017-04-14 12:54:09 -04:00
Robert Haas	1d5fede4a9	Code review for `c94e6942ce`. validateCheckConstraint() shouldn't try to access the storage for a partitioned table, because it no longer has any. Creating a _RETURN table on a partitioned table shouldn't be allowed, both because there's no value in it and because trying to do so would involve a validation scan against its nonexistent storage. Amit Langote, reviewed by Tom Lane. Regression test outputs updated to pass by me. Discussion: http://postgr.es/m/e5c3cbd3-1551-d6f8-c9e2-51777d632fd2@lab.ntt.co.jp	2017-04-12 11:35:11 -04:00
Robert Haas	258cef1254	Fix possibile deadlock when dropping partitions. heap_drop_with_catalog and RangeVarCallbackForDropRelation should lock the parent before locking the target relation. Amit Langote Discussion: http://postgr.es/m/29588799-a8ce-b0a2-3dae-f39ff6d35922@lab.ntt.co.jp	2017-04-11 09:08:36 -04:00
Tom Lane	8f0530f580	Improve castNode notation by introducing list-extraction-specific variants. This extends the castNode() notation introduced by commit `5bcab1114` to provide, in one step, extraction of a list cell's pointer and coercion to a concrete node type. For example, "lfirst_node(Foo, lc)" is the same as "castNode(Foo, lfirst(lc))". Almost half of the uses of castNode that have appeared so far include a list extraction call, so this is pretty widely useful, and it saves a few more keystrokes compared to the old way. As with the previous patch, back-patch the addition of these macros to pg_list.h, so that the notation will be available when back-patching. Patch by me, after an idea of Andrew Gierth's. Discussion: https://postgr.es/m/14197.1491841216@sss.pgh.pa.us	2017-04-10 13:51:53 -04:00
Robert Haas	c0a8ae7be3	Fix reporting of violations in ExecConstraints, again. We decided in `f1b4c771ea` to pass the original slot to ExecConstraints(), but that breaks when there are BEFORE ROW triggers involved. So we need to do reverse-map the tuples back to the original descriptor instead, as Amit originally proposed. Amit Langote, reviewed by Ashutosh Bapat. One overlooked comment fixed by me. Discussion: http://postgr.es/m/b3a17254-6849-e542-2353-bde4e880b6a4@lab.ntt.co.jp	2017-04-10 12:20:08 -04:00
Tom Lane	9c7f5229ad	Optimize joins when the inner relation can be proven unique. If there can certainly be no more than one matching inner row for a given outer row, then the executor can move on to the next outer row as soon as it's found one match; there's no need to continue scanning the inner relation for this outer row. This saves useless scanning in nestloop and hash joins. In merge joins, it offers the opportunity to skip mark/restore processing, because we know we have not advanced past the first possible match for the next outer row. Of course, the devil is in the details: the proof of uniqueness must depend only on joinquals (not otherquals), and if we want to skip mergejoin mark/restore then it must depend only on merge clauses. To avoid adding more planning overhead than absolutely necessary, the present patch errs in the conservative direction: there are cases where inner_unique or skip_mark_restore processing could be used, but it will not do so because it's not sure that the uniqueness proof depended only on "safe" clauses. This could be improved later. David Rowley, reviewed and rather heavily editorialized on by me Discussion: https://postgr.es/m/CAApHDvqF6Sw-TK98bW48TdtFJ+3a7D2mFyZ7++=D-RyPsL76gw@mail.gmail.com	2017-04-07 22:20:13 -04:00
Magnus Hagander	32e33a7979	Fix typo in comment Masahiko Sawada	2017-04-07 09:30:22 +02:00
Peter Eisentraut	dc0400cc50	Fix compiler warning and add some more comments	2017-04-06 11:18:13 -04:00
Peter Eisentraut	3217327053	Identity columns This is the SQL standard-conforming variant of PostgreSQL's serial columns. It fixes a few usability issues that serial columns have: - CREATE TABLE / LIKE copies default but refers to same sequence - cannot add/drop serialness with ALTER TABLE - dropping default does not drop sequence - need to grant separate privileges to sequence - other slight weirdnesses because serial is some kind of special macro Reviewed-by: Vitaly Burovoy <vitaly.burovoy@gmail.com>	2017-04-06 08:41:37 -04:00
Simon Riggs	68ea2b7f9b	Reduce lock level for CREATE STATISTICS In line with other lock reductions related to planning. Simon Riggs	2017-04-05 18:22:32 -04:00
Simon Riggs	2686ee1b7c	Collect and use multi-column dependency stats Follow on patch in the multi-variate statistics patch series. CREATE STATISTICS s1 WITH (dependencies) ON (a, b) FROM t; ANALYZE; will collect dependency stats on (a, b) and then use the measured dependency in subsequent query planning. Commit `7b504eb282` added CREATE STATISTICS with n-distinct coefficients. These are now specified using the mutually exclusive option WITH (ndistinct). Author: Tomas Vondra, David Rowley Reviewed-by: Kyotaro HORIGUCHI, Álvaro Herrera, Dean Rasheed, Robert Haas and many other comments and contributions Discussion: https://postgr.es/m/56f40b20-c464-fad2-ff39-06b668fac47c@2ndquadrant.com	2017-04-05 18:00:42 -04:00
Peter Eisentraut	60a0b2ec89	Adjust min/max values when changing sequence type When changing the type of a sequence, adjust the min/max values of the sequence if it looks like the previous values were the default values. Previously, it would leave the old values in place, requiring manual adjustments even in the usual/default cases. Reviewed-by: Michael Paquier <michael.paquier@gmail.com> Reviewed-by: Vitaly Burovoy <vitaly.burovoy@gmail.com>	2017-04-04 12:49:39 -04:00
Kevin Grittner	41bd155dd6	Fix two undocumented parameters to functions from ENR patch. On ProcessUtility document the parameter, to match others. On CreateCachedPlan drop the queryEnv parameter. It was not referenced within the function, and had been added on the assumption that with some unknown future usage of QueryEnvironment it might be useful to do something there. We have avoided other "just in case" implementation of unused paramters, so drop it here. Per gripe from Tom Lane	2017-04-01 15:21:05 -05:00
Kevin Grittner	18ce3a4ab2	Add infrastructure to support EphemeralNamedRelation references. A QueryEnvironment concept is added, which allows new types of objects to be passed into queries from parsing on through execution. At this point, the only thing implemented is a collection of EphemeralNamedRelation objects -- relations which can be referenced by name in queries, but do not exist in the catalogs. The only type of ENR implemented is NamedTuplestore, but provision is made to add more types fairly easily. An ENR can carry its own TupleDesc or reference a relation in the catalogs by relid. Although these features can be used without SPI, convenience functions are added to SPI so that ENRs can easily be used by code run through SPI. The initial use of all this is going to be transition tables in AFTER triggers, but that will be added to each PL as a separate commit. An incidental effect of this patch is to produce a more informative error message if an attempt is made to modify the contents of a CTE from a referencing DML statement. No tests previously covered that possibility, so one is added. Kevin Grittner and Thomas Munro Reviewed by Heikki Linnakangas, David Fetter, and Thomas Munro with valuable comments and suggestions from many others	2017-03-31 23:17:18 -05:00
Tom Lane	64d4da511c	For foreign keys, check REFERENCES privilege only on the referenced table. We were requiring that the user have REFERENCES permission on both the referenced and referencing tables --- but this doesn't seem to have any support in the SQL standard, which says only that you need REFERENCES permission on the referenced table. And ALTER TABLE ADD FOREIGN KEY has already checked that you own the referencing table, so the check could only fail if a table owner has revoked his own REFERENCES permission. Moreover, the symmetric interpretation of this permission is unintuitive and confusing, as per complaint from Paul Jungwirth. So let's drop the referencing-side check. In passing, do a bit of wordsmithing on the GRANT reference page so that all the privilege types are described in similar fashion. Discussion: https://postgr.es/m/8940.1490906755@sss.pgh.pa.us	2017-03-31 18:11:30 -04:00
Peter Eisentraut	4cb824699e	Cast result of copyObject() to correct type copyObject() is declared to return void , which allows easily assigning the result independent of the input, but it loses all type checking. If the compiler supports typeof or something similar, cast the result to the input type. This creates a greater amount of type safety. In some cases, where the result is assigned to a generic type such as Node or Expr *, new casts are now necessary, but in general casts are now unnecessary in the normal case and indicate that something unusual is happening. Reviewed-by: Mark Dilger <hornschnorter@gmail.com>	2017-03-28 21:59:23 -04:00
Simon Riggs	ff539da316	Cleanup slots during drop database Automatically drop all logical replication slots associated with a database when the database is dropped. Previously we threw an ERROR if a slot existed. Now we throw ERROR only if a slot is active in the database being dropped. Craig Ringer	2017-03-28 10:05:21 -04:00
Andrew Gierth	b5635948ab	Support hashed aggregation with grouping sets. This extends the Aggregate node with two new features: HashAggregate can now run multiple hashtables concurrently, and a new strategy MixedAggregate populates hashtables while doing sorted grouping. The planner will now attempt to save as many sorts as possible when planning grouping sets queries, while not exceeding work_mem for the estimated combined sizes of all hashtables used. No SQL-level changes are required. There should be no user-visible impact other than the new EXPLAIN output and possible changes to result ordering when ORDER BY was not used (which affected a few regression tests). The enable_hashagg option is respected. Author: Andrew Gierth Reviewers: Mark Dilger, Andres Freund Discussion: https://postgr.es/m/87vatszyhj.fsf@news-spur.riddles.org.uk	2017-03-27 04:20:54 +01:00
Tom Lane	9b95f2fa1e	Use ExecPrepareExpr in place of ExecPrepareCheck where appropriate. Change one more place where ExecInitCheck/ExecPrepareCheck's insistence on getting implicit-AND-format quals wasn't really helpful, because the caller had to do make_ands_implicit() for no reason that it cared about. Using ExecPrepareExpr directly simplifies the code and saves cycles. The only remaining use of these functions is to process resultRelInfo->ri_PartitionCheck quals. However, implicit-AND format does seem to be what we want for that, so leave it alone.	2017-03-26 18:14:03 -04:00
Andres Freund	b8d7f053c5	Faster expression evaluation and targetlist projection. This replaces the old, recursive tree-walk based evaluation, with non-recursive, opcode dispatch based, expression evaluation. Projection is now implemented as part of expression evaluation. This both leads to significant performance improvements, and makes future just-in-time compilation of expressions easier. The speed gains primarily come from: - non-recursive implementation reduces stack usage / overhead - simple sub-expressions are implemented with a single jump, without function calls - sharing some state between different sub-expressions - reduced amount of indirect/hard to predict memory accesses by laying out operation metadata sequentially; including the avoidance of nearly all of the previously used linked lists - more code has been moved to expression initialization, avoiding constant re-checks at evaluation time Future just-in-time compilation (JIT) has become easier, as demonstrated by released patches intended to be merged in a later release, for primarily two reasons: Firstly, due to a stricter split between expression initialization and evaluation, less code has to be handled by the JIT. Secondly, due to the non-recursive nature of the generated "instructions", less performance-critical code-paths can easily be shared between interpreted and compiled evaluation. The new framework allows for significant future optimizations. E.g.: - basic infrastructure for to later reduce the per executor-startup overhead of expression evaluation, by caching state in prepared statements. That'd be helpful in OLTPish scenarios where initialization overhead is measurable. - optimizing the generated "code". A number of proposals for potential work has already been made. - optimizing the interpreter. Similarly a number of proposals have been made here too. The move of logic into the expression initialization step leads to some backward-incompatible changes: - Function permission checks are now done during expression initialization, whereas previously they were done during execution. In edge cases this can lead to errors being raised that previously wouldn't have been, e.g. a NULL array being coerced to a different array type previously didn't perform checks. - The set of domain constraints to be checked, is now evaluated once during expression initialization, previously it was re-built every time a domain check was evaluated. For normal queries this doesn't change much, but e.g. for plpgsql functions, which caches ExprStates, the old set could stick around longer. The behavior around might still change. Author: Andres Freund, with significant changes by Tom Lane, changes by Heikki Linnakangas Reviewed-By: Tom Lane, Heikki Linnakangas Discussion: https://postgr.es/m/20161206034955.bh33paeralxbtluv@alap3.anarazel.de	2017-03-25 14:52:06 -07:00
Peter Eisentraut	e3920ac823	Add more subscription DDL tests Add more tests for various variants of subscription DDL commands, based on code coverage report. Fix a small bug discovered by that.	2017-03-24 21:48:05 -04:00
Peter Eisentraut	176cbc2a25	Check that published table exists on subscriber Author: Petr Jelinek <pjmodos@pjmodos.net>	2017-03-24 14:52:03 -04:00
Alvaro Herrera	8082bea2b0	Fix use-after-free bug Detected by buildfarm member prion	2017-03-24 15:43:15 -03:00
Fujii Masao	70adf2fbe1	Make VACUUM VERBOSE report the number of skipped frozen pages. Previously manual VACUUM did not report the number of skipped frozen pages even when VERBOSE option is specified. But this information is helpful to monitor the VACUUM activity, and also autovacuum reports that number in the log file when the condition of log_autovacuum_min_duration is met. This commit changes VACUUM VERBOSE so that it reports the number of frozen pages that it skips. Author: Masahiko Sawada Reviewed-by: Yugo Nagata and Jim Nasby Discussion: http://postgr.es/m/CAD21AoDZQKCxo0L39Mrq08cONNkXQKXuh=2DP1Q8ebmt35SoaA@mail.gmail.com	2017-03-25 02:39:44 +09:00
Alvaro Herrera	7b504eb282	Implement multivariate n-distinct coefficients Add support for explicitly declared statistic objects (CREATE STATISTICS), allowing collection of statistics on more complex combinations that individual table columns. Companion commands DROP STATISTICS and ALTER STATISTICS ... OWNER TO / SET SCHEMA / RENAME are added too. All this DDL has been designed so that more statistic types can be added later on, such as multivariate most-common-values and multivariate histograms between columns of a single table, leaving room for permitting columns on multiple tables, too, as well as expressions. This commit only adds support for collection of n-distinct coefficient on user-specified sets of columns in a single table. This is useful to estimate number of distinct groups in GROUP BY and DISTINCT clauses; estimation errors there can cause over-allocation of memory in hashed aggregates, for instance, so it's a worthwhile problem to solve. A new special pseudo-type pg_ndistinct is used. (num-distinct estimation was deemed sufficiently useful by itself that this is worthwhile even if no further statistic types are added immediately; so much so that another version of essentially the same functionality was submitted by Kyotaro Horiguchi: https://postgr.es/m/20150828.173334.114731693.horiguchi.kyotaro@lab.ntt.co.jp though this commit does not use that code.) Author: Tomas Vondra. Some code rework by Álvaro. Reviewed-by: Dean Rasheed, David Rowley, Kyotaro Horiguchi, Jeff Janes, Ideriha Takeshi Discussion: https://postgr.es/m/543AFA15.4080608@fuzzy.cz https://postgr.es/m/20170320190220.ixlaueanxegqd5gr@alvherre.pgsql	2017-03-24 14:06:10 -03:00
Peter Eisentraut	eccfef81e1	ICU support Add a column collprovider to pg_collation that determines which library provides the collation data. The existing choices are default and libc, and this adds an icu choice, which uses the ICU4C library. The pg_locale_t type is changed to a union that contains the provider-specific locale handles. Users of locale information are changed to look into that struct for the appropriate handle to use. Also add a collversion column that records the version of the collation when it is created, and check at run time whether it is still the same. This detects potentially incompatible library upgrades that can corrupt indexes and other structures. This is currently only supported by ICU-provided collations. initdb initializes the default collation set as before from the `locale -a` output but also adds all available ICU locales with a "-x-icu" appended. Currently, ICU-provided collations can only be explicitly named collations. The global database locales are still always libc-provided. ICU support is enabled by configure --with-icu. Reviewed-by: Thomas Munro <thomas.munro@enterprisedb.com> Reviewed-by: Andreas Karlsson <andreas@proxel.se>	2017-03-23 15:28:48 -04:00
Robert Haas	ea42cc18c3	Track the oldest XID that can be safely looked up in CLOG. This provides infrastructure for looking up arbitrary, user-supplied XIDs without a risk of scary-looking failures from within the clog module. Normally, the oldest XID that can be safely looked up in CLOG is the same as the oldest XID that can reused without causing wraparound, and the latter is already tracked. However, while truncation is in progress, the values are different, so we must keep track of them separately. Craig Ringer, reviewed by Simon Riggs and by me. Discussion: http://postgr.es/m/CAMsr+YHQiWNEi0daCTboS40T+V5s_+dst3PYv_8v2wNVH+Xx4g@mail.gmail.com	2017-03-23 14:26:31 -04:00
Robert Haas	691b8d5928	Allow for parallel execution whenever ExecutorRun() is done only once. Previously, it was unsafe to execute a plan in parallel if ExecutorRun() might be called with a non-zero row count. However, it's quite easy to fix things up so that we can support that case, provided that it is known that we will never call ExecutorRun() a second time for the same QueryDesc. Add infrastructure to signal this, and cross-checks to make sure that a caller who claims this is true doesn't later reneg. While that pattern never happens with queries received directly from a client -- there's no way to know whether multiple Execute messages will be sent unless the first one requests all the rows -- it's pretty common for queries originating from procedural languages, which often limit the result to a single tuple or to a user-specified number of tuples. This commit doesn't actually enable parallelism in any additional cases, because currently none of the places that would be able to benefit from this infrastructure pass CURSOR_OPT_PARALLEL_OK in the first place, but it makes it much more palatable to pass CURSOR_OPT_PARALLEL_OK in places where we currently don't, because it eliminates some cases where we'd end up having to run the parallel plan serially. Patch by me, based on some ideas from Rafia Sabih and corrected by Rafia Sabih based on feedback from Dilip Kumar and myself. Discussion: http://postgr.es/m/CA+TgmobXEhvHbJtWDuPZM9bVSLiTj-kShxQJ2uM5GPDze9fRYA@mail.gmail.com	2017-03-23 13:14:36 -04:00
Peter Eisentraut	7c4f52409a	Logical replication support for initial data copy Add functionality for a new subscription to copy the initial data in the tables and then sync with the ongoing apply process. For the copying, add a new internal COPY option to have the COPY source data provided by a callback function. The initial data copy works on the subscriber by receiving COPY data from the publisher and then providing it locally into a COPY that writes to the destination table. A WAL receiver can now execute full SQL commands. This is used here to obtain information about tables and publications. Several new options were added to CREATE and ALTER SUBSCRIPTION to control whether and when initial table syncing happens. Change pg_dump option --no-create-subscription-slots to --no-subscription-connect and use the new CREATE SUBSCRIPTION ... NOCONNECT option for that. Author: Petr Jelinek <petr.jelinek@2ndquadrant.com> Tested-by: Erik Rijkers <er@xs4all.nl>	2017-03-23 08:55:37 -04:00
Magnus Hagander	707576b571	Fix grammar in comment Author: Emil Iggland	2017-03-23 10:14:42 +01:00
Simon Riggs	af4b1a0869	Refactor GetOldestXmin() to use flags Replace ignoreVacuum parameter with more flexible flags. Author: Eiji Seki Review: Haribabu Kommi	2017-03-22 16:51:01 +00:00
Peter Eisentraut	4cfc9484d4	Refine rules for altering publication owner Previously, the new owner had to be a superuser. The new rules are more refined similar to other objects. Reviewed-by: Petr Jelinek <petr.jelinek@2ndquadrant.com>	2017-03-22 11:19:30 -04:00
Simon Riggs	9b013dc238	Improve performance of replay of AccessExclusiveLocks A hot standby replica keeps a list of Access Exclusive locks for a top level transaction. These locks are released when the top level transaction ends. Searching of this list is O(N^2), and each transaction had to pay the price of searching this list for locks, even if it didn't take any AE locks itself. This patch optimizes this case by having the master server track which transactions took AE locks, and passes that along to the standby server in the commit/abort record. This allows the standby to only try to release locks for transactions which actually took any, avoiding the majority of the performance issue. Refactor MyXactAccessedTempRel into MyXactFlags to allow minimal additional cruft with this. Analysis and initial patch by David Rowley Author: David Rowley and Simon Riggs	2017-03-22 13:09:36 +00:00
Andrew Dunstan	b6fb534f10	Add IF NOT EXISTS for CREATE SERVER and CREATE USER MAPPING There is still some inconsistency with the error messages surrounding foreign servers. Some use the word "foreign" and some don't. My inclination is to remove all such uses of "foreign" on the basis that the CREATE/ALTER/DROP SERVER commands don't use the word. However, that is left for another day. In this patch I have kept to the existing usage in the affected commands, which omits "foreign". Anastasia Lubennikova, reviewed by Arthur Zakirov and Ashtosh Bapat. Discussion: http://postgr.es/m/7c2ab9b8-388a-1ce0-23a3-7acf2a0ed3c6@postgrespro.ru	2017-03-20 16:40:45 -04:00
Andrew Dunstan	839cb0649a	Use a consistent error message style for user mappings. User mappings are essentially anonymous, so messages referring to "user mapping foo on server bar" are wrong, and inconsistent with other error messages referring to user mappings. To be consistent with existing use, use "user mapping for foo on server bar" instead. I dropped the noise word "user" from the original suggestion to be consistent with other uses. Discussion: http://postgr.es/m/56c6f8ab-b2d6-f1fa-deb0-1d18cf67f7b9@2ndQuadrant.com	2017-03-20 16:01:45 -04:00
Tom Lane	e3044f6184	Avoid use of already-closed relcache entry. Oversight in commit `17f8ffa1e`. Per buildfarm member prion.	2017-03-18 18:43:06 -04:00
Tom Lane	17f8ffa1e3	Fix REFRESH MATERIALIZED VIEW to report activity to the stats collector. The non-concurrent code path for REFRESH MATERIALIZED VIEW failed to report its updates to the stats collector. This is bad since it means auto-analyze doesn't know there's any work to be done. Adjust it to report the refresh as a table truncate followed by insertion of an appropriate number of rows. Since a matview could contain more than INT_MAX rows, change the signature of pgstat_count_heap_insert() to accept an int64 rowcount. (The accumulator it's adding into is already int64, but existing callers could not insert more than a small number of rows at once, so the argument had been declared just "int n".) This is surely a bug fix, but changing pgstat_count_heap_insert()'s API seems too risky for the back branches. Given the lack of previous complaints, I'm not sure it's a big enough problem to justify a kluge solution that would avoid that. So, no back-patch, at least for now. Jim Mlodgenski, adjusted a bit by me Discussion: https://postgr.es/m/CAB_5SRchSz7-WmdO5szdiknG8Oj_GGqJytrk1KRd11yhcMs1KQ@mail.gmail.com	2017-03-18 17:49:39 -04:00
Robert Haas	88e66d193f	Rename "pg_clog" directory to "pg_xact". Names containing the letters "log" sometimes confuse users into believing that only non-critical data is present. It is hoped this renaming will discourage ill-considered removals of transaction status data. Michael Paquier Discussion: http://postgr.es/m/CA+Tgmoa9xFQyjRZupbdEFuwUerFTvC6HjZq1ud6GYragGDFFgA@mail.gmail.com	2017-03-17 09:48:38 -04:00
Andrew Gierth	1914c5ea7d	Avoid having vacuum set reltuples to 0 on non-empty relations in the presence of page pins, which leads to serious estimation errors in the planner. This particularly affects small heavily-accessed tables, especially where locking (e.g. from FK constraints) forces frequent vacuums for mxid cleanup. Fix by keeping separate track of pages whose live tuples were actually counted vs. pages that were only scanned for freezing purposes. Thus, reltuples can only be set to 0 if all pages of the relation were actually counted. Backpatch to all supported versions. Per bug #14057 from Nicolas Baccelli, analyzed by me. Discussion: https://postgr.es/m/20160331103739.8956.94469@wrigleys.postgresql.org	2017-03-16 22:28:03 +00:00
Peter Eisentraut	eb4da3e380	Add option to control snapshot export to CREATE_REPLICATION_SLOT We used to export snapshots unconditionally in CREATE_REPLICATION_SLOT in the replication protocol, but several upcoming patches want more control over what happens. Suppress snapshot export in pg_recvlogical, which neither needs nor can use the exported snapshot. Since snapshot exporting can fail this improves reliability. This also paves the way for allowing the creation of replication slots on standbys, which cannot export snapshots because they cannot allocate new XIDs. Author: Petr Jelinek <petr.jelinek@2ndquadrant.com>	2017-03-14 17:34:22 -04:00
Robert Haas	c11453ce0a	hash: Add write-ahead logging support. The warning about hash indexes not being write-ahead logged and their use being discouraged has been removed. "snapshot too old" is now supported for tables with hash indexes. Most importantly, barring bugs, hash indexes will now be crash-safe and usable on standbys. This commit doesn't yet add WAL consistency checking for hash indexes, as we now have for other index types; a separate patch has been submitted to cure that lack. Amit Kapila, reviewed and slightly modified by me. The larger patch series of which this is a part has been reviewed and tested by Álvaro Herrera, Ashutosh Sharma, Mark Kirkwood, Jeff Janes, and Jesper Pedersen. Discussion: http://postgr.es/m/CAA4eK1JOBX=YU33631Qh-XivYXtPSALh514+jR8XeD7v+K3r_Q@mail.gmail.com	2017-03-14 13:27:02 -04:00
Peter Eisentraut	f97a028d8e	Spelling fixes in code comments From: Josh Soref <jsoref@gmail.com>	2017-03-14 12:58:39 -04:00
Tom Lane	5ed6fff6b7	Make logging about multixact wraparound protection less chatty. The original messaging design, introduced in commit `068cfadf9`, seems too chatty now that some time has elapsed since the bug fix; most installations will be in good shape and don't really need a reminder about this on every postmaster start. Hence, arrange to suppress the "wraparound protections are now enabled" message during startup (specifically, during the TrimMultiXact() call). The message will still appear if protection becomes effective at some later point. Discussion: https://postgr.es/m/17211.1489189214@sss.pgh.pa.us	2017-03-14 12:47:53 -04:00
Noah Misch	3a0d473192	Use wrappers of PG_DETOAST_DATUM_PACKED() more. This makes almost all core code follow the policy introduced in the previous commit. Specific decisions: - Text search support functions with char* and length arguments, such as prsstart and lexize, may receive unaligned strings. I doubt maintainers of non-core text search code will notice. - Use plain VARDATA() on values detoasted or synthesized earlier in the same function. Use VARDATA_ANY() on varlenas sourced outside the function, even if they happen to always have four-byte headers. As an exception, retain the universal practice of using VARDATA() on return values of SendFunctionCall(). - Retain PG_GETARG_BYTEA_P() in pageinspect. (Page images are too large for a one-byte header, so this misses no optimization.) Sites that do not call get_page_from_raw() typically need the four-byte alignment. - For now, do not change btree_gist. Its use of four-byte headers in memory is partly entangled with storage of 4-byte headers inside GBT_VARKEY, on disk. - For now, do not change gtrgm_consistent() or gtrgm_distance(). They incorporate the varlena header into a cache, and there are multiple credible implementation strategies to consider.	2017-03-12 19:35:34 -04:00
Peter Eisentraut	cd603a4d6b	Use SQL standard error code for nextval	2017-03-09 10:56:44 -05:00
Robert Haas	355d3993c5	Add a Gather Merge executor node. Like Gather, we spawn multiple workers and run the same plan in each one; however, Gather Merge is used when each worker produces the same output ordering and we want to preserve that output ordering while merging together the streams of tuples from various workers. (In a way, Gather Merge is like a hybrid of Gather and MergeAppend.) This works out to a win if it saves us from having to perform an expensive Sort. In cases where only a small amount of data would need to be sorted, it may actually be faster to use a regular Gather node and then sort the results afterward, because Gather Merge sometimes needs to wait synchronously for tuples whereas a pure Gather generally doesn't. But if this avoids an expensive sort then it's a win. Rushabh Lathia, reviewed and tested by Amit Kapila, Thomas Munro, and Neha Sharma, and reviewed and revised by me. Discussion: http://postgr.es/m/CAGPqQf09oPX-cQRpBKS0Gq49Z+m6KBxgxd_p9gX8CKk_d75HoQ@mail.gmail.com	2017-03-09 07:49:29 -05:00
Stephen Frost	f9b1a0dd40	Expose explain's SUMMARY option This exposes the existing explain summary option to users to allow them to choose if they wish to have the planning time and totalled run time included in the EXPLAIN result. The existing default behavior is retained if SUMMARY is not specified- running explain without analyze will not print the summary lines (just the planning time, currently) while running explain with analyze will include the summary lines (both the planning time and the totalled execution time). Users who wish to see the summary information for plain explain can now use: EXPLAIN (SUMMARY ON) query; Users who do not want to have the summary printed for an analyze run can use: EXPLAIN (ANALYZE ON, SUMMARY OFF) query; With this, we can now also have EXPLAIN ANALYZE queries included in our regression tests by using: EXPLAIN (ANALYZE ON, TIMING OFF, SUMMARY off) query; I went ahead and added an example of this, which will hopefully not make the buildfarm complain. Author: Ashutosh Bapat Discussion: https://postgr.es/m/CAFjFpReE5z2h98U2Vuia8hcEkpRRwrauRjHmyE44hNv8-xk+XA@mail.gmail.com	2017-03-08 15:14:03 -05:00
Fujii Masao	4eafdcc276	Prevent logical rep workers with removed subscriptions from starting. Any logical rep workers must have their subscription entries in pg_subscription. To ensure this, we need to prevent the launcher from starting new worker corresponding to the subscription that DROP SUBSCRIPTION command is removing. To implement this, previously LogicalRepLauncherLock was introduced and held until the end of transaction running DROP SUBSCRIPTION. But using LWLock for that purpose was not valid. Instead, this commit changes DROP SUBSCRIPTION so that it takes AccessExclusiveLock on pg_subscription, in order to ensure that the launcher cannot see any subscriptions being removed. Also this commit gets rid of LogicalRepLauncherLock. Patch by me, reviewed by Petr Jelinek Discussion: https://www.postgresql.org/message-id/CAHGQGwHPi8ky-yANFfe0sgmhKtsYcQLTnKx07bW9S7-Rn1746w@mail.gmail.com	2017-03-09 01:44:23 +09:00
Alvaro Herrera	fcec6caafa	Support XMLTABLE query expression XMLTABLE is defined by the SQL/XML standard as a feature that allows turning XML-formatted data into relational form, so that it can be used as a <table primary> in the FROM clause of a query. This new construct provides significant simplicity and performance benefit for XML data processing; what in a client-side custom implementation was reported to take 20 minutes can be executed in 400ms using XMLTABLE. (The same functionality was said to take 10 seconds using nested PostgreSQL XPath function calls, and 5 seconds using XMLReader under PL/Python). The implemented syntax deviates slightly from what the standard requires. First, the standard indicates that the PASSING clause is optional and that multiple XML input documents may be given to it; we make it mandatory and accept a single document only. Second, we don't currently support a default namespace to be specified. This implementation relies on a new executor node based on a hardcoded method table. (Because the grammar is fixed, there is no extensibility in the current approach; further constructs can be implemented on top of this such as JSON_TABLE, but they require changes to core code.) Author: Pavel Stehule, Álvaro Herrera Extensively reviewed by: Craig Ringer Discussion: https://postgr.es/m/CAFj8pRAgfzMD-LoSmnMGybD0WsEznLHWap8DO79+-GTRAPR4qA@mail.gmail.com	2017-03-08 12:40:26 -03:00
Fujii Masao	77d21970ae	Fix connection leak in DROP SUBSCRIPTION command, take 2. Commit `898a792eb8` fixed the connection leak issue, but it was an unreliable way of bugfix. This bugfix was assuming that walrcv_command() subroutine cannot throw an error, but it's untenable assumption. For example, if it will be changed so that an error is thrown, connection leak issue will happen again. This patch ensures that the connection is closed even when walrcv_command() subroutine throws an error. Patch by me, reviewed by Petr Jelinek and Michael Paquier Discussion: https://www.postgresql.org/message-id/2058.1487704345@sss.pgh.pa.us	2017-03-08 23:43:38 +09:00
Robert Haas	d88d06cd07	Fix relcache reference leak. Reported by Kevin Grittner. Faulty commit identified by Tom Lane. Patch by Amit Langote, reviewed by Michael Paquier. Discussion: http://postgr.es/m/CACjxUsOHbH1=99u8mGxmLHfy5hov4ENEpvM6=3ARjos7wG7rtQ@mail.gmail.com	2017-03-07 11:27:21 -05:00
Heikki Linnakangas	818fd4a67d	Support SCRAM-SHA-256 authentication (RFC 5802 and 7677). This introduces a new generic SASL authentication method, similar to the GSS and SSPI methods. The server first tells the client which SASL authentication mechanism to use, and then the mechanism-specific SASL messages are exchanged in AuthenticationSASLcontinue and PasswordMessage messages. Only SCRAM-SHA-256 is supported at the moment, but this allows adding more SASL mechanisms in the future, without changing the overall protocol. Support for channel binding, aka SCRAM-SHA-256-PLUS is left for later. The SASLPrep algorithm, for pre-processing the password, is not yet implemented. That could cause trouble, if you use a password with non-ASCII characters, and a client library that does implement SASLprep. That will hopefully be added later. Authorization identities, as specified in the SCRAM-SHA-256 specification, are ignored. SET SESSION AUTHORIZATION provides more or less the same functionality, anyway. If a user doesn't exist, perform a "mock" authentication, by constructing an authentic-looking challenge on the fly. The challenge is derived from a new system-wide random value, "mock authentication nonce", which is created at initdb, and stored in the control file. We go through these motions, in order to not give away the information on whether the user exists, to unauthenticated users. Bumps PG_CONTROL_VERSION, because of the new field in control file. Patch by Michael Paquier and Heikki Linnakangas, reviewed at different stages by Robert Haas, Stephen Frost, David Steele, Aleksander Alekseev, and many others. Discussion: https://www.postgresql.org/message-id/CAB7nPqRbR3GmFYdedCAhzukfKrgBLTLtMvENOmPrVWREsZkF8g%40mail.gmail.com Discussion: https://www.postgresql.org/message-id/CAB7nPqSMXU35g%3DW9X74HVeQp0uvgJxvYOuA4A-A3M%2B0wfEBv-w%40mail.gmail.com Discussion: https://www.postgresql.org/message-id/55192AFE.6080106@iki.fi	2017-03-07 14:25:40 +02:00
Tom Lane	a8df75b0a4	Avoid dangling pointer to relation name in RLS code path in DoCopy(). With RLS active, "COPY tab TO ..." failed under -DRELCACHE_FORCE_RELEASE, and would sometimes fail without that, because it used the relation name directly from the relcache as part of the parsetree it's building. That becomes a potentially-dangling pointer as soon as the relcache entry is closed, a bit further down. Typical symptom if the relcache entry chanced to get cleared would be "relation does not exist" error with a garbage relation name, or possibly a core dump; but if you were really truly unlucky, the COPY might copy from the wrong table. Per report from Andrew Dunstan that regression tests fail with -DRELCACHE_FORCE_RELEASE. The core tests now pass for me (but have not tried "make check-world" yet). Discussion: https://postgr.es/m/7b52f900-0579-cda9-ae2e-de5da17090e6@2ndQuadrant.com	2017-03-06 16:50:47 -05:00
Peter Eisentraut	2ca64c6f71	Replace LookupFuncNameTypeNames() with LookupFuncWithArgs() The old function took function name and function argument list as separate arguments. Now that all function signatures are passed around as ObjectWithArgs structs, this is no longer necessary and can be replaced by a function that takes ObjectWithArgs directly. Similarly for aggregates and operators. Reviewed-by: Jim Nasby <Jim.Nasby@BlueTreble.com> Reviewed-by: Michael Paquier <michael.paquier@gmail.com>	2017-03-06 13:31:47 -05:00
Peter Eisentraut	8b6d6cf853	Remove objname/objargs split for referring to objects In simpler times, it might have worked to refer to all kinds of objects by a list of name components and an optional argument list. But this doesn't work for all objects, which has resulted in a collection of hacks to place various other nodes types into these fields, which have to be unpacked at the other end. This makes it also weird to represent lists of such things in the grammar, because they would have to be lists of singleton lists, to make the unpacking work consistently. The other problem is that keeping separate name and args fields makes it awkward to deal with lists of functions. Change that by dropping the objargs field and have objname, renamed to object, be a generic Node, which can then be flexibly assigned and managed using the normal Node mechanisms. In many cases it will still be a List of names, in some cases it will be a string Value, for types it will be the existing Typename, for functions it will now use the existing ObjectWithArgs node type. Some of the more obscure object types still use somewhat arbitrary nested lists. Reviewed-by: Jim Nasby <Jim.Nasby@BlueTreble.com> Reviewed-by: Michael Paquier <michael.paquier@gmail.com>	2017-03-06 13:31:47 -05:00
Peter Eisentraut	550214a4ef	Add operator_with_argtypes grammar rule This makes the handling of operators similar to that of functions and aggregates. Rename node FuncWithArgs to ObjectWithArgs, to reflect the expanded use. Reviewed-by: Jim Nasby <Jim.Nasby@BlueTreble.com> Reviewed-by: Michael Paquier <michael.paquier@gmail.com>	2017-03-06 13:31:47 -05:00
Peter Eisentraut	63ebd377a6	Use class_args field in opclass_drop This makes it consistent with the usage in opclass_item. Reviewed-by: Jim Nasby <Jim.Nasby@BlueTreble.com> Reviewed-by: Michael Paquier <michael.paquier@gmail.com>	2017-03-06 13:31:47 -05:00
Simon Riggs	8b4d582d27	Allow partitioned tables to be dropped without CASCADE Record partitioned table dependencies as DEPENDENCY_AUTO rather than DEPENDENCY_NORMAL, so that DROP TABLE just works. Remove all the tests for partitioned tables where earlier work had deliberately avoided using CASCADE. Amit Langote, reviewed by Ashutosh Bapat and myself	2017-03-06 15:50:53 +05:30
Tom Lane	dbca84f04e	In rebuild_relation(), don't access an already-closed relcache entry. This reliably fails with -DRELCACHE_FORCE_RELEASE, as reported by Andrew Dunstan, and could sometimes fail in normal operation, resulting in a wrong persistence value being used for the transient table. It's not immediately clear to me what effects that might have beyond the risk of a crash while accessing OldHeap->rd_rel->relpersistence, but it's probably not good. Bug introduced by commit `f41872d0c`, and made substantially worse by commit `85b506bbf`, which added a second such access significantly later than the heap_close. I doubt the first reference could fail in a production scenario, but the second one definitely could. Discussion: https://postgr.es/m/7b52f900-0579-cda9-ae2e-de5da17090e6@2ndQuadrant.com	2017-03-04 16:09:33 -05:00
Peter Eisentraut	272adf4f9c	Disallow CREATE/DROP SUBSCRIPTION in transaction block Disallow CREATE SUBSCRIPTION and DROP SUBSCRIPTION in a transaction block when the replication slot is to be created or dropped, since that cannot be rolled back. based on patch by Masahiko Sawada <sawada.mshk@gmail.com>	2017-03-03 23:29:13 -05:00
Peter Eisentraut	2357c12b49	Fix typo	2017-03-03 18:21:06 -05:00
Peter Eisentraut	6da9759a03	Add RENAME support for PUBLICATIONs and SUBSCRIPTIONs From: Petr Jelinek <petr.jelinek@2ndquadrant.com>	2017-03-03 10:47:04 -05:00
Simon Riggs	9eb344faf5	Allow vacuums to report oldestxmin Allow VACUUM and Autovacuum to report the oldestxmin value they used while cleaning tables, helping to make better sense out of the other statistics we report in various cases.	2017-03-03 19:18:25 +05:30
Robert Haas	3c3bb99330	Don't uselessly rewrite, truncate, VACUUM, or ANALYZE partitioned tables. Also, recursively perform VACUUM and ANALYZE on partitions when the command is applied to a partitioned table. In passing, some related documentation updates. Amit Langote, reviewed by Michael Paquier, Ashutosh Bapat, and by me. Discussion: http://postgr.es/m/47288cf1-f72c-dfc2-5ff0-4af962ae5c1b@lab.ntt.co.jp	2017-03-02 17:23:44 +05:30
Tom Lane	9e3755ecb2	Remove useless duplicate inclusions of system header files. c.h #includes a number of core libc header files, such as <stdio.h>. There's no point in re-including these after having read postgres.h, postgres_fe.h, or c.h; so remove code that did so. While at it, also fix some places that were ignoring our standard pattern of "include postgres[_fe].h, then system header files, then other Postgres header files". While there's not any great magic in doing it that way rather than system headers last, it's silly to have just a few files deviating from the general pattern. (But I didn't attempt to enforce this globally, only in files I was touching anyway.) I'd be the first to say that this is mostly compulsive neatnik-ism, but over time it might save enough compile cycles to be useful.	2017-02-25 16:12:55 -05:00
Tom Lane	c29aff959d	Consistently declare timestamp variables as TimestampTz. Twiddle the replication-related code so that its timestamp variables are declared TimestampTz, rather than the uninformative "int64" that was previously used for meant-to-be-always-integer timestamps. This resolves the int64-vs-TimestampTz declaration inconsistencies introduced by commit `7c030783a`, though in the opposite direction to what was originally suggested. This required including datatype/timestamp.h in a couple more places than before. I decided it would be a good idea to slim down that header by not having it pull in <float.h> etc, as those headers are no longer at all relevant to its purpose. Unsurprisingly, a small number of .c files turn out to have been depending on those inclusions, so add them back in the .c files as needed. Discussion: https://postgr.es/m/26788.1487455319@sss.pgh.pa.us Discussion: https://postgr.es/m/27694.1487456324@sss.pgh.pa.us	2017-02-23 15:57:08 -05:00
Tom Lane	b9d092c962	Remove now-dead code for !HAVE_INT64_TIMESTAMP. This is a basically mechanical removal of #ifdef HAVE_INT64_TIMESTAMP tests and the negative-case controlled code. Discussion: https://postgr.es/m/26788.1487455319@sss.pgh.pa.us	2017-02-23 14:04:43 -05:00
Peter Eisentraut	74321d87fb	Fix whitespace	2017-02-21 15:44:07 -05:00
Fujii Masao	898a792eb8	Fix connection leak in DROP SUBSCRIPTION command. Previously the command forgot to close the connection to the publisher when it failed to drop the replication slot.	2017-02-22 03:36:02 +09:00
Peter Eisentraut	38d103763d	Make more use of castNode()	2017-02-21 11:59:09 -05:00
Robert Haas	a3dc8e495b	Make partitions automatically inherit OIDs. Previously, if the parent was specified as WITH OIDS, each child also had to be explicitly specified as WITH OIDS. Amit Langote, per a report from Simon Riggs. Some additional work on the documentation changes by me. Discussion: http://postgr.es/m/CANP8+jJBpWocfKrbJcaf3iBt9E3U=WPE_NC8YE6rye+YJ1sYnQ@mail.gmail.com	2017-02-19 21:29:27 +05:30
Peter Eisentraut	6d16ecc646	Add CREATE COLLATION IF NOT EXISTS clause The core of the functionality was already implemented when pg_import_system_collations was added. This just exposes it as an option in the SQL command.	2017-02-15 10:01:28 -05:00
Robert Haas	e28b115612	Don't disallow dropping NOT NULL for a list partition key. Range partitioning doesn't support nulls in the partitioning columns, but list partitioning does. Amit Langote, per a complaint from Amul Sul	2017-02-14 12:13:41 -05:00
Noah Misch	f30f34e589	Ignore tablespace ACLs when ignoring schema ACLs. The ALTER TABLE ALTER TYPE implementation can issue DROP INDEX and CREATE INDEX to refit existing indexes for the new column type. Since this CREATE INDEX is an implementation detail of an index alteration, the ensuing DefineIndex() should skip ACL checks specific to index creation. It already skips the namespace ACL check. Make it skip the tablespace ACL check, too. Back-patch to 9.2 (all supported versions). Reviewed by Tom Lane.	2017-02-12 16:03:41 -05:00
Peter Eisentraut	2ea5b06c7a	Add CREATE SEQUENCE AS <data type> clause This stores a data type, required to be an integer type, with the sequence. The sequences min and max values default to the range supported by the type, and they cannot be set to values exceeding that range. The internal implementation of the sequence is not affected. Change the serial types to create sequences of the appropriate type. This makes sure that the min and max values of the sequence for a serial column match the range of values supported by the table column. So the sequence can no longer overflow the table column. This also makes monitoring for sequence exhaustion/wraparound easier, which currently requires various contortions to cross-reference the sequences with the table columns they are used with. This commit also effectively reverts the pg_sequence column reordering in `f3b421da5f`, because the new seqtypid column allows us to fill the hole in the struct and create a more natural overall column ordering. Reviewed-by: Steve Singer <steve@ssinger.info> Reviewed-by: Michael Paquier <michael.paquier@gmail.com>	2017-02-10 15:34:35 -05:00
Tom Lane	86d911ec0f	Allow index AMs to cache data across aminsert calls within a SQL command. It's always been possible for index AMs to cache data across successive amgettuple calls within a single SQL command: the IndexScanDesc.opaque field is meant for precisely that. However, no comparable facility exists for amortizing setup work across successive aminsert calls. This patch adds such a feature and teaches GIN, GIST, and BRIN to use it to amortize catalog lookups they'd previously been doing on every call. (The other standard index AMs keep everything they need in the relcache, so there's little to improve there.) For GIN, the overall improvement in a statement that inserts many rows can be as much as 10%, though it seems a bit less for the other two. In addition, this makes a really significant difference in runtime for CLOBBER_CACHE_ALWAYS tests, since in those builds the repeated catalog lookups are vastly more expensive. The reason this has been hard up to now is that the aminsert function is not passed any useful place to cache per-statement data. What I chose to do is to add suitable fields to struct IndexInfo and pass that to aminsert. That's not widening the index AM API very much because IndexInfo is already within the ken of ambuild; in fact, by passing the same info to aminsert as to ambuild, this is really removing an inconsistency in the AM API. Discussion: https://postgr.es/m/27568.1486508680@sss.pgh.pa.us	2017-02-09 11:52:12 -05:00
Robert Haas	a507b86900	Add WAL consistency checking facility. When the new GUC wal_consistency_checking is set to a non-empty value, it triggers recording of additional full-page images, which are compared on the standby against the results of applying the WAL record (without regard to those full-page images). Allowable differences such as hints are masked out, and the resulting pages are compared; any difference results in a FATAL error on the standby. Kuntal Ghosh, based on earlier patches by Michael Paquier and Heikki Linnakangas. Extensively reviewed and revised by Michael Paquier and by me, with additional reviews and comments from Amit Kapila, Álvaro Herrera, Simon Riggs, and Peter Eisentraut.	2017-02-08 15:45:30 -05:00
Heikki Linnakangas	181bdb90ba	Fix typos in comments. Backpatch to all supported versions, where applicable, to make backpatching of future fixes go more smoothly. Josh Soref Discussion: https://www.postgresql.org/message-id/CACZqfqCf+5qRztLPgmmosr-B0Ye4srWzzw_mo4c_8_B_mtjmJQ@mail.gmail.com	2017-02-06 11:33:58 +02:00
Fujii Masao	39b8cc991f	Be sure to release LogicalRepLauncherLock in DROP SUBSCRIPTION command. Previously DROP SUBSCRIPTION command forgot to release the lock at all. Original patches by Kyotaro Horiguchi and Michael Paquier, but I didn't use them. Discussion: http://postgr.es/m/20170201.173623.66249355.horiguchi.kyotaro@lab.ntt.co.jp	2017-02-04 03:18:13 +09:00
Tom Lane	aedd554f84	Fix CatalogTupleInsert/Update abstraction for case of shared indstate. Add CatalogTupleInsertWithInfo and CatalogTupleUpdateWithInfo to let callers use the CatalogTupleXXX abstraction layer even in cases where we want to share the results of CatalogOpenIndexes across multiple inserts/updates for efficiency. This finishes the job begun in commit `2f5c9d9c9`, by allowing some remaining simple_heap_insert/update calls to be replaced. The abstraction layer is now complete enough that we don't have to export CatalogIndexInsert at all anymore. Also, this fixes several places in which `2f5c9d9c9` introduced performance regressions by using retail CatalogTupleInsert or CatalogTupleUpdate even though the previous coding had been able to amortize CatalogOpenIndexes work across multiple tuples. A possible future improvement is to arrange for the indexing.c functions to cache the CatalogIndexState somewhere, maybe in the relcache, in which case we could get rid of CatalogTupleInsertWithInfo and CatalogTupleUpdateWithInfo again. But that's a task for another day. Discussion: https://postgr.es/m/27502.1485981379@sss.pgh.pa.us	2017-02-01 17:18:36 -05:00
Tom Lane	ab02896510	Provide CatalogTupleDelete() as a wrapper around simple_heap_delete(). This extends the work done in commit `2f5c9d9c9` to provide a more nearly complete abstraction layer hiding the details of index updating for catalog changes. That commit only invented abstractions for catalog inserts and updates, leaving nearby code for catalog deletes still calling the heap-level routines directly. That seems rather ugly from here, and it does little to help if we ever want to shift to a storage system in which indexing work is needed at delete time. Hence, create a wrapper function CatalogTupleDelete(), and replace calls of simple_heap_delete() on catalog tuples with it. There are now very few direct calls of [simple_]heap_delete remaining in the tree. Discussion: https://postgr.es/m/462.1485902736@sss.pgh.pa.us	2017-02-01 16:13:30 -05:00
Heikki Linnakangas	dbd69118c0	Replace isMD5() with a more future-proof way to check if pw is encrypted. The rule is that if pg_authid.rolpassword begins with "md5" and has the right length, it's an MD5 hash, otherwise it's a plaintext password. The idiom has been to use isMD5() to check for that, but that gets awkward, when we add new kinds of verifiers, like the verifiers for SCRAM authentication in the pending SCRAM patch set. Replace isMD5() with a new get_password_type() function, so that when new verifier types are added, we don't need to remember to modify every place that currently calls isMD5(), to also recognize the new kinds of verifiers. Also, use the new plain_crypt_verify function in passwordcheck, so that it doesn't need to know about MD5, or in the future, about other kinds of hashes or password verifiers. Reviewed by Michael Paquier and Peter Eisentraut. Discussion: https://www.postgresql.org/message-id/2d07165c-1793-e243-a2a9-e45b624c7580@iki.fi	2017-02-01 13:11:37 +02:00
Alvaro Herrera	2f5c9d9c9c	Tweak catalog indexing abstraction for upcoming WARM Split the existing CatalogUpdateIndexes into two different routines, CatalogTupleInsert and CatalogTupleUpdate, which do both the heap insert/update plus the index update. This removes over 300 lines of boilerplate code all over src/backend/catalog/ and src/backend/commands. The resulting code is much more pleasing to the eye. Also, by encapsulating what happens in detail during an UPDATE, this facilitates the upcoming WARM patch, which is going to add a few more lines to the update case making the boilerplate even more boring. The original CatalogUpdateIndexes is removed; there was only one use left, and since it's just three lines, we can as well expand it in place there. We could keep it, but WARM is going to break all the UPDATE out-of-core callsites anyway, so there seems to be no benefit in doing so. Author: Pavan Deolasee Discussion: https://www.postgr.es/m/CABOikdOcFYSZ4vA2gYfs=M2cdXzXX4qGHeEiW3fu9PCfkHLa2A@mail.gmail.com	2017-01-31 18:42:24 -03:00
Stephen Frost	e54f75722c	Handle ALTER EXTENSION ADD/DROP with pg_init_privs In commit `6c268df`, pg_init_privs was added to track the initial privileges of catalog objects and extensions. Unfortunately, that commit didn't include understanding of ALTER EXTENSION ADD/DROP, which allows the objects associated with an extension to be changed after the initial CREATE EXTENSION script has been run. The result of this meant that ACLs for objects added through ALTER EXTENSION ADD were not recorded into pg_init_privs and we would end up including those ACLs in pg_dump when we shouldn't have. This commit corrects that by making sure to have pg_init_privs updated when ALTER EXTENSION ADD/DROP is run, recording the permissions as they are at ALTER EXTENSION ADD time, and removing any if/when ALTER EXTENSION DROP is called. This issue was pointed out by Moshe Jacobson as commentary on bug #14456 (which was actually a bug about versions prior to 9.6 not handling custom ACLs on extensions correctly, an issue now addressed with pg_init_privs in 9.6). Back-patch to 9.6 where pg_init_privs was introduced.	2017-01-29 23:05:07 -05:00
Tom Lane	7afd56c3c6	Use castNode() in a bunch of statement-list-related code. When I wrote commit `ab1f0c822`, I really missed the castNode() macro that Peter E. had proposed shortly before. This back-fills the uses I would have put it to. It's probably not all that significant, but there are more assertions here than there were before, and conceivably they will help catch any bugs associated with those representation changes. I left behind a number of usages like "(Query *) copyObject(query_var)". Those could have been converted as well, but Peter has proposed another notational improvement that would handle copyObject cases automatically, so I let that be for now.	2017-01-26 22:09:34 -05:00
Andres Freund	9ba8a9ce45	Use the new castNode() macro in a number of places. This is far from a pervasive conversion, but it's a good starting point. Author: Peter Eisentraut, with some minor changes by me Reviewed-By: Tom Lane Discussion: https://postgr.es/m/c5d387d9-3440-f5e0-f9d4-71d53b9fbe52@2ndquadrant.com	2017-01-26 16:47:03 -08:00
Peter Eisentraut	3d9e73ea5f	Update copyright years in some recently added files	2017-01-25 12:32:05 -05:00
Peter Eisentraut	65df150a18	Close replication connection when slot creation errors From: Petr Jelinek <pjmodos@pjmodos.net>	2017-01-25 10:47:53 -05:00
Robert Haas	27cdb3414b	Reindent table partitioning code. We've accumulated quite a bit of stuff with which pgindent is not quite happy in this code; clean it up to provide a less-annoying base for future pgindent runs.	2017-01-24 10:20:02 -05:00
Robert Haas	b1ecb9b3fc	Fix interaction of partitioned tables with BulkInsertState. When copying into a partitioned table, the target heap may change from one tuple to next. We must ask ReadBufferBI() to get a new buffer every time such change occurs. To do that, use new function ReleaseBulkInsertStatePin(). This fixes the bug that tuples ended up being inserted into the wrong partition, which occurred exactly because the wrong buffer was used. Amit Langote, per a suggestion from Robert Haas. Some cosmetic adjustments by me. Reports by 高增琦 (Gao Zengqi), Venkata B Nagothi, and Ragnar Ouchterlony. Discussion: http://postgr.es/m/CAFmBtr32FDOqofo8yG-4mjzL1HnYHxXK5S9OGFJ%3D%3DcJpgEW4vA%40mail.gmail.com Discussion: http://postgr.es/m/CAEyp7J9WiX0L3DoiNcRrY-9iyw%3DqP%2Bj%3DDLsAnNFF1xT2J1ggfQ%40mail.gmail.com Discussion: http://postgr.es/m/16d73804-c9cd-14c5-463e-5caad563ff77%40agama.tv Discussion: http://postgr.es/m/CA+TgmoaiZpDVUUN8LZ4jv1qFE_QyR+H9ec+79f5vNczYarg5Zg@mail.gmail.com	2017-01-24 08:50:16 -05:00
Peter Eisentraut	0bc1207aeb	Fix default minimum value for descending sequences For some reason that is lost in history, a descending sequence would default its minimum value to -2^63+1 (-PG_INT64_MAX) instead of -2^63 (PG_INT64_MIN), even though explicitly specifying a minimum value of -2^63 would work. Fix this inconsistency by using the full range by default. Reported-by: Daniel Verite <daniel@manitou-mail.org> Reviewed-by: Michael Paquier <michael.paquier@gmail.com>	2017-01-23 14:00:58 -05:00
Peter Eisentraut	46d482814c	Don't error when no system locales were found initdb used to warn about that, but it was changed to an error in pg_import_system_locales, but some build farm members failed because of that. Change it back to a warning.	2017-01-23 13:45:32 -05:00
Alvaro Herrera	7e26e02eec	Prefetch blocks during lazy vacuum's truncation scan Vacuum truncation scan can be sped up on rotating media by prefetching blocks in forward direction. That makes the blocks already present in memory by the time they are needed, while also letting OS read-ahead kick in. The truncate scan has been measured to be five times faster than without this patch (that was on a slow disk, but it shouldn't hurt on fast disks.) Author: Álvaro Herrera, loosely based on a submission by Claudio Freire Discussion: https://postgr.es/m/CAGTBQpa6NFGO_6g_y_7zQx8L9GcHDSQKYdo1tGuh791z6PYgEg@mail.gmail.com	2017-01-23 12:55:18 -03:00
Peter Eisentraut	f21a563d25	Move some things from builtins.h to new header files This avoids that builtins.h has to include additional header files.	2017-01-20 20:29:53 -05:00
Alvaro Herrera	50cf1c80e6	Record dependencies on owners for logical replication objects This was forgotten in `665d1fad99` and caused the whole buildfarm to become red for a little while. Author: Petr Jelínek Also fix a typo in a nearby error message.	2017-01-20 16:45:02 -03:00
Peter Eisentraut	665d1fad99	Logical replication - Add PUBLICATION catalogs and DDL - Add SUBSCRIPTION catalog and DDL - Define logical replication protocol and output plugin - Add logical replication workers From: Petr Jelinek <petr@2ndquadrant.com> Reviewed-by: Steve Singer <steve@ssinger.info> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Erik Rijkers <er@xs4all.nl> Reviewed-by: Peter Eisentraut <peter.eisentraut@2ndquadrant.com>	2017-01-20 09:04:49 -05:00
Andres Freund	ea15e18677	Remove obsoleted code relating to targetlist SRF evaluation. Since `69f4b9c` plain expression evaluation (and thus normal projection) can't return sets of tuples anymore. Thus remove code dealing with that possibility. This will require adjustments in external code using ExecEvalExpr()/ExecProject() - that should neither be hard nor very common. Author: Andres Freund and Tom Lane Discussion: https://postgr.es/m/20160822214023.aaxz5l4igypowyri@alap3.anarazel.de	2017-01-19 14:40:41 -08:00
Alvaro Herrera	8eace46d34	Fix race condition in reading commit timestamps If a user requests the commit timestamp for a transaction old enough that its data is concurrently being truncated away by vacuum at just the right time, they would receive an ugly internal file-not-found error message from slru.c rather than the expected NULL return value. In a primary server, the window for the race is very small: the lookup has to occur exactly between the two calls by vacuum, and there's not a lot that happens between them (mostly just a multixact truncate). In a standby server, however, the window is larger because the truncation is executed as soon as the WAL record for it is replayed, but the advance of the oldest-Xid is not executed until the next checkpoint record. To fix in the primary, simply reverse the order of operations in vac_truncate_clog. To fix in the standby, augment the WAL truncation record so that the standby is aware of the new oldest-XID value and can apply the update immediately. WAL version bumped because of this. No backpatch, because of the low importance of the bug and its rarity. Author: Craig Ringer Reviewed-By: Petr Jelínek, Peter Eisentraut Discussion: https://postgr.es/m/CAMsr+YFhVtRQT1VAwC+WGbbxZZRzNou=N9Ed-FrCqkwQ8H8oJQ@mail.gmail.com	2017-01-19 18:24:17 -03:00
Robert Haas	05bd889904	Fix RETURNING to work correctly with partition tuple routing. In ExecInsert(), do not switch back to the root partitioned table ResultRelInfo until after we finish ExecProcessReturning(), so that RETURNING projection is done using the partition's descriptor. For the projection to work correctly, we must initialize the same for each leaf partition during ModifyTableState initialization. Amit Langote	2017-01-19 13:20:11 -05:00
Robert Haas	39162b2030	Fix failure to enforce partitioning contraint for internal partitions. When a tuple is inherited into a partitioning root, no partition constraints need to be enforced; when it is inserted into a leaf, the parent's partitioning quals needed to be enforced. The previous coding got both of those cases right. When a tuple is inserted into an intermediate level of the partitioning hierarchy (i.e. a table which is both a partition itself and in turn partitioned), it must enforce the partitioning qual inherited from its parent. That case got overlooked; repair. Amit Langote	2017-01-19 12:30:27 -05:00
Andres Freund	69f4b9c85f	Move targetlist SRF handling from expression evaluation to new executor node. Evaluation of set returning functions (SRFs_ in the targetlist (like SELECT generate_series(1,5)) so far was done in the expression evaluation (i.e. ExecEvalExpr()) and projection (i.e. ExecProject/ExecTargetList) code. This meant that most executor nodes performing projection, and most expression evaluation functions, had to deal with the possibility that an evaluated expression could return a set of return values. That's bad because it leads to repeated code in a lot of places. It also, and that's my (Andres's) motivation, made it a lot harder to implement a more efficient way of doing expression evaluation. To fix this, introduce a new executor node (ProjectSet) that can evaluate targetlists containing one or more SRFs. To avoid the complexity of the old way of handling nested expressions returning sets (e.g. having to pass up ExprDoneCond, and dealing with arguments to functions returning sets etc.), those SRFs can only be at the top level of the node's targetlist. The planner makes sure (via split_pathtarget_at_srfs()) that SRF evaluation is only necessary in ProjectSet nodes and that SRFs are only present at the top level of the node's targetlist. If there are nested SRFs the planner creates multiple stacked ProjectSet nodes. The ProjectSet nodes always get input from an underlying node. We also discussed and prototyped evaluating targetlist SRFs using ROWS FROM(), but that turned out to be more complicated than we'd hoped. While moving SRF evaluation to ProjectSet would allow to retain the old "least common multiple" behavior when multiple SRFs are present in one targetlist (i.e. continue returning rows until all SRFs are at the end of their input at the same time), we decided to instead only return rows till all SRFs are exhausted, returning NULL for already exhausted ones. We deemed the previous behavior to be too confusing, unexpected and actually not particularly useful. As a side effect, the previously prohibited case of multiple set returning arguments to a function, is now allowed. Not because it's particularly desirable, but because it ends up working and there seems to be no argument for adding code to prohibit it. Currently the behavior for COALESCE and CASE containing SRFs has changed, returning multiple rows from the expression, even when the SRF containing "arm" of the expression is not evaluated. That's because the SRFs are evaluated in a separate ProjectSet node. As that's quite confusing, we're likely to instead prohibit SRFs in those places. But that's still being discussed, and the code would reside in places not touched here, so that's a task for later. There's a lot of, now superfluous, code dealing with set return expressions around. But as the changes to get rid of those are verbose largely boring, it seems better for readability to keep the cleanup as a separate commit. Author: Tom Lane and Andres Freund Discussion: https://postgr.es/m/20160822214023.aaxz5l4igypowyri@alap3.anarazel.de	2017-01-18 13:40:27 -08:00
Alvaro Herrera	9a34123bc3	Make messages mentioning type names more uniform This avoids additional translatable strings for each distinct type, as well as making our quoting style around type names more consistent (namely, that we don't quote type names). This continues what started as `f402b99501`. Discussion: https://postgr.es/m/20160401170642.GA57509@alvherre.pgsql	2017-01-18 16:08:20 -03:00
Tom Lane	0333a73400	Avoid conflicts with collation aliases generated by stripping. This resulted in failures depending on the order of "locale -a" output. The original coding in initdb sorted the results, but that should be unnecessary as long as "locale -a" doesn't print duplicate names. The original entries will then all be non-dups, and while we might generate duplicate aliases by stripping, they should be for different encodings and thus not conflict. Even if the latter assumption fails somehow, it won't be fatal because we're using if_not_exists mode for the aliases. Discussion: https://postgr.es/m/26116.1484751196%40sss.pgh.pa.us	2017-01-18 13:44:19 -05:00
Peter Eisentraut	aa17c06fb5	Add function to import operating system collations Move this logic out of initdb into a user-callable function. This simplifies the code and makes it possible to update the standard collations later on if additional operating system collations appear. Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Euler Taveira <euler@timbira.com.br>	2017-01-18 09:35:56 -05:00
Peter Eisentraut	352a24a1f9	Generate fmgr prototypes automatically Gen_fmgrtab.pl creates a new file fmgrprotos.h, which contains prototypes for all functions registered in pg_proc.h. This avoids having to manually maintain these prototypes across a random variety of header files. It also automatically enforces a correct function signature, and since there are warnings about missing prototypes, it will detect functions that are defined but not registered in pg_proc.h (or otherwise used). Reviewed-by: Pavel Stehule <pavel.stehule@gmail.com>	2017-01-17 14:06:07 -05:00
Tom Lane	ab1f0c8225	Change representation of statement lists, and add statement location info. This patch makes several changes that improve the consistency of representation of lists of statements. It's always been the case that the output of parse analysis is a list of Query nodes, whatever the types of the individual statements in the list. This patch brings similar consistency to the outputs of raw parsing and planning steps: * The output of raw parsing is now always a list of RawStmt nodes; the statement-type-dependent nodes are one level down from that. * The output of pg_plan_queries() is now always a list of PlannedStmt nodes, even for utility statements. In the case of a utility statement, "planning" just consists of wrapping a CMD_UTILITY PlannedStmt around the utility node. This list representation is now used in Portal and CachedPlan plan lists, replacing the former convention of intermixing PlannedStmts with bare utility-statement nodes. Now, every list of statements has a consistent head-node type depending on how far along it is in processing. This allows changing many places that formerly used generic "Node *" pointers to use a more specific pointer type, thus reducing the number of IsA() tests and casts needed, as well as improving code clarity. Also, the post-parse-analysis representation of DECLARE CURSOR is changed so that it looks more like EXPLAIN, PREPARE, etc. That is, the contained SELECT remains a child of the DeclareCursorStmt rather than getting flipped around to be the other way. It's now true for both Query and PlannedStmt that utilityStmt is non-null if and only if commandType is CMD_UTILITY. That allows simplifying a lot of places that were testing both fields. (I think some of those were just defensive programming, but in many places, it was actually necessary to avoid confusing DECLARE CURSOR with SELECT.) Because PlannedStmt carries a canSetTag field, we're also able to get rid of some ad-hoc rules about how to reconstruct canSetTag for a bare utility statement; specifically, the assumption that a utility is canSetTag if and only if it's the only one in its list. While I see no near-term need for relaxing that restriction, it's nice to get rid of the ad-hocery. The API of ProcessUtility() is changed so that what it's passed is the wrapper PlannedStmt not just the bare utility statement. This will affect all users of ProcessUtility_hook, but the changes are pretty trivial; see the affected contrib modules for examples of the minimum change needed. (Most compilers should give pointer-type-mismatch warnings for uncorrected code.) There's also a change in the API of ExplainOneQuery_hook, to pass through cursorOptions instead of expecting hook functions to know what to pick. This is needed because of the DECLARE CURSOR changes, but really should have been done in 9.6; it's unlikely that any extant hook functions know about using CURSOR_OPT_PARALLEL_OK. Finally, teach gram.y to save statement boundary locations in RawStmt nodes, and pass those through to Query and PlannedStmt nodes. This allows more intelligent handling of cases where a source query string contains multiple statements. This patch doesn't actually do anything with the information, but a follow-on patch will. (Passing this information through cleanly is the true motivation for these changes; while I think this is all good cleanup, it's unlikely we'd have bothered without this end goal.) catversion bump because addition of location fields to struct Query affects stored rules. This patch is by me, but it owes a good deal to Fabien Coelho who did a lot of preliminary work on the problem, and also reviewed the patch. Discussion: https://postgr.es/m/alpine.DEB.2.20.1612200926310.29821@lancre	2017-01-14 16:02:35 -05:00
Robert Haas	0563a3a8b5	Fix a bug in how we generate partition constraints. Move the code for doing parent attnos to child attnos mapping for Vars in partition constraint expressions to a separate function map_partition_varattnos() and call it from the appropriate places. Doing it in get_qual_from_partbound(), as is now, would produce wrong result in certain multi-level partitioning cases, because it only considers the current pair of parent-child relations. In certain multi-level partitioning cases, attnums for the same key attribute(s) might differ between various levels causing the same attribute to be numbered differently in different instances of the Var corresponding to a given attribute. With this commit, in generate_partition_qual(), we first generate the the whole partition constraint (considering all levels of partitioning) and then do the mapping, so that Vars in the final expression are numbered according the leaf relation (to which it is supposed to apply). Amit Langote, reviewed by me.	2017-01-13 14:04:35 -05:00
Alvaro Herrera	3957b58b88	Fix ALTER TABLE / SET TYPE for irregular inheritance If inherited tables don't have exactly the same schema, the USING clause in an ALTER TABLE / SET DATA TYPE misbehaves when applied to the children tables since commit `9550e8348b`. Starting with that commit, the attribute numbers in the USING expression are fixed during parse analysis. This can lead to bogus errors being reported during execution, such as: ERROR: attribute 2 has wrong type DETAIL: Table has type smallint, but query expects integer. Since it wouldn't do to revert to the original coding, we now apply a transformation to map the attribute numbers to the correct ones for each child. Reported by Justin Pryzby Analysis by Tom Lane; patch by me. Discussion: https://postgr.es/m/20170102225618.GA10071@telsasoft.com	2017-01-09 19:26:58 -03:00
Tom Lane	7c3abe3c92	Get rid of ParseState.p_value_substitute; use a columnref hook instead. I noticed that p_value_substitute, which is a single-purpose kluge I added in 2002 (commit `b0422b215`), could be replaced by having domainAddConstraint install a parser hook that looks for the name "value". The parser hook code only dates back to 2009, so it's not surprising that we had to kluge this in 2002, but we can do it more cleanly now.	2017-01-07 16:02:16 -05:00
Tom Lane	c52d37c8b3	Invalidate cached plans on FDW option changes. This fixes problems where a plan must change but fails to do so, as seen in a bug report from Rajkumar Raghuwanshi. For ALTER FOREIGN TABLE OPTIONS, do this through the standard method of forcing a relcache flush on the table. For ALTER FOREIGN DATA WRAPPER and ALTER SERVER, just flush the whole plan cache on any change in pg_foreign_data_wrapper or pg_foreign_server. That matches the way we handle some other low-probability cases such as opclass changes, and it's unclear that the case arises often enough to be worth working harder. Besides, that gives a patch that is simple enough to back-patch with confidence. Back-patch to 9.3. In principle we could apply the code change to 9.2 as well, but (a) we lack postgres_fdw to test it with, (b) it's doubtful that anyone is doing anything exciting enough with FDWs that far back to need this desperately, and (c) the patch doesn't apply cleanly. Patch originally by Amit Langote, reviewed by Etsuro Fujita and Ashutosh Bapat, who each contributed substantial changes as well. Discussion: https://postgr.es/m/CAKcux6m5cA6rRPTKkqVdJ-R=KKDfe35Q_ZuUqxDSV_4hwga=og@mail.gmail.com	2017-01-06 14:12:52 -05:00
Tom Lane	d86f40009b	Handle OID column inheritance correctly in ALTER TABLE ... INHERIT. Inheritance operations must treat the OID column, if any, much like regular user columns. But MergeAttributesIntoExisting() neglected to do that, leading to weird results after a table with OIDs is associated to a parent with OIDs via ALTER TABLE ... INHERIT. Report and patch by Amit Langote, reviewed by Ashutosh Bapat, some adjustments by me. It's been broken all along, so back-patch to all supported branches. Discussion: https://postgr.es/m/cb13cfe7-a48c-5720-c383-bb843ab28298@lab.ntt.co.jp	2017-01-04 18:00:11 -05:00
Robert Haas	18fc5192a6	Remove unnecessary arguments from partitioning functions. RelationGetPartitionQual() and generate_partition_qual() are always called with recurse = true, so we don't need an argument for that. Extracted by me from a larger patch by Amit Langote.	2017-01-04 14:56:37 -05:00
Robert Haas	f1b4c771ea	Fix reporting of constraint violations for table partitioning. After a tuple is routed to a partition, it has been converted from the root table's row type to the partition's row type. ExecConstraints needs to report the failure using the original tuple and the parent's tuple descriptor rather than the ones for the selected partition. Amit Langote	2017-01-04 14:36:34 -05:00
Robert Haas	345b2dcf07	Move partition_tuple_slot out of EState. Commit `2ac3ef7a01` added a TupleTapleSlot for partition tuple slot to EState (es_partition_tuple_slot) but it's more logical to have it as part of ModifyTableState (mt_partition_tuple_slot) and CopyState (partition_tuple_slot). Discussion: http://postgr.es/m/1bd459d9-4c0c-197a-346e-e5e59e217d97@lab.ntt.co.jp Amit Langote, per a gripe from me	2017-01-04 13:16:59 -05:00
Bruce Momjian	1d25779284	Update copyright via script for 2017	2017-01-03 13:48:53 -05:00
Peter Eisentraut	db779d941e	Fix typo in comment	2016-12-29 11:27:41 -05:00
Peter Eisentraut	2e254130d1	Make more use of RoleSpec struct Most code was casting this through a generic Node. By declaring everything as RoleSpec appropriately, we can remove a bunch of casts and ad-hoc node type checking. Reviewed-by: Alvaro Herrera <alvherre@2ndquadrant.com>	2016-12-29 10:49:39 -05:00
Magnus Hagander	6cfa54e384	Fix typo comments Erik Rijkers	2016-12-27 10:24:21 +01:00
Tom Lane	fe591f8bf6	Replace enum InhOption with simple boolean. Now that it has only INH_NO and INH_YES values, it's just weird that it's not a plain bool, so make it that way. Also rename RangeVar.inhOpt to "inh", to be like RangeTblEntry.inh. My recollection is that we gave it a different name specifically because it had a different representation than the derived bool value, but it no longer does. And this is a good forcing function to be sure we catch any places that are affected by the change. Bump catversion because of possible effect on stored RangeVar nodes. I'm not exactly convinced that we ever store RangeVar on disk, but we have a readfuncs function for it, so be cautious. (If we do do so, then commit `e13486eba` was in error not to bump catversion.) Follow-on to commit `e13486eba`. Discussion: http://postgr.es/m/CA+TgmoYe+EG7LdYX6pkcNxr4ygkP4+A=jm9o-CPXyOvRiCNwaQ@mail.gmail.com	2016-12-23 13:35:18 -05:00
Peter Eisentraut	158df30359	Remove unnecessary casts of makeNode() result makeNode() is already a macro that has the right result pointer type, so casting it again to the same type is unnecessary.	2016-12-23 13:17:20 -05:00
Robert Haas	e13486eba0	Remove sql_inheritance GUC. This backward-compatibility GUC is long overdue for removal. Discussion: http://postgr.es/m/CA+TgmoYe+EG7LdYX6pkcNxr4ygkP4+A=jm9o-CPXyOvRiCNwaQ@mail.gmail.com	2016-12-23 07:35:01 -05:00
Robert Haas	2ac3ef7a01	Fix tuple routing in cases where tuple descriptors don't match. The previous coding failed to work correctly when we have a multi-level partitioned hierarchy where tables at successive levels have different attribute numbers for the partition key attributes. To fix, have each PartitionDispatch object store a standalone TupleTableSlot initialized with the TupleDesc of the corresponding partitioned table, along with a TupleConversionMap to map tuples from the its parent's rowtype to own rowtype. After tuple routing chooses a leaf partition, we must use the leaf partition's tuple descriptor, not the root table's. To that end, a dedicated TupleTableSlot for tuple routing is now allocated in EState. Amit Langote	2016-12-22 17:36:37 -05:00
Stephen Frost	12bd7dd317	Use TSConfigRelationId in AlterTSConfiguration() When we are altering a text search configuration, we are getting the tuple from pg_ts_config and using its OID, so use TSConfigRelationId when invoking any post-alter hooks and setting the object address. Further, in the functions called from AlterTSConfiguration(), we're saving information about the command via EventTriggerCollectAlterTSConfig(), so we should be setting commandCollected to true. Also add a regression test to test_ddl_deparse for ALTER TEXT SEARCH CONFIGURATION. Author: Artur Zakirov, a few additional comments by me Discussion: https://www.postgresql.org/message-id/57a71eba-f2c7-e7fd-6fc0-2126ec0b39bd%40postgrespro.ru Back-patch the fix for the InvokeObjectPostAlterHook() call to 9.3 where it was introduced, and the fix for the ObjectAddressSet() call and setting commandCollected to true to 9.5 where those changes to ProcessUtilitySlow() were introduced.	2016-12-22 17:08:43 -05:00
Robert Haas	3ee8067284	Code review for ATExecAttachPartition. Amit Langote. Most of this reported by Álvaro Herrera.	2016-12-22 12:40:45 -05:00
Dean Rasheed	58b1362642	Fix order of operations in CREATE OR REPLACE VIEW. When CREATE OR REPLACE VIEW acts on an existing view, don't update the view options until after the view query has been updated. This is necessary in the case where CREATE OR REPLACE VIEW is used on an existing view that is not updatable, and the new view is updatable and specifies the WITH CHECK OPTION. In this case, attempting to apply the new options to the view before updating its query fails, because the options are applied using the ALTER TABLE infrastructure which checks that WITH CHECK OPTION is only applied to an updatable view. If new columns are being added to the view, that is also done using the ALTER TABLE infrastructure, but it is important that that still be done before updating the view query, because the rules system checks that the query columns match those on the view relation. Added a comment to explain that, in case someone is tempted to move that to where the view options are now being set. Back-patch to 9.4 where WITH CHECK OPTION was added. Report: https://postgr.es/m/CAEZATCUp%3Dz%3Ds4SzZjr14bfct_bdJNwMPi-gFi3Xc5k1ntbsAgQ%40mail.gmail.com	2016-12-21 16:58:18 +00:00
Robert Haas	cd510f0413	Convert elog() to ereport() and do some wordsmithing. It's not entirely clear that we should log a message here at all, but it's certainly wrong to use elog() for a message that should clearly be translatable. Amit Langote	2016-12-21 11:47:50 -05:00
Robert Haas	1fc5c49450	Refactor partition tuple routing code to reduce duplication. Amit Langote	2016-12-21 11:36:10 -05:00
Peter Eisentraut	f3b421da5f	Reorder pg_sequence columns to avoid alignment issue On AIX, doubles are aligned at 4 bytes, but int64 is aligned at 8 bytes. Our code assumes that doubles have alignment that can also be applied to int64, but that fails in this case. One effect is that heap_form_tuple() writes tuples in a different layout than Form_pg_sequence expects. Rather than rewrite the whole alignment code, work around the issue by reordering the columns in pg_sequence so that the first int64 column naturally comes out at an 8-byte boundary.	2016-12-21 09:06:49 -05:00
Peter Eisentraut	1753b1b027	Add pg_sequence system catalog Move sequence metadata (start, increment, etc.) into a proper system catalog instead of storing it in the sequence heap object. This separates the metadata from the sequence data. Sequence metadata is now operated on transactionally by DDL commands, whereas previously rollbacks of sequence-related DDL commands would be ignored. Reviewed-by: Andreas Karlsson <andreas@proxel.se>	2016-12-20 08:28:18 -05:00
Robert Haas	7cd0fd655d	Invalid parent's relcache after CREATE TABLE .. PARTITION OF. Otherwise, subsequent commands in the same transaction see the wrong partition descriptor. Amit Langote. Reported by Tomas Vondra and David Fetter. Reviewed by me. Discussion: http://postgr.es/m/22dd313b-d7fd-22b5-0787-654845c8f849%402ndquadrant.com Discussion: http://postgr.es/m/20161215090916.GB20659%40fetter.org	2016-12-19 22:53:30 -05:00
Robert Haas	4b9a98e154	Clean up code, comments, and formatting for table partitioning. Amit Langote, plus pgindent-ing by me. Inspired in part by review comments from Tomas Vondra.	2016-12-13 10:59:14 -05:00
Robert Haas	3856cf9607	Remove should_free arguments to tuplesort routines. Since commit `e94568ecc1`, the answer is always "false", and we do not need to complicate the API by arranging to return a constant value. Peter Geoghegan Discussion: http://postgr.es/m/CAM3SWZQWZZ_N=DmmL7tKy_OUjGH_5mN=N=A6h7kHyyDvEhg2DA@mail.gmail.com	2016-12-12 15:57:35 -05:00
Tom Lane	563d575fd7	Fix creative, but unportable, spelling of "ptr != NULL". Or at least I suppose that's what was really meant here. But even aside from the not-per-project-style use of "0" to mean "NULL", I doubt it's safe to assume that all valid pointers are > NULL. Per buildfarm member pademelon.	2016-12-12 11:23:23 -05:00
Robert Haas	f0e44751d7	Implement table partitioning. Table partitioning is like table inheritance and reuses much of the existing infrastructure, but there are some important differences. The parent is called a partitioned table and is always empty; it may not have indexes or non-inherited constraints, since those make no sense for a relation with no data of its own. The children are called partitions and contain all of the actual data. Each partition has an implicit partitioning constraint. Multiple inheritance is not allowed, and partitioning and inheritance can't be mixed. Partitions can't have extra columns and may not allow nulls unless the parent does. Tuples inserted into the parent are automatically routed to the correct partition, so tuple-routing ON INSERT triggers are not needed. Tuple routing isn't yet supported for partitions which are foreign tables, and it doesn't handle updates that cross partition boundaries. Currently, tables can be range-partitioned or list-partitioned. List partitioning is limited to a single column, but range partitioning can involve multiple columns. A partitioning "column" can be an expression. Because table partitioning is less general than table inheritance, it is hoped that it will be easier to reason about properties of partitions, and therefore that this will serve as a better foundation for a variety of possible optimizations, including query planner optimizations. The tuple routing based which this patch does based on the implicit partitioning constraints is an example of this, but it seems likely that many other useful optimizations are also possible. Amit Langote, reviewed and tested by Robert Haas, Ashutosh Bapat, Amit Kapila, Rajkumar Raghuwanshi, Corey Huinker, Jaime Casanova, Rushabh Lathia, Erik Rijkers, among others. Minor revisions by me.	2016-12-07 13:17:55 -05:00
Robert Haas	4212cb7326	Fix interaction of parallel query with prepared statements. Previously, a prepared statement created via a Parse message could get a parallel plan, but one created with a PREPARE statement could not. This state of affairs was due to confusion on my (rhaas) part: I erroneously believed that a CREATE TABLE .. AS EXECUTE statement could only be performed with a prepared statement by PREPARE, but in fact one created by a Prepare message works just as well. Therefore, it makes no sense to allow parallel query in one case but not the other. To fix, allow parallel query with all prepared statements, but run the parallel plan serially (i.e. without workers) in the case of CREATE TABLE .. AS EXECUTE. Also, document this. Amit Kapila and Tobias Bussman, plus an extra sentence of documentation by me.	2016-12-06 11:11:54 -05:00
Stephen Frost	093129c9d9	Add support for restrictive RLS policies We have had support for restrictive RLS policies since 9.5, but they were only available through extensions which use the appropriate hooks. This adds support into the grammer, catalog, psql and pg_dump for restrictive RLS policies, thus reducing the cases where an extension is necessary. In passing, also move away from using "AND"d and "OR"d in comments. As pointed out by Alvaro, it's not really appropriate to attempt to make verbs out of "AND" and "OR", so reword those comments which attempted to. Reviewed By: Jeevan Chalke, Dean Rasheed Discussion: https://postgr.es/m/20160901063404.GY4028@tamriel.snowman.net	2016-12-05 15:50:55 -05:00
Alvaro Herrera	fa2fa99552	Permit dump/reload of not-too-large >1GB tuples Our documentation states that our maximum field size is 1 GB, and that our maximum row size of 1.6 TB. However, while this might be attainable in theory with enough contortions, it is not workable in practice; for starters, pg_dump fails to dump tables containing rows larger than 1 GB, even if individual columns are well below the limit; and even if one does manage to manufacture a dump file containing a row that large, the server refuses to load it anyway. This commit enables dumping and reloading of such tuples, provided two conditions are met: 1. no single column is larger than 1 GB (in output size -- for bytea this includes the formatting overhead) 2. the whole row is not larger than 2 GB There are three related changes to enable this: a. StringInfo's API now has two additional functions that allow creating a string that grows beyond the typical 1GB limit (and "long" string). ABI compatibility is maintained. We still limit these strings to 2 GB, though, for reasons explained below. b. COPY now uses long StringInfos, so that pg_dump doesn't choke trying to emit rows longer than 1GB. c. heap_form_tuple now uses the MCXT_ALLOW_HUGE flag in its allocation for the input tuple, which means that large tuples are accepted on input. Note that at this point we do not apply any further limit to the input tuple size. The main reason to limit to 2 GB is that the FE/BE protocol uses 32 bit length words to describe each row; and because the documentation is ambiguous on its signedness and libpq does consider it signed, we cannot use the highest-order bit. Additionally, the StringInfo API uses "int" (which is 4 bytes wide in most platforms) in many places, so we'd need to change that API too in order to improve, which has lots of fallout. Backpatch to 9.5, which is the oldest that has MemoryContextAllocExtended, a necessary piece of infrastructure. We could apply to 9.4 with very minimal additional effort, but any further than that would require backpatching "huge" allocations too. This is the largest set of changes we could find that can be back-patched without breaking compatibility with existing systems. Fixing a bigger set of problems (for example, dumping tuples bigger than 2GB, or dumping fields bigger than 1GB) would require changing the FE/BE protocol and/or changing the StringInfo API in an ABI-incompatible way, neither of which would be back-patchable. Authors: Daniel Vérité, Álvaro Herrera Reviewed by: Tomas Vondra Discussion: https://postgr.es/m/20160229183023.GA286012@alvherre.pgsql	2016-12-02 00:34:01 -03:00
Tom Lane	4e026b32d4	Check for pending trigger events on far end when dropping an FK constraint. When dropping a foreign key constraint with ALTER TABLE DROP CONSTRAINT, we refuse the drop if there are any pending trigger events on the named table; this ensures that we won't remove the pg_trigger row that will be consulted by those events. But we should make the same check for the referenced relation, else we might remove a due-to-be-referenced pg_trigger row for that relation too, resulting in "could not find trigger NNN" or "relation NNN has no triggers" errors at commit. Per bug #14431 from Benjie Gillam. Back-patch to all supported branches. Report: <20161124114911.6530.31200@wrigleys.postgresql.org>	2016-11-25 13:44:47 -05:00
Peter Eisentraut	67dc4ccbb2	Add pg_sequences view Like pg_tables, pg_views, and others, this view contains information about sequences in a way that is independent of the system catalog layout but more comprehensive than the information schema. To help implement the view, add a new internal function pg_sequence_last_value() to return the last value of a sequence. This is kept separate from pg_sequence_parameters() to separate querying run-time state from catalog-like information. Reviewed-by: Andreas Karlsson <andreas@proxel.se>	2016-11-18 14:59:03 -05:00
Tom Lane	279c439c7f	Support "COPY view FROM" for views with INSTEAD OF INSERT triggers. We just pass the data to the INSTEAD trigger. Haribabu Kommi, reviewed by Dilip Kumar Patch: <CAJrrPGcSQkrNkO+4PhLm4B8UQQQmU9YVUuqmtgM=pmzMfxWaWQ@mail.gmail.com>	2016-11-10 14:13:43 -05:00
Robert Haas	dce429b117	Fix typo. Michael Paquier	2016-11-08 15:33:57 -05:00
Kevin Grittner	8c48375e5f	Implement syntax for transition tables in AFTER triggers. This is infrastructure for the complete SQL standard feature. No support is included at this point for execution nodes or PLs. The intent is to add that soon. As this patch leaves things, standard syntax can create tuplestores to contain old and/or new versions of rows affected by a statement. References to these tuplestores are in the TriggerData structure. C triggers can access the tuplestores directly, so they are usable, but they cannot yet be referenced within a SQL statement.	2016-11-04 10:49:50 -05:00
Tom Lane	a522fc3d80	Fix incorrect trigger-property updating in ALTER CONSTRAINT. The code to change the deferrability properties of a foreign-key constraint updated all the associated triggers to match; but a moment's examination of the code that creates those triggers in the first place shows that only some of them should track the constraint's deferrability properties. This leads to odd failures in subsequent exercise of the foreign key, as the triggers are fired at the wrong times. Fix that, and add a regression test comparing the trigger properties produced by ALTER CONSTRAINT with those you get by creating the constraint as-intended to begin with. Per report from James Parks. Back-patch to 9.4 where this ALTER functionality was introduced. Report: <CAJ3Xv+jzJ8iNNUcp4RKW8b6Qp1xVAxHwSXVpjBNygjKxcVuE9w@mail.gmail.com>	2016-10-26 17:05:06 -04:00
Tom Lane	709e461bef	Fix EXPLAIN so that it doesn't emit invalid XML in corner cases. With track_io_timing = on, EXPLAIN (ANALYZE, BUFFERS) will emit fields named like "I/O Read Time". The slash makes that invalid as an XML element name, so that adding FORMAT XML would produce invalid XML. We already have code in there to translate spaces to dashes, so let's generalize that to convert anything that isn't a valid XML name character, viz letters, digits, hyphens, underscores, and periods. We could just reject slashes, which would run a bit faster. But the fact that this went unnoticed for so long doesn't give me a warm feeling that we'd notice the next creative violation, so let's make it a permanent fix. Reported by Markus Winand, though this isn't his initial patch proposal. Back-patch to 9.2 where track_io_timing was added. The problem is only latent in 9.1, so I don't feel a need to fix it there. Discussion: <E0BF6A45-68E8-45E6-918F-741FB332C6BB@winand.at>	2016-10-20 17:17:50 -04:00
Tom Lane	2f1eaf87e8	Drop server support for FE/BE protocol version 1.0. While this isn't a lot of code, it's been essentially untestable for a very long time, because libpq doesn't support anything older than protocol 2.0, and has not since release 6.3. There's no reason to believe any other client-side code still uses that protocol, either. Discussion: <2661.1475849167@sss.pgh.pa.us>	2016-10-11 12:19:18 -04:00
Heikki Linnakangas	6fb12cbcd6	Remove some unnecessary #includes. Amit Langote	2016-10-10 12:22:58 +03:00
Tom Lane	e55a946a81	Fix two bugs in merging of inherited CHECK constraints. Historically, we've allowed users to add a CHECK constraint to a child table and then add an identical CHECK constraint to the parent. This results in "merging" the two constraints so that the pre-existing child constraint ends up with both conislocal = true and coninhcount > 0. However, if you tried to do it in the other order, you got a duplicate constraint error. This is problematic for pg_dump, which needs to issue separated ADD CONSTRAINT commands in some cases, but has no good way to ensure that the constraints will be added in the required order. And it's more than a bit arbitrary, too. The goal of complaining about duplicated ADD CONSTRAINT commands can be served if we reject the case of adding a constraint when the existing one already has conislocal = true; but if it has conislocal = false, let's just make the ADD CONSTRAINT set conislocal = true. In this way, either order of adding the constraints has the same end result. Another problem was that the code allowed creation of a parent constraint marked convalidated that is merged with a child constraint that is !convalidated. In this case, an inheritance scan of the parent table could emit some rows violating the constraint condition, which would be an unexpected result given the marking of the parent constraint as validated. Hence, forbid merging of constraints in this case. (Note: valid child and not-valid parent seems fine, so continue to allow that.) Per report from Benedikt Grundmann. Back-patch to 9.2 where we introduced possibly-not-valid check constraints. The second bug obviously doesn't apply before that, and I think the first doesn't either, because pg_dump only gets into this situation when dealing with not-valid constraints. Report: <CADbMkNPT-Jz5PRSQ4RbUASYAjocV_KHUWapR%2Bg8fNvhUAyRpxA%40mail.gmail.com> Discussion: <22108.1475874586@sss.pgh.pa.us>	2016-10-08 19:29:27 -04:00
Stephen Frost	814b9e9b8e	Fix RLS with COPY (col1, col2) FROM tab Attempting to COPY a subset of columns from a table with RLS enabled would fail due to an invalid query being constructed (using a single ColumnRef with the list of fields to exact in 'fields', but that's for the different levels of an indirection for a single column, not for specifying multiple columns). Correct by building a ColumnRef and then RestTarget for each column being requested and then adding those to the targetList for the select query. Include regression tests to hopefully catch if this is broken again in the future. Patch-By: Adam Brightwell Reviewed-By: Michael Paquier	2016-10-03 16:22:57 -04:00
Peter Eisentraut	967ed9205b	Remove dead line of code	2016-09-28 12:00:00 -04:00
Heikki Linnakangas	babe05bc2b	Turn password_encryption GUC into an enum. This makes the parameter easier to extend, to support other password-based authentication protocols than MD5. (SCRAM is being worked on.) The GUC still accepts on/off as aliases for "md5" and "plain", although we may want to remove those once we actually add support for another password hash type. Michael Paquier, reviewed by David Steele, with some further edits by me. Discussion: <CAB7nPqSMXU35g=W9X74HVeQp0uvgJxvYOuA4A-A3M+0wfEBv-w@mail.gmail.com>	2016-09-28 12:22:44 +03:00
Robert Haas	5225c66336	Clarify policy on marking inherited constraints as valid. Amit Langote and Robert Haas	2016-09-15 17:24:54 -04:00
Tom Lane	55c3391d1e	Be pickier about converting between Name and Datum. We were misapplying NameGetDatum() to plain C strings in some places. This worked, because it was just a pointer cast anyway, but it's a type cheat in some sense. Use CStringGetDatum instead, and modify the NameGetDatum macro so it won't compile if applied to something that's not a pointer to NameData. This should result in no changes to generated code, but it is logically cleaner. Mark Dilger, tweaked a bit by me Discussion: <EFD8AC94-4C1F-40C1-A5EA-304080089C1B@gmail.com>	2016-09-13 17:17:48 -04:00
Tom Lane	40b449ae84	Allow CREATE EXTENSION to follow extension update paths. Previously, to update an extension you had to produce both a version-update script and a new base installation script. It's become more and more obvious that that's tedious, duplicative, and error-prone. This patch attempts to improve matters by allowing the new base installation script to be omitted. CREATE EXTENSION will install a requested version if it can find a base script and a chain of update scripts that will get there. As in the existing update logic, shorter chains are preferred if there's more than one possibility, with an arbitrary tie-break rule for chains of equal length. Also adjust the pg_available_extension_versions view to show such versions as installable. While at it, refactor the code so that CASCADE processing works for extensions requested during ApplyExtensionUpdates(). Without this, addition of a new requirement in an updated extension would require creating a new base script, even if there was no other reason to do that. (It would be easy at this point to add a CASCADE option to ALTER EXTENSION UPDATE, to allow the same thing to happen during a manually-commanded version update, but I have not done that here.) Tom Lane, reviewed by Andres Freund Discussion: <20160905005919.jz2m2yh3und2dsuy@alap3.anarazel.de>	2016-09-11 14:15:07 -04:00
Tom Lane	0ab9c56d0f	Support renaming an existing value of an enum type. Not much to be said about this patch: it does what it says on the tin. In passing, rename AlterEnumStmt.skipIfExists to skipIfNewValExists to clarify what it actually does. In the discussion of this patch we considered supporting other similar options, such as IF EXISTS on the type as a whole or IF NOT EXISTS on the target name. This patch doesn't actually add any such feature, but it might happen later. Dagfinn Ilmari Mannsåker, reviewed by Emre Hasegeli Discussion: <CAO=2mx6uvgPaPDf-rHqG8=1MZnGyVDMQeh8zS4euRyyg4D35OQ@mail.gmail.com>	2016-09-07 16:11:56 -04:00
Tom Lane	4f405c8ef4	Add a HINT for client-vs-server COPY failure cases. Users often get confused between COPY and \copy and try to use client-side paths with COPY. The server then cannot find the file (if remote), or sees a permissions problem (if local), or some variant of that. Emit a hint about this in the most common cases. In future we might want to expand the set of errnos for which the hint gets printed, but be conservative for now. Craig Ringer, reviewed by Christoph Berg and Tom Lane Discussion: <CAMsr+YEqtD97qPEzQDqrCt5QiqPbWP_X4hmvy2pQzWC0GWiyPA@mail.gmail.com>	2016-09-06 23:55:55 -04:00
Peter Eisentraut	49eb0fd097	Add location field to DefElem Add a location field to the DefElem struct, used to parse many utility commands. Update various error messages to supply error position information. To propogate the error position information in a more systematic way, create a ParseState in standard_ProcessUtility() and pass that to interested functions implementing the utility commands. This seems better than passing the query string and then reassembling a parse state ad hoc, which violates the encapsulation of the ParseState type. Reviewed-by: Pavel Stehule <pavel.stehule@gmail.com>	2016-09-06 12:00:00 -04:00
Simon Riggs	dcb12ce8d8	Fix VACUUM_TRUNCATE_LOCK_WAIT_INTERVAL lazy_truncate_heap() was waiting for VACUUM_TRUNCATE_LOCK_WAIT_INTERVAL, but in microseconds not milliseconds as originally intended. Found by code inspection. Simon Riggs	2016-09-06 15:35:47 +01:00
Tom Lane	25794e841e	Cosmetic code cleanup in commands/extension.c. Some of the comments added by the CREATE EXTENSION CASCADE patch were a bit sloppy, and I didn't care for redeclaring the same local variable inside a nested block either. No functional changes.	2016-09-05 18:53:33 -04:00
Tom Lane	15bc038f9b	Relax transactional restrictions on ALTER TYPE ... ADD VALUE. To prevent possibly breaking indexes on enum columns, we must keep uncommitted enum values from getting stored in tables, unless we can be sure that any such column is new in the current transaction. Formerly, we enforced this by disallowing ALTER TYPE ... ADD VALUE from being executed at all in a transaction block, unless the target enum type had been created in the current transaction. This patch removes that restriction, and instead insists that an uncommitted enum value can't be referenced unless it belongs to an enum type created in the same transaction as the value. Per discussion, this should be a bit less onerous. It does require each function that could possibly return a new enum value to SQL operations to check this restriction, but there aren't so many of those that this seems unmaintainable. Andrew Dunstan and Tom Lane Discussion: <4075.1459088427@sss.pgh.pa.us>	2016-09-05 12:59:55 -04:00
Heikki Linnakangas	ec136d19b2	Move code shared between libpq and backend from backend/libpq/ to common/. When building libpq, ip.c and md5.c were symlinked or copied from src/backend/libpq into src/interfaces/libpq, but now that we have a directory specifically for routines that are shared between the server and client binaries, src/common/, move them there. Some routines in ip.c were only used in the backend. Keep those in src/backend/libpq, but rename to ifaddr.c to avoid confusion with the file that's now in common. Fix the comment in src/common/Makefile to reflect how libpq actually links those files. There are two more files that libpq symlinks directly from src/backend: encnames.c and wchar.c. I don't feel compelled to move those right now, though. Patch by Michael Paquier, with some changes by me. Discussion: <69938195-9c76-8523-0af8-eb718ea5b36e@iki.fi>	2016-09-02 13:49:59 +03:00
Tom Lane	ea268cdc9a	Add macros to make AllocSetContextCreate() calls simpler and safer. I found that half a dozen (nearly 5%) of our AllocSetContextCreate calls had typos in the context-sizing parameters. While none of these led to especially significant problems, they did create minor inefficiencies, and it's now clear that expecting people to copy-and-paste those calls accurately is not a great idea. Let's reduce the risk of future errors by introducing single macros that encapsulate the common use-cases. Three such macros are enough to cover all but two special-purpose contexts; those two calls can be left as-is, I think. While this patch doesn't in itself improve matters for third-party extensions, it doesn't break anything for them either, and they can gradually adopt the simplified notation over time. In passing, change TopMemoryContext to use the default allocation parameters. Formerly it could only be extended 8K at a time. That was probably reasonable when this code was written; but nowadays we create many more contexts than we did then, so that it's not unusual to have a couple hundred K in TopMemoryContext, even without considering various dubious code that sticks other things there. There seems no good reason not to let it use growing blocks like most other contexts. Back-patch to 9.6, mostly because that's still close enough to HEAD that it's easy to do so, and keeping the branches in sync can be expected to avoid some future back-patching pain. The bugs fixed by these changes don't seem to be significant enough to justify fixing them further back. Discussion: <21072.1472321324@sss.pgh.pa.us>	2016-08-27 17:50:38 -04:00
Tom Lane	b5bce6c1ec	Final pgindent + perltidy run for 9.6.	2016-08-15 13:42:51 -04:00
Tom Lane	ed0097e4f9	Add SQL-accessible functions for inspecting index AM properties. Per discussion, we should provide such functions to replace the lost ability to discover AM properties by inspecting pg_am (cf commit `65c5fcd35`). The added functionality is also meant to displace any code that was looking directly at pg_index.indoption, since we'd rather not believe that the bit meanings in that field are part of any client API contract. As future-proofing, define the SQL API to not assume that properties that are currently AM-wide or index-wide will remain so unless they logically must be; instead, expose them only when inquiring about a specific index or even specific index column. Also provide the ability for an index AM to override the behavior. In passing, document pg_am.amtype, overlooked in commit `473b93287`. Andrew Gierth, with kibitzing by me and others Discussion: <87mvl5on7n.fsf@news-spur.riddles.org.uk>	2016-08-13 18:31:14 -04:00
Tom Lane	4b234fd8bf	Fix inappropriate printing of never-measured times in EXPLAIN. EXPLAIN (ANALYZE, TIMING OFF) would print an elapsed time of zero for a trigger function, because no measurement has been taken but it printed the field anyway. This isn't what EXPLAIN does elsewhere, so suppress it. In the same vein, EXPLAIN (ANALYZE, BUFFERS) with non-text output format would print buffer I/O timing numbers even when no measurement has been taken because track_io_timing is off. That seems not per policy, either, so change it. Back-patch to 9.2 where these features were introduced. Maksim Milyutin Discussion: <081c0540-ecaa-bd29-3fd2-6358f3b359a9@postgrespro.ru>	2016-08-12 12:13:04 -04:00
Tom Lane	95bee941be	Fix misestimation of n_distinct for a nearly-unique column with many nulls. If ANALYZE found no repeated non-null entries in its sample, it set the column's stadistinct value to -1.0, intending to indicate that the entries are all distinct. But what this value actually means is that the number of distinct values is 100% of the table's rowcount, and thus it was overestimating the number of distinct values by however many nulls there are. This could lead to very poor selectivity estimates, as for example in a recent report from Andreas Joseph Krogh. We should discount the stadistinct value by whatever we've estimated the nulls fraction to be. (That is what will happen if we choose to use a negative stadistinct for a column that does have repeated entries, so this code path was just inconsistent.) In addition to fixing the stadistinct entries stored by several different ANALYZE code paths, adjust the logic where get_variable_numdistinct() forces an "all distinct" estimate on the basis of finding a relevant unique index. Unique indexes don't reject nulls, so there's no reason to assume that the null fraction doesn't apply. Back-patch to all supported branches. Back-patching is a bit of a judgment call, but this problem seems to affect only a few users (else we'd have identified it long ago), and it's bad enough when it does happen that destabilizing plan choices in a worse direction seems unlikely. Patch by me, with documentation wording suggested by Dean Rasheed Report: <VisenaEmail.26.df42f82acae38a58.156463942b8@tc7-visena> Discussion: <16143.1470350371@sss.pgh.pa.us>	2016-08-07 18:52:02 -04:00
Tom Lane	9ee1cf04ab	Fix TOAST access failure in RETURNING queries. Discussion of commit `3e2f3c2e4` exposed a problem that is of longer standing: since we don't detoast data while sticking it into a portal's holdStore for PORTAL_ONE_RETURNING and PORTAL_UTIL_SELECT queries, and we release the query's snapshot as soon as we're done loading the holdStore, later readout of the holdStore can do TOAST fetches against data that can no longer be seen by any of the session's live snapshots. This means that a concurrent VACUUM could remove the TOAST data before we can fetch it. Commit `3e2f3c2e4` exposed the problem by showing that sometimes we had no live snapshots while fetching TOAST data, but we'd be at risk anyway. I believe this code was all right when written, because our management of a session's exposed xmin was such that the TOAST references were safe until end of transaction. But that's no longer true now that we can advance or clear our PGXACT.xmin intra-transaction. To fix, copy the query's snapshot during FillPortalStore() and save it in the Portal; release it only when the portal is dropped. This essentially implements a policy that we must hold a relevant snapshot whenever we access potentially-toasted data. We had already come to that conclusion in other places, cf commits `08e261cbc9` and `ec543db77b`. I'd have liked to add a regression test case for this, but I didn't see a way to make one that's not unreasonably bloated; it seems to require returning a toasted value to the client, and those will be big. In passing, improve PortalRunUtility() so that it positively verifies that its ending PopActiveSnapshot() call will pop the expected snapshot, removing a rather shaky assumption about which utility commands might do their own PopActiveSnapshot(). There's no known bug here, but now that we're actively referencing the snapshot it's almost free to make this code a bit more bulletproof. We might want to consider back-patching something like this into older branches, but it would be prudent to let it prove itself more in HEAD beforehand. Discussion: <87vazemeda.fsf@credativ.de>	2016-08-07 17:46:08 -04:00
Peter Eisentraut	40fcfec82c	Message style improvements	2016-07-25 22:07:44 -04:00
Kevin Grittner	1c15aac53f	Add comment & docs about no vacuum truncation with sto. Omission noted by Andres Freund.	2016-07-19 16:25:53 -05:00
Andres Freund	eca0f1db14	Clear all-frozen visibilitymap status when locking tuples. Since `a892234` & `fd31cd265` the visibilitymap's freeze bit is used to avoid vacuuming the whole relation in anti-wraparound vacuums. Doing so correctly relies on not adding xids to the heap without also unsetting the visibilitymap flag. Tuple locking related code has not done so. To allow selectively resetting all-frozen - to avoid pessimizing heap_lock_tuple - allow to selectively reset the all-frozen with visibilitymap_clear(). To avoid having to use visibilitymap_get_status (e.g. via VM_ALL_FROZEN) inside a critical section, have visibilitymap_clear() return whether any bits have been reset. There's a remaining issue (denoted by XXX): After the PageIsAllVisible() check in heap_lock_tuple() and heap_lock_updated_tuple_rec() the page status could theoretically change. Practically that currently seems impossible, because updaters will hold a page level pin already. Due to the next beta coming up, it seems better to get the required WAL magic bump done before resolving this issue. The added flags field fields to xl_heap_lock and xl_heap_lock_updated require bumping the WAL magic. Since there's already been a catversion bump since the last beta, that's not an issue. Reviewed-By: Robert Haas, Amit Kapila and Andres Freund Author: Masahiko Sawada, heavily revised by Andres Freund Discussion: CAEepm=3fWAbWryVW9swHyLTY4sXVf0xbLvXqOwUoDiNCx9mBjQ@mail.gmail.com Backpatch: -	2016-07-18 02:01:13 -07:00
Tom Lane	4d042999f9	Print a given subplan only once in EXPLAIN. We have, for a very long time, allowed the same subplan (same member of the PlannedStmt.subplans list) to be referenced by more than one SubPlan node; this avoids problems for cases such as subplans within an IndexScan's indxqual and indxqualorig fields. However, EXPLAIN had not gotten the memo and would print each reference as though it were an independent identical subplan. To fix, track plan_ids of subplans we've printed and don't print the same plan_id twice. Per report from Pavel Stehule. BTW: the particular case of IndexScan didn't cause visible duplication in a plain EXPLAIN, only EXPLAIN ANALYZE, because in the former case we short-circuit executor startup before the indxqual field is processed by ExecInitExpr. That seems like it could easily lead to other EXPLAIN problems in future, but it's not clear how to avoid it without breaking the "EXPLAIN a plan using hypothetical indexes" use-case. For now I've left that issue alone. Although this is a longstanding bug, it's purely cosmetic (no great harm is done by the repeat printout) and we haven't had field complaints before. So I'm hesitant to back-patch it, especially since there is some small risk of ABI problems due to the need to add a new field to ExplainState. In passing, rearrange order of fields in ExplainState to be less random, and update some obsolete comments about when/where to initialize them. Report: <CAFj8pRAimq+NK-menjt+3J4-LFoodDD8Or6=Lc_stcFD+eD4DA@mail.gmail.com>	2016-07-11 18:14:29 -04:00
Robert Haas	10c0558ffe	Fix several mistakes around parallel workers and client_encoding. Previously, workers sent data to the leader using the client encoding. That mostly worked, but the leader the converted the data back to the server encoding. Since not all encoding conversions are reversible, that could provoke failures. Fix by using the database encoding for all communication between worker and leader. Also, while temporary changes to GUC settings, as from the SET clause of a function, are in general OK for parallel query, changing client_encoding this way inside of a parallel worker is not OK. Previously, that would have confused the leader; with these changes, it would not confuse the leader, but it wouldn't do anything either. So refuse such changes in parallel workers. Also, the previous code naively assumed that when it received a NotifyResonse from the worker, it could pass that directly back to the user. But now that worker-to-leader communication always uses the database encoding, that's clearly no longer correct - though, actually, the old way was always broken for V2 clients. So disassemble and reconstitute the message instead. Issues reported by Peter Eisentraut. Patch by me, reviewed by Peter Eisentraut.	2016-06-30 18:35:32 -04:00
Tom Lane	8ebb69f854	Fix some infelicities in EXPLAIN output for parallel query plans. In non-text output formats, parallelized aggregates were reporting "Partial" or "Finalize" as a field named "Operation", which might be all right in the absence of any context --- but other plan node types use that field to report SQL-visible semantics, such as Select/Insert/Update/Delete. So that naming choice didn't seem good to me. I changed it to "Partial Mode". Also, the field did not appear at all for a non-parallelized Agg plan node, which is contrary to expectation in non-text formats. We're notionally producing objects that conform to a schema, so the set of fields for a given node type and EXPLAIN mode should be well-defined. I set it up to fill in "Simple" in such cases. Other fields that were added for parallel query, namely "Parallel Aware" and Gather's "Single Copy", had not gotten the word on that point either. Make them appear always in non-text output. Also, the latter two fields were nominally producing boolean output, but were getting it wrong, because bool values shouldn't be quoted in JSON or YAML. Somehow we'd not needed an ExplainPropertyBool formatting subroutine before 9.6; but now we do, so invent it. Discussion: <16002.1466972724@sss.pgh.pa.us>	2016-06-29 18:51:20 -04:00
Tom Lane	874fe3aea1	Fix CREATE MATVIEW/CREATE TABLE AS ... WITH NO DATA to not plan the query. Previously, these commands always planned the given query and went through executor startup before deciding not to actually run the query if WITH NO DATA is specified. This behavior is problematic for pg_dump because it may cause errors to be raised that we would rather not see before a REFRESH MATERIALIZED VIEW command is issued. See for example bug #13907 from Marian Krucina. This change is not sufficient to fix that particular bug, because we also need to tweak pg_dump to issue the REFRESH later, but it's a necessary step on the way. A user-visible side effect of doing things this way is that the returned command tag for WITH NO DATA cases will now be "CREATE MATERIALIZED VIEW" or "CREATE TABLE AS", not "SELECT 0". We could preserve the old behavior but it would take more code, and arguably that was just an implementation artifact not intended behavior anyhow. In 9.5 and HEAD, also get rid of the static variable CreateAsReladdr, which was trouble waiting to happen; there is not any prohibition on nested CREATE commands. Back-patch to 9.3 where CREATE MATERIALIZED VIEW was introduced. Michael Paquier and Tom Lane Report: <20160202161407.2778.24659@wrigleys.postgresql.org>	2016-06-27 15:57:50 -04:00
Tom Lane	19e972d558	Rethink node-level representation of partial-aggregation modes. The original coding had three separate booleans representing partial aggregation behavior, which was confusing, unreadable, and error-prone, not least because the booleans weren't always listed in the same order. It was also inadequate for the allegedly-desirable future extension to support intermediate partial aggregation, because we'd need separate markers for serialization and deserialization in such a case. Merge these bools into an enum "AggSplit" to provide symbolic names for the supported operating modes (and document what those are). By assigning the values of the enum constants carefully, we can treat AggSplit values as options bitmasks so that tests of what to do aren't noticeably more expensive than before. While at it, get rid of Aggref.aggoutputtype. That's not needed since commit `59a3795c2` got rid of setrefs.c's special-purpose Aggref comparison code, and it likewise seemed more confusing than helpful. Assorted comment cleanup as well (there's still more that I want to do in that line). catversion bump for change in Aggref node contents. Should be the last one for partial-aggregation changes. Discussion: <29309.1466699160@sss.pgh.pa.us>	2016-06-26 14:33:38 -04:00
Tom Lane	f8ace5477e	Fix type-safety problem with parallel aggregate serial/deserialization. The original specification for this called for the deserialization function to have signature "deserialize(serialtype) returns transtype", which is a security violation if transtype is INTERNAL (which it always would be in practice) and serialtype is not (which ditto). The patch blithely overrode the opr_sanity check for that, which was sloppy-enough work in itself, but the indisputable reason this cannot be allowed to stand is that CREATE FUNCTION will reject such a signature and thus it'd be impossible for extensions to create parallelizable aggregates. The minimum fix to make the signature type-safe is to add a second, dummy argument of type INTERNAL. But to lock it down a bit more and make misuse of INTERNAL-accepting functions less likely, let's get rid of the ability to specify a "serialtype" for an aggregate and just say that the only useful serialtype is BYTEA --- which, in practice, is the only interesting value anyway, due to the usefulness of the send/recv infrastructure for this purpose. That means we only have to allow "serialize(internal) returns bytea" and "deserialize(bytea, internal) returns internal" as the signatures for these support functions. In passing fix bogus signature of int4_avg_combine, which I found thanks to adding an opr_sanity check on combinefunc signatures. catversion bump due to removing pg_aggregate.aggserialtype and adjusting signatures of assorted built-in functions. David Rowley and Tom Lane Discussion: <27247.1466185504@sss.pgh.pa.us>	2016-06-22 16:52:41 -04:00
Robert Haas	ede62e56fb	Add VACUUM (DISABLE_PAGE_SKIPPING) for emergencies. If you really want to vacuum every single page in the relation, regardless of apparent visibility status or anything else, you can use this option. In previous releases, this behavior could be achieved using VACUUM (FREEZE), but because we can now recognize all-frozen pages as not needing to be frozen again, that no longer works. There should be no need for routine use of this option, but maybe bugs or disaster recovery will necessitate its use. Patch by me, reviewed by Andres Freund.	2016-06-17 15:48:57 -04:00
Robert Haas	38e9f90a22	Fix lazy_scan_heap so that it won't mark pages all-frozen too soon. Commit `a892234f83` added a new bit per page to the visibility map fork indicating whether the page is all-frozen, but incorrectly assumed that if lazy_scan_heap chose to freeze a tuple then that tuple would not need to later be frozen again. This turns out to be false, because xmin and xmax (and conceivably xvac, if dealing with tuples from very old releases) could be frozen at separate times. Thanks to Andres Freund for help in uncovering and tracking down this issue.	2016-06-15 14:30:06 -04:00

... 8 9 10 11 12 ...

3951 Commits