postgresql

Commit Graph

Author	SHA1	Message	Date
Michael Paquier	a904abe2e2	Fix concurrent indexing operations with temporary tables Attempting to use CREATE INDEX, DROP INDEX or REINDEX with CONCURRENTLY on a temporary relation with ON COMMIT actions triggered unexpected errors because those operations use multiple transactions internally to complete their work. Here is for example one confusing error when using ON COMMIT DELETE ROWS: ERROR: index "foo" already contains data Issues related to temporary relations and concurrent indexing are fixed in this commit by enforcing the non-concurrent path to be taken for temporary relations even if using CONCURRENTLY, transparently to the user. Using a non-concurrent path does not matter in practice as locks cannot be taken on a temporary relation by a session different than the one owning the relation, and the non-concurrent operation is more effective. The problem exists with REINDEX since v12 with the introduction of CONCURRENTLY, and with CREATE/DROP INDEX since CONCURRENTLY exists for those commands. In all supported versions, this caused only confusing error messages to be generated. Note that with REINDEX, it was also possible to issue a REINDEX CONCURRENTLY for a temporary relation owned by a different session, leading to a server crash. The idea to enforce transparently the non-concurrent code path for temporary relations comes originally from Andres Freund. Reported-by: Manuel Rigger Author: Michael Paquier, Heikki Linnakangas Reviewed-by: Andres Freund, Álvaro Herrera, Heikki Linnakangas Discussion: https://postgr.es/m/CA+u7OA6gP7YAeCguyseusYcc=uR8+ypjCcgDDCTzjQ+k6S9ksQ@mail.gmail.com Backpatch-through: 9.4	2020-01-22 09:49:18 +09:00
Bruce Momjian	7559d8ebfa	Update copyrights for 2020 Backpatch-through: update all files in master, backpatch legal files through 9.4	2020-01-01 12:21:45 -05:00
Michael Paquier	7854e07f25	Revert "Rename files and headers related to index AM" This follows multiple complains from Peter Geoghegan, Andres Freund and Alvaro Herrera that this issue ought to be dug more before actually happening, if it happens. Discussion: https://postgr.es/m/20191226144606.GA5659@alvherre.pgsql	2019-12-27 08:09:00 +09:00
Michael Paquier	8ce3aa9b59	Rename files and headers related to index AM The following renaming is done so as source files related to index access methods are more consistent with table access methods (the original names used for index AMs ware too generic, and could be confused as including features related to table AMs): - amapi.h -> indexam.h. - amapi.c -> indexamapi.c. Here we have an equivalent with backend/access/table/tableamapi.c. - amvalidate.c -> indexamvalidate.c. - amvalidate.h -> indexamvalidate.h. - genam.c -> indexgenam.c. - genam.h -> indexgenam.h. This has been discussed during the development of v12 when table AM was worked on, but the renaming never happened. Author: Michael Paquier Reviewed-by: Fabien Coelho, Julien Rouhaud Discussion: https://postgr.es/m/20191223053434.GF34339@paquier.xyz	2019-12-25 10:23:39 +09:00
Michael Paquier	e1551f96e6	Refactor attribute mappings used in logical tuple conversion Tuple conversion support in tupconvert.c is able to convert rowtypes between two relations, inner and outer, which are logically equivalent but have a different ordering or even dropped columns (used mainly for inheritance tree and partitions). This makes use of attribute mappings, which are simple arrays made of AttrNumber elements with a length matching the number of attributes of the outer relation. The length of the attribute mapping has been treated as completely independent of the mapping itself until now, making it easy to pass down an incorrect mapping length. This commit refactors the code related to attribute mappings and moves it into an independent facility called attmap.c, extracted from tupconvert.c. This merges the attribute mapping with its length, avoiding to try to guess what is the length of a mapping to use as this is computed once, when the map is built. This will avoid mistakes like what has been fixed in `dc816e58`, which has used an incorrect mapping length by matching it with the number of attributes of an inner relation (a child partition) instead of an outer relation (a partitioned table). Author: Michael Paquier Reviewed-by: Amit Langote Discussion: https://postgr.es/m/20191121042556.GD153437@paquier.xyz	2019-12-18 16:23:02 +09:00
Tom Lane	2acab054b3	Fix error reporting for index expressions of prohibited types. If CheckAttributeType() threw an error about the datatype of an index expression column, it would report an empty column name, which is pretty unhelpful and certainly not the intended behavior. I (tgl) evidently broke this in commit `cfc5008a5`, by not noticing that the column's attname was used above where I'd placed the assignment of it. In HEAD and v12, this is trivially fixable by moving up the assignment of attname. Before v12 the code is a bit more messy; to avoid doing substantial refactoring, I took the lazy way out and just put in two copies of the assignment code. Report and patch by Amit Langote. Back-patch to all supported branches. Discussion: https://postgr.es/m/CA+HiwqFA+BGyBFimjiYXXMa2Hc3fcL0+OJOyzUNjhU4NCa_XXw@mail.gmail.com	2019-12-17 17:44:27 -05:00
Tom Lane	c35b714caf	Fix misbehavior with expression indexes on ON COMMIT DELETE ROWS tables. We implement ON COMMIT DELETE ROWS by truncating tables marked that way, which requires also truncating/rebuilding their indexes. But RelationTruncateIndexes asks the relcache for up-to-date copies of any index expressions, which may cause execution of eval_const_expressions on them, which can result in actual execution of subexpressions. This is a bad thing to have happening during ON COMMIT. Manuel Rigger reported that use of a SQL function resulted in crashes due to expectations that ActiveSnapshot would be set, which it isn't. The most obvious fix perhaps would be to push a snapshot during PreCommit_on_commit_actions, but I think that would just open the door to more problems: CommitTransaction explicitly expects that no user-defined code can be running at this point. Fortunately, since we know that no tuples exist to be indexed, there seems no need to use the real index expressions or predicates during RelationTruncateIndexes. We can set up dummy index expressions instead (we do need something that will expose the right data type, as there are places that build index tupdescs based on this), and just ignore predicates and exclusion constraints. In a green field it'd likely be better to reimplement ON COMMIT DELETE ROWS using the same "init fork" infrastructure used for unlogged relations. That seems impractical without catalog changes though, and even without that it'd be too big a change to back-patch. So for now do it like this. Per private report from Manuel Rigger. This has been broken forever, so back-patch to all supported branches.	2019-12-01 13:09:26 -05:00
Amit Kapila	14aec03502	Make the order of the header file includes consistent in backend modules. Similar to commits `7e735035f2` and `dddf4cdc33`, this commit makes the order of header file inclusion consistent for backend modules. In the passing, removed a couple of duplicate inclusions. Author: Vignesh C Reviewed-by: Kuntal Ghosh and Amit Kapila Discussion: https://postgr.es/m/CALDaNm2Sznv8RR6Ex-iJO6xAdsxgWhCoETkaYX=+9DW3q0QCfA@mail.gmail.com	2019-11-12 08:30:16 +05:30
Peter Eisentraut	604bd36711	PG_FINALLY This gives an alternative way of catching exceptions, for the common case where the cleanup code is the same in the error and non-error cases. So instead of PG_TRY(); { ... code that might throw ereport(ERROR) ... } PG_CATCH(); { cleanup(); PG_RE_THROW(); } PG_END_TRY(); cleanup(); one can write PG_TRY(); { ... code that might throw ereport(ERROR) ... } PG_FINALLY(); { cleanup(); } PG_END_TRY(); Discussion: https://www.postgresql.org/message-id/flat/95a822c3-728b-af0e-d7e5-71890507ae0c%402ndquadrant.com	2019-11-01 11:18:03 +01:00
Michael Paquier	d80be6f2f6	Fix handling of pg_class.relispartition at swap phase in REINDEX CONCURRENTLY When cancelling REINDEX CONCURRENTLY after swapping the old and new indexes (for example interruption at step 5), the old index remains around and is marked as invalid. The old index should also be manually droppable to clean up the parent relation from any invalid indexes still remaining. For a partition index reindexed, pg_class.relispartition was not getting updated, causing the index to not be droppable as DROP INDEX would look for dependencies in a partition tree, which do not exist anymore after the swap phase is done. The fix here is simple: when swapping the old and new indexes, make sure that pg_class.relispartition is correctly switched, similarly to what is done for the index name. Reported-by: Justin Pryzby Author: Michael Paquier Discussion: https://postgr.es/m/20191015164047.GA22729@telsasoft.com Backpatch-through: 12	2019-10-29 11:08:09 +09:00
Michael Paquier	68ac9cf249	Fix dependency handling at swap phase of REINDEX CONCURRENTLY When swapping the dependencies of the old and new indexes, the code has been correctly switching all links in pg_depend from the old to the new index for both referencing and referenced entries. However it forgot the fact that the new index may itself have existing entries in pg_depend, like references to the parent table attributes. This resulted in duplicated entries in pg_depend after running REINDEX CONCURRENTLY. Fix this problem by removing any existing entries in pg_depend on the new index before switching the dependencies of the old index to the new one. More regression tests are added to check the consistency of entries in pg_depend for indexes, including partition indexes. Author: Michael Paquier Discussion: https://postgr.es/m/20191025064318.GF8671@paquier.xyz Backpatch-through: 12	2019-10-28 11:57:31 +09:00
Alvaro Herrera	d2efb90dba	Update comments about progress reporting by index_drop Michaël Paquier complained that index_drop is requesting progress reporting for non-obvious reasons, so let's add a comment to explain why. Discussion: https://postgr.es/m/20191017010412.GH2602@paquier.xyz	2019-10-18 07:23:05 -03:00
Tom Lane	2810396312	Fix up handling of nondeterministic collations with pattern_ops opclasses. text_pattern_ops and its siblings can't be used with nondeterministic collations, because they use the text_eq operator which will not behave as bitwise equality if applied with a nondeterministic collation. The initial implementation of that restriction was to insert a run-time test in the related comparison functions, but that is inefficient, may throw misleading errors, and will throw errors in some cases that would work. It seems sufficient to just prevent the combination during CREATE INDEX, so do that instead. Lacking any better way to identify the opclasses involved, we need to hard-wire tests for them, which requires hand-assigned values for their OIDs, which forces a catversion bump because they previously had OIDs that would be assigned automatically. That's slightly annoying in the v12 branch, but fortunately we're not at rc1 yet, so just do it. Back-patch to v12 where nondeterministic collations were added. In passing, run make reformat-dat-files, which found some unrelated whitespace issues (slightly different ones in HEAD and v12). Peter Eisentraut, with small corrections by me Discussion: https://postgr.es/m/22566.1568675619@sss.pgh.pa.us	2019-09-21 16:29:17 -04:00
Alvaro Herrera	6212276e43	Fix progress reporting of CLUSTER / VACUUM FULL The progress state was being clobbered once the first index completed being rebuilt, causing the final phases of the operation not show anything in the progress view. This was inadvertently broken in `03f9e5cba0`, which added progress tracking for REINDEX. (The reason this bugfix is this small is that I had already noticed this problem when writing monitoring for CREATE INDEX, and had already worked around it, as can be seen in discussion starting at https://postgr.es/m/20190329150218.GA25010@alvherre.pgsql Fixing the problem is just a matter of fixing one place touched by the REINDEX monitoring.) Reported by: Álvaro Herrera Author: Álvaro Herrera Discussion: https://postgr.es/m/20190801184333.GA21369@alvherre.pgsql	2019-09-13 14:54:26 -03:00
Michael Paquier	69edf4f880	Refactor BuildIndexInfo() with the new makeIndexInfo() This portion of the code got forgotten in `7cce159` which has introduced a new routine to build this node, and this finishes the unification of the places where IndexInfo is initialized. Author: Michael Paquier Discussion: https://postgr.es/m/20190801041322.GA3435@paquier.xyz	2019-08-04 11:18:57 +09:00
Michael Paquier	7cce159349	Fix handling of expressions and predicates in REINDEX CONCURRENTLY When copying the definition of an index rebuilt concurrently for the new entry, the index information was taken directly from the old index using the relation cache. In this case, predicates and expressions have some post-processing to prepare things for the planner, which loses some information including the collations added in any of them. This inconsistency can cause issues when attempting for example a table rewrite, and makes the new indexes rebuilt concurrently inconsistent with the old entries. In order to fix the problem, fetch expressions and predicates directly from the catalog of the old entry, and fill in IndexInfo for the new index with that. This makes the process more consistent with DefineIndex(), and the code is refactored with the addition of a routine to create an IndexInfo node. Reported-by: Manuel Rigger Author: Michael Paquier Discussion: https://postgr.es/m/CA+u7OA5Hp0ra235F3czPom_FyAd-3+XwSJmX95r1+sRPOJc9VQ@mail.gmail.com Backpatch-through: 12	2019-07-29 09:58:49 +09:00
Tom Lane	1cff1b95ab	Represent Lists as expansible arrays, not chains of cons-cells. Originally, Postgres Lists were a more or less exact reimplementation of Lisp lists, which consist of chains of separately-allocated cons cells, each having a value and a next-cell link. We'd hacked that once before (commit `d0b4399d8`) to add a separate List header, but the data was still in cons cells. That makes some operations -- notably list_nth() -- O(N), and it's bulky because of the next-cell pointers and per-cell palloc overhead, and it's very cache-unfriendly if the cons cells end up scattered around rather than being adjacent. In this rewrite, we still have List headers, but the data is in a resizable array of values, with no next-cell links. Now we need at most two palloc's per List, and often only one, since we can allocate some values in the same palloc call as the List header. (Of course, extending an existing List may require repalloc's to enlarge the array. But this involves just O(log N) allocations not O(N).) Of course this is not without downsides. The key difficulty is that addition or deletion of a list entry may now cause other entries to move, which it did not before. For example, that breaks foreach() and sister macros, which historically used a pointer to the current cons-cell as loop state. We can repair those macros transparently by making their actual loop state be an integer list index; the exposed "ListCell " pointer is no longer state carried across loop iterations, but is just a derived value. (In practice, modern compilers can optimize things back to having just one loop state value, at least for simple cases with inline loop bodies.) In principle, this is a semantics change for cases where the loop body inserts or deletes list entries ahead of the current loop index; but I found no such cases in the Postgres code. The change is not at all transparent for code that doesn't use foreach() but chases lists "by hand" using lnext(). The largest share of such code in the backend is in loops that were maintaining "prev" and "next" variables in addition to the current-cell pointer, in order to delete list cells efficiently using list_delete_cell(). However, we no longer need a previous-cell pointer to delete a list cell efficiently. Keeping a next-cell pointer doesn't work, as explained above, but we can improve matters by changing such code to use a regular foreach() loop and then using the new macro foreach_delete_current() to delete the current cell. (This macro knows how to update the associated foreach loop's state so that no cells will be missed in the traversal.) There remains a nontrivial risk of code assuming that a ListCell pointer will remain good over an operation that could now move the list contents. To help catch such errors, list.c can be compiled with a new define symbol DEBUG_LIST_MEMORY_USAGE that forcibly moves list contents whenever that could possibly happen. This makes list operations significantly more expensive so it's not normally turned on (though it is on by default if USE_VALGRIND is on). There are two notable API differences from the previous code: * lnext() now requires the List's header pointer in addition to the current cell's address. * list_delete_cell() no longer requires a previous-cell argument. These changes are somewhat unfortunate, but on the other hand code using either function needs inspection to see if it is assuming anything it shouldn't, so it's not all bad. Programmers should be aware of these significant performance changes: * list_nth() and related functions are now O(1); so there's no major access-speed difference between a list and an array. * Inserting or deleting a list element now takes time proportional to the distance to the end of the list, due to moving the array elements. (However, it typically doesn't require palloc or pfree, so except in long lists it's probably still faster than before.) Notably, lcons() used to be about the same cost as lappend(), but that's no longer true if the list is long. Code that uses lcons() and list_delete_first() to maintain a stack might usefully be rewritten to push and pop at the end of the list rather than the beginning. * There are now list_insert_nth...() and list_delete_nth...() functions that add or remove a list cell identified by index. These have the data-movement penalty explained above, but there's no search penalty. * list_concat() and variants now copy the second list's data into storage belonging to the first list, so there is no longer any sharing of cells between the input lists. The second argument is now declared "const List *" to reflect that it isn't changed. This patch just does the minimum needed to get the new implementation in place and fix bugs exposed by the regression tests. As suggested by the foregoing, there's a fair amount of followup work remaining to do. Also, the ENABLE_LIST_COMPAT macros are finally removed in this commit. Code using those should have been gone a dozen years ago. Patch by me; thanks to David Rowley, Jesper Pedersen, and others for review. Discussion: https://postgr.es/m/11587.1550975080@sss.pgh.pa.us	2019-07-15 13:41:58 -04:00
Peter Eisentraut	12e037e209	Upgrade internal error message to external As part of REINDEX CONCURRENTLY, this formerly internal-only error message becomes potentially user-visible (see regression tests), so change from errmsg_internal() to errmsg(), and update comment.	2019-06-24 10:39:12 +02:00
Michael Paquier	3412030205	Fix more typos and inconsistencies in the tree Author: Alexander Lakhin Discussion: https://postgr.es/m/0a5419ea-1452-a4e6-72ff-545b1a5a8076@gmail.com	2019-06-17 16:13:16 +09:00
Peter Eisentraut	c880096dc1	Add command column to pg_stat_progress_create_index This allows determining which command is running, similar to pg_stat_progress_cluster. Discussion: https://www.postgresql.org/message-id/flat/f0e56b3b-74b7-6cbc-e207-a5ed6bee18dc%402ndquadrant.com	2019-06-04 09:29:02 +02:00
Tom Lane	8255c7a5ee	Phase 2 pgindent run for v12. Switch to 2.1 version of pg_bsd_indent. This formats multiline function declarations "correctly", that is with additional lines of parameter declarations indented to match where the first line's left parenthesis is. Discussion: https://postgr.es/m/CAEepm=0P3FeTXRcU5B2W3jv3PgRVZ-kGUXLGfd42FFhUROO3ug@mail.gmail.com	2019-05-22 13:04:48 -04:00
Tom Lane	be76af171c	Initial pgindent run for v12. This is still using the 2.0 version of pg_bsd_indent. I thought it would be good to commit this separately, so as to document the differences between 2.0 and 2.1 behavior. Discussion: https://postgr.es/m/16296.1558103386@sss.pgh.pa.us	2019-05-22 12:55:34 -04:00
Tom Lane	f912d7dec2	Fix reindexing of pg_class indexes some more. Commits `3dbb317d3` et al failed under CLOBBER_CACHE_ALWAYS testing. Investigation showed that to reindex pg_class_oid_index, we must suppress accesses to the index (via SetReindexProcessing) before we call RelationSetNewRelfilenode, or at least before we do CommandCounterIncrement therein; otherwise, relcache reloads happening within the CCI may try to fetch pg_class rows using the index's new relfilenode value, which is as yet an empty file. Of course, the point of `3dbb317d3` was that that ordering didn't work either, because then RelationSetNewRelfilenode's own update of the index's pg_class row cannot access the index, should it need to. There are various ways we might have got around that, but Andres Freund came up with a brilliant solution: for a mapped index, we can really just skip the pg_class update altogether. The only fields it was actually changing were relpages etc, but it was just setting them to zeroes which is useless make-work. (Correct new values will be installed at the end of index build.) All pg_class indexes are mapped and probably always will be, so this eliminates the problem by removing work rather than adding it, always a pleasant outcome. Having taught RelationSetNewRelfilenode to do it that way, we can revert the code reordering in reindex_index. (But I left the moved setup code where it was; there seems no reason why it has to run without use of the old index. If you're trying to fix a busted pg_class index, you'll have had to disable system index use altogether to get this far.) Moreover, this means we don't need RelationSetIndexList at all, because reindex_relation's hacking to make "REINDEX TABLE pg_class" work is likewise now unnecessary. We'll leave that code in place in the back branches, but a follow-on patch will remove it in HEAD. In passing, do some minor cleanup for commit `5c1560606` (in HEAD only), notably removing a duplicate newrnode assignment. Patch by me, using a core idea due to Andres Freund. Back-patch to all supported branches, as `3dbb317d3` was. Discussion: https://postgr.es/m/28926.1556664156@sss.pgh.pa.us	2019-05-02 19:11:28 -04:00
Andres Freund	3dbb317d32	Fix potential assertion failure when reindexing a pg_class index. When reindexing individual indexes on pg_class it was possible to either trigger an assertion failure: TRAP: FailedAssertion("!(!ReindexIsProcessingIndex(((index)->rd_id))) That's because reindex_index() called SetReindexProcessing() - which enables an asserts ensuring no index insertions happen into the index - before calling RelationSetNewRelfilenode(). That not correct for indexes on pg_class, because RelationSetNewRelfilenode() updates the relevant pg_class row, which needs to update the indexes. The are two reasons this wasn't noticed earlier. Firstly the bug doesn't trigger when reindexing all of pg_class, as reindex_relation has code "hiding" all yet-to-be-reindexed indexes. Secondly, the bug only triggers when the the update to pg_class doesn't turn out to be a HOT update - otherwise there's no index insertion to trigger the bug. Most of the time there's enough space, making this bug hard to trigger. To fix, move RelationSetNewRelfilenode() to before the SetReindexProcessing() (and, together with some other code, to outside of the PG_TRY()). To make sure the error checking intended by SetReindexProcessing() is more robust, modify CatalogIndexInsert() to check ReindexIsProcessingIndex() even when the update is a HOT update. Also add a few regression tests for REINDEXing of system catalogs. The last two improvements would have prevented some of the issues fixed in `5c1560606d` from being introduced in the first place. Reported-By: Michael Paquier Diagnosed-By: Tom Lane and Andres Freund Author: Andres Freund Reviewed-By: Tom Lane Discussion: https://postgr.es/m/20190418011430.GA19133@paquier.xyz Backpatch: 9.4-, the bug is present in all branches	2019-04-29 19:42:08 -07:00
Tom Lane	f4a3fdfbdc	Avoid order-of-execution problems with ALTER TABLE ADD PRIMARY KEY. Up to now, DefineIndex() was responsible for adding attnotnull constraints to the columns of a primary key, in any case where it hadn't been convenient for transformIndexConstraint() to mark those columns as is_not_null. It (or rather its minion index_check_primary_key) did this by executing an ALTER TABLE SET NOT NULL command for the target table. The trouble with this solution is that if we're creating the index due to ALTER TABLE ADD PRIMARY KEY, and the outer ALTER TABLE has additional sub-commands, the inner ALTER TABLE's operations executed at the wrong time with respect to the outer ALTER TABLE's operations. In particular, the inner ALTER would perform a validation scan at a point where the table's storage might be inconsistent with its catalog entries. (This is on the hairy edge of being a security problem, but AFAICS it isn't one because the inner scan would only be interested in the tuples' null bitmaps.) This can result in unexpected failures, such as the one seen in bug #15580 from Allison Kaptur. To fix, let's remove the attempt to do SET NOT NULL from DefineIndex(), reducing index_check_primary_key's role to verifying that the columns are already not null. (It shouldn't ever see such a case, but it seems wise to keep the check for safety.) Instead, make transformIndexConstraint() generate ALTER TABLE SET NOT NULL subcommands to be executed ahead of the ADD PRIMARY KEY operation in every case where it can't force the column to be created already-not-null. This requires only minor surgery in parse_utilcmd.c, and it makes for a much more satisfying spec for transformIndexConstraint(): it's no longer having to take it on faith that someone else will handle addition of NOT NULL constraints. To make that work, we have to move the execution of AT_SetNotNull into an ALTER pass that executes ahead of AT_PASS_ADD_INDEX. I moved it to AT_PASS_COL_ATTRS, and put that after AT_PASS_ADD_COL to avoid failure when the column is being added in the same command. This incidentally fixes a bug in the only previous usage of AT_PASS_COL_ATTRS, for AT_SetIdentity: it didn't work either for a newly-added column. Playing around with this exposed a separate bug in ALTER TABLE ONLY ... ADD PRIMARY KEY for partitioned tables. The intent of the ONLY modifier in that context is to prevent doing anything that would require holding lock for a long time --- but the implied SET NOT NULL would recurse to the child partitions, and do an expensive validation scan for any child where the column(s) were not already NOT NULL. To fix that, invent a new ALTER subcommand AT_CheckNotNull that just insists that a child column be already NOT NULL, and apply that, not AT_SetNotNull, when recursing to children in this scenario. This results in a slightly laxer definition of ALTER TABLE ONLY ... SET NOT NULL for partitioned tables, too: that command will now work as long as all children are already NOT NULL, whereas before it just threw up its hands if there were any partitions. In passing, clean up the API of generateClonedIndexStmt(): remove a useless argument, ensure that the output argument is not left undefined, update the header comment. A small side effect of this change is that no-such-column errors in ALTER TABLE ADD PRIMARY KEY now produce a different message that includes the table name, because they are now detected by the SET NOT NULL step which has historically worded its error that way. That seems fine to me, so I didn't make any effort to avoid the wording change. The basic bug #15580 is of very long standing, and these other bugs aren't new in v12 either. However, this is a pretty significant change in the way ALTER TABLE ADD PRIMARY KEY works. On balance it seems best not to back-patch, at least not till we get some more confidence that this patch has no new bugs. Patch by me, but thanks to Jie Zhang for a preliminary version. Discussion: https://postgr.es/m/15580-d1a6de5a3d65da51@postgresql.org Discussion: https://postgr.es/m/1396E95157071C4EBBA51892C5368521017F2E6E63@G08CNEXMBPEKD02.g08.fujitsu.local	2019-04-23 12:25:27 -04:00
Peter Eisentraut	ef6f30fe77	Fix REINDEX CONCURRENTLY of partitions In case of a partition index, when swapping the old and new index, we also need to attach the new index as a partition and detach the old one. Also, to handle partition indexes, we not only need to change dependencies referencing the index, but also dependencies of the index referencing something else. The previous code did this only specifically for a constraint, but we also need to do this for partitioned indexes. So instead write a generic function that does it for all dependencies. Author: Michael Paquier <michael@paquier.xyz> Author: Peter Eisentraut <peter.eisentraut@2ndquadrant.com> Discussion: https://www.postgresql.org/message-id/flat/DF4PR8401MB11964EDB77C860078C343BEBEE5A0%40DF4PR8401MB1196.NAMPRD84.PROD.OUTLOOK.COM#154df1fedb735190a773481765f7b874	2019-04-12 08:36:05 +02:00
Peter Eisentraut	03f9e5cba0	Report progress of REINDEX operations This uses the same infrastructure that the CREATE INDEX progress reporting uses. Add a column to pg_stat_progress_create_index to report the OID of the index being worked on. This was not necessary for CREATE INDEX, but it's useful for REINDEX. Also edit the phase descriptions a bit to be more consistent with the source code comments. Discussion: https://www.postgresql.org/message-id/ef6a6757-c36a-9e81-123f-13b19e36b7d7%402ndquadrant.com	2019-04-07 12:35:29 +02:00
Alvaro Herrera	ab0dfc961b	Report progress of CREATE INDEX operations This uses the progress reporting infrastructure added by `c16dc1aca5`, adding support for CREATE INDEX and CREATE INDEX CONCURRENTLY. There are two pieces to this: one is index-AM-agnostic, and the other is AM-specific. The latter is fairly elaborate for btrees, including reportage for parallel index builds and the separate phases that btree index creation uses; other index AMs, which are much simpler in their building procedures, have simplistic reporting only, but that seems sufficient, at least for non-concurrent builds. The index-AM-agnostic part is fairly complete, providing insight into the CONCURRENTLY wait phases as well as block-based progress during the index validation table scan. (The index validation index scan requires patching each AM, which has not been included here.) Reviewers: Rahila Syed, Pavan Deolasee, Tatsuro Yamada Discussion: https://postgr.es/m/20181220220022.mg63bhk26zdpvmcj@alvherre.pgsql	2019-04-02 15:18:08 -03:00
Peter Eisentraut	5dc92b844e	REINDEX CONCURRENTLY This adds the CONCURRENTLY option to the REINDEX command. A REINDEX CONCURRENTLY on a specific index creates a new index (like CREATE INDEX CONCURRENTLY), then renames the old index away and the new index in place and adjusts the dependencies, and then drops the old index (like DROP INDEX CONCURRENTLY). The REINDEX command also has the capability to run its other variants (TABLE, DATABASE) with the CONCURRENTLY option (but not SYSTEM). The reindexdb command gets the --concurrently option. Author: Michael Paquier, Andreas Karlsson, Peter Eisentraut Reviewed-by: Andres Freund, Fujii Masao, Jim Nasby, Sergei Kornilov Discussion: https://www.postgresql.org/message-id/flat/60052986-956b-4478-45ed-8bd119e9b9cf%402ndquadrant.com#74948a1044c56c5e817a5050f554ddee	2019-03-29 08:26:33 +01:00
Andres Freund	d25f519107	tableam: relation creation, VACUUM FULL/CLUSTER, SET TABLESPACE. This moves the responsibility for: - creating the storage necessary for a relation, including creating a new relfilenode for a relation with existing storage - non-transactional truncation of a relation - VACUUM FULL / CLUSTER's rewrite of a table below tableam. This is fairly straight forward, with a bit of complexity smattered in to move the computation of xid / multixid horizons below the AM, as they don't make sense for every table AM. Author: Andres Freund Discussion: https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de	2019-03-28 20:01:43 -07:00
Andres Freund	2a96909a4a	tableam: Support for an index build's initial table scan(s). To support building indexes over tables of different AMs, the scans to do so need to be routed through the table AM. While moving a fair amount of code, nearly all the changes are just moving code to below a callback. Currently the range based interface wouldn't make much sense for non block based table AMs. But that seems aceptable for now. Author: Andres Freund Discussion: https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de	2019-03-27 19:59:06 -07:00
Alvaro Herrera	126d631222	Fix partitioned index creation bug with dropped columns ALTER INDEX .. ATTACH PARTITION fails if the partitioned table where the index is defined contains more dropped columns than its partition, with this message: ERROR: incorrect attribute map The cause was that one caller of CompareIndexInfo was passing the number of attributes of the partition rather than the parent, which confused the length check. Repair. This can cause pg_upgrade to fail when used on such a database. Leave some more objects around after regression tests, so that the case is detected by pg_upgrade test suite. Remove some spurious empty lines noticed while looking for other cases of the same problem. Discussion: https://postgr.es/m/20190326213924.GA2322@alvherre.pgsql	2019-03-26 20:19:28 -03:00
Robert Haas	6f97457e0d	Add progress reporting for CLUSTER and VACUUM FULL. This uses the same progress reporting infrastructure added in commit `c16dc1aca5` and extends it to these additional cases. We lack the ability to track the internal progress of sorts and index builds so the information reported is coarse-grained for some parts of the operation, but it still seems like a significant improvement over having nothing at all. Tatsuro Yamada, reviewed by Thomas Munro, Masahiko Sawada, Michael Paquier, Jeff Janes, Alvaro Herrera, Rafia Sabih, and by me. A fair amount of polishing also by me. Discussion: http://postgr.es/m/59A77072.3090401@lab.ntt.co.jp	2019-03-25 10:59:04 -04:00
Alvaro Herrera	7e7c57bbb2	Fix dependency recording bug for partitioned PKs When DefineIndex recurses to create constraints on partitions, it needs to use the value returned by index_constraint_create to set up partition dependencies. However, in the course of fixing the DEPENDENCY_INTERNAL_AUTO mess, commit `1d92a0c9f7` introduced some code to that function that clobbered the return value, causing the recorded OID to be of the wrong object. Close examination of pg_depend after creating the tables leads to indescribable objects :-( My sin (in commit `bdc3d7fa23`, while preparing for DDL deparsing in event triggers) was to use a variable name for the return value that's typically used for throwaway objects in dependency-setting calls ("referenced"). Fix by changing the variable names to match extended practice (the return value is "myself" rather than "referenced".) The pg_upgrade test notices the problem (in an indirect way: the pg_dump outputs are in different order), but only if you create the objects in a specific way that wasn't being used in the existing tests. Add a stanza to leave some objects around that shows the bug. Catversion bump because preexisting databases might have bogus pg_depend entries. Discussion: https://postgr.es/m/20190318204235.GA30360@alvherre.pgsql	2019-03-21 18:34:29 -03:00
Andres Freund	c2fe139c20	tableam: Add and use scan APIs. Too allow table accesses to be not directly dependent on heap, several new abstractions are needed. Specifically: 1) Heap scans need to be generalized into table scans. Do this by introducing TableScanDesc, which will be the "base class" for individual AMs. This contains the AM independent fields from HeapScanDesc. The previous heap_{beginscan,rescan,endscan} et al. have been replaced with a table_ version. There's no direct replacement for heap_getnext(), as that returned a HeapTuple, which is undesirable for a other AMs. Instead there's table_scan_getnextslot(). But note that heap_getnext() lives on, it's still used widely to access catalog tables. This is achieved by new scan_begin, scan_end, scan_rescan, scan_getnextslot callbacks. 2) The portion of parallel scans that's shared between backends need to be able to do so without the user doing per-AM work. To achieve that new parallelscan_{estimate, initialize, reinitialize} callbacks are introduced, which operate on a new ParallelTableScanDesc, which again can be subclassed by AMs. As it is likely that several AMs are going to be block oriented, block oriented callbacks that can be shared between such AMs are provided and used by heap. table_block_parallelscan_{estimate, intiialize, reinitialize} as callbacks, and table_block_parallelscan_{nextpage, init} for use in AMs. These operate on a ParallelBlockTableScanDesc. 3) Index scans need to be able to access tables to return a tuple, and there needs to be state across individual accesses to the heap to store state like buffers. That's now handled by introducing a sort-of-scan IndexFetchTable, which again is intended to be subclassed by individual AMs (for heap IndexFetchHeap). The relevant callbacks for an AM are index_fetch_{end, begin, reset} to create the necessary state, and index_fetch_tuple to retrieve an indexed tuple. Note that index_fetch_tuple implementations need to be smarter than just blindly fetching the tuples for AMs that have optimizations similar to heap's HOT - the currently alive tuple in the update chain needs to be fetched if appropriate. Similar to table_scan_getnextslot(), it's undesirable to continue to return HeapTuples. Thus index_fetch_heap (might want to rename that later) now accepts a slot as an argument. Core code doesn't have a lot of call sites performing index scans without going through the systable_* API (in contrast to loads of heap_getnext calls and working directly with HeapTuples). Index scans now store the result of a search in IndexScanDesc->xs_heaptid, rather than xs_ctup->t_self. As the target is not generally a HeapTuple anymore that seems cleaner. To be able to sensible adapt code to use the above, two further callbacks have been introduced: a) slot_callbacks returns a TupleTableSlotOps* suitable for creating slots capable of holding a tuple of the AMs type. table_slot_callbacks() and table_slot_create() are based upon that, but have additional logic to deal with views, foreign tables, etc. While this change could have been done separately, nearly all the call sites that needed to be adapted for the rest of this commit also would have been needed to be adapted for table_slot_callbacks(), making separation not worthwhile. b) tuple_satisfies_snapshot checks whether the tuple in a slot is currently visible according to a snapshot. That's required as a few places now don't have a buffer + HeapTuple around, but a slot (which in heap's case internally has that information). Additionally a few infrastructure changes were needed: I) SysScanDesc, as used by systable_{beginscan, getnext} et al. now internally uses a slot to keep track of tuples. While systable_getnext() still returns HeapTuples, and will so for the foreseeable future, the index API (see 1) above) now only deals with slots. The remainder, and largest part, of this commit is then adjusting all scans in postgres to use the new APIs. Author: Andres Freund, Haribabu Kommi, Alvaro Herrera Discussion: https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de https://postgr.es/m/20160812231527.GA690404@alvherre.pgsql	2019-03-11 12:46:41 -07:00
Andres Freund	8586bf7ed8	tableam: introduce table AM infrastructure. This introduces the concept of table access methods, i.e. CREATE ACCESS METHOD ... TYPE TABLE and CREATE TABLE ... USING (storage-engine). No table access functionality is delegated to table AMs as of this commit, that'll be done in following commits. Subsequent commits will incrementally abstract table access functionality to be routed through table access methods. That change is too large to be reviewed & committed at once, so it'll be done incrementally. Docs will be updated at the end, as adding them incrementally would likely make them less coherent, and definitely is a lot more work, without a lot of benefit. Table access methods are specified similar to index access methods, i.e. pg_am.amhandler returns, as INTERNAL, a pointer to a struct with callbacks. In contrast to index AMs that struct needs to live as long as a backend, typically that's achieved by just returning a pointer to a constant struct. Psql's \d+ now displays a table's access method. That can be disabled with HIDE_TABLEAM=true, which is mainly useful so regression tests can be run against different AMs. It's quite possible that this behaviour still needs to be fine tuned. For now it's not allowed to set a table AM for a partitioned table, as we've not resolved how partitions would inherit that. Disallowing allows us to introduce, if we decide that's the way forward, such a behaviour without a compatibility break. Catversion bumped, to add the heap table AM and references to it. Author: Haribabu Kommi, Andres Freund, Alvaro Herrera, Dimitri Golgov and others Discussion: https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de https://postgr.es/m/20160812231527.GA690404@alvherre.pgsql https://postgr.es/m/20190107235616.6lur25ph22u5u5av@alap3.anarazel.de https://postgr.es/m/20190304234700.w5tmhducs5wxgzls@alap3.anarazel.de	2019-03-06 09:54:38 -08:00
Tom Lane	1d92a0c9f7	Redesign the partition dependency mechanism. The original setup for dependencies of partitioned objects had serious problems: 1. It did not verify that a drop cascading to a partition-child object also cascaded to at least one of the object's partition parents. Now, normally a child object would share all its dependencies with one or another parent (e.g. a child index's opclass dependencies would be shared with the parent index), so that this oversight is usually harmless. But if some dependency failed to fit this pattern, the child could be dropped while all its parents remain, creating a logically broken situation. (It's easy to construct artificial cases that break it, such as attaching an unrelated extension dependency to the child object and then dropping the extension. I'm not sure if any less-artificial cases exist.) 2. Management of partition dependencies during ATTACH/DETACH PARTITION was complicated and buggy; for example, after detaching a partition table it was possible to create cases where a formerly-child index should be dropped and was not, because the correct set of dependencies had not been reconstructed. Less seriously, because multiple partition relationships were represented identically in pg_depend, there was an order-of-traversal dependency on which partition parent was cited in error messages. We also had some pre-existing order-of-traversal hazards for error messages related to internal and extension dependencies. This is cosmetic to users but causes testing problems. To fix #1, add a check at the end of the partition tree traversal to ensure that at least one partition parent got deleted. To fix #2, establish a new policy that partition dependencies are in addition to, not instead of, a child object's usual dependencies; in this way ATTACH/DETACH PARTITION need not cope with adding or removing the usual dependencies. To fix the cosmetic problem, distinguish between primary and secondary partition dependency entries in pg_depend, by giving them different deptypes. (They behave identically except for having different priorities for being cited in error messages.) This means that the former 'I' dependency type is replaced with new 'P' and 'S' types. This also fixes a longstanding bug that after handling an internal dependency by recursing to the owning object, findDependentObjects did not verify that the current target was now scheduled for deletion, and did not apply the current recursion level's objflags to it. Perhaps that should be back-patched; but in the back branches it would only matter if some concurrent transaction had removed the internal-linkage pg_depend entry before the recursive call found it, or the recursive call somehow failed to find it, both of which seem unlikely. Catversion bump because the contents of pg_depend change for partitioning relationships. Patch HEAD only. It's annoying that we're not fixing #2 in v11, but there seems no practical way to do so given that the problem is exactly a poor choice of what entries to put in pg_depend. We can't really fix that while staying compatible with what's in pg_depend in existing v11 installations. Discussion: https://postgr.es/m/CAH2-Wzkypv1R+teZrr71U23J578NnTBt2X8+Y=Odr4pOdW1rXg@mail.gmail.com	2019-02-11 14:41:17 -05:00
Tom Lane	5f5c014590	Allow RECORD and RECORD[] to be specified in function coldeflists. We can't allow these pseudo-types to be used as table column types, because storing an anonymous record value in a table would result in data that couldn't be understood by other sessions. However, it seems like there's no harm in allowing the case in a column definition list that's specifying what a function-returning-record returns. The data involved is all local to the current session, so we should be just as able to resolve its actual tuple type as we are for the function-returning-record's top-level tuple output. Elvis Pranskevichus, with cosmetic changes by me Discussion: https://postgr.es/m/11038447.kQ5A9Uj5xi@hammer.magicstack.net	2019-01-30 19:25:33 -05:00
Tom Lane	f09346a9c6	Refactor planner's header files. Create a new header optimizer/optimizer.h, which exposes just the planner functions that can be used "at arm's length", without need to access Paths or the other planner-internal data structures defined in nodes/relation.h. This is intended to provide the whole planner API seen by most of the rest of the system; although FDWs still need to use additional stuff, and more thought is also needed about just what selfuncs.c should rely on. The main point of doing this now is to limit the amount of new #include baggage that will be needed by "planner support functions", which I expect to introduce later, and which will be in relevant datatype modules rather than anywhere near the planner. This commit just moves relevant declarations into optimizer.h from other header files (a couple of which go away because everything got moved), and adjusts #include lists to match. There's further cleanup that could be done if we want to decide that some stuff being exposed by optimizer.h doesn't belong in the planner at all, but I'll leave that for another day. Discussion: https://postgr.es/m/11460.1548706639@sss.pgh.pa.us	2019-01-29 15:48:51 -05:00
Michael Paquier	289198c0d9	Remove argument isprimary from index_build() The flag was introduced in `3fdeb18`, but `f66e8bf` actually forgot to finish the cleanup as index_update_stats() has simplified its interface. Author: Michael Paquier Discussion: https://postgr.es/m/20190122080852.GB3873@paquier.xyz	2019-01-24 07:57:09 +09:00
Andres Freund	346ed70b0a	Rename RelationData.rd_amroutine to rd_indam. The upcoming table AM support makes rd_amroutine to generic, as its only about index AMs. The new name makes that clear, and is shorter to boot. Author: Andres Freund Discussion: https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de	2019-01-21 17:36:55 -08:00
Andres Freund	c91560defc	Move remaining code from tqual.[ch] to heapam.h / heapam_visibility.c. Given these routines are heap specific, and that there will be more generic visibility support in via table AM, it makes sense to move the prototypes to heapam.h (routines like HeapTupleSatisfiesVacuum will not be exposed in a generic fashion, because they are too storage specific). Similarly, the code in tqual.c is specific to heap, so moving it into access/heap/ makes sense. Author: Andres Freund Discussion: https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de	2019-01-21 17:07:10 -08:00
Andres Freund	e0c4ec0728	Replace uses of heap_open et al with the corresponding table_* function. Author: Andres Freund Discussion: https://postgr.es/m/20190111000539.xbv7s6w7ilcvm7dp@alap3.anarazel.de	2019-01-21 10:51:37 -08:00
Tom Lane	1c53c4dec3	Finish reverting "recheck_on_update" patch. This reverts commit `c203d6cf8` and some follow-on fixes, completing the task begun in commit `5d28c9bd7`. If that feature is ever resurrected, the code will look quite a bit different from this, so it seems best to start from a clean slate. The v11 branch is not touched; in that branch, the recheck_on_update storage option remains present, but nonfunctional and undocumented. Discussion: https://postgr.es/m/20190114223409.3tcvejfhlvbucrv5@alap3.anarazel.de	2019-01-15 12:07:10 -05:00
Andres Freund	4c850ecec6	Don't include heapam.h from others headers. heapam.h previously was included in a number of widely used headers (e.g. execnodes.h, indirectly in executor.h, ...). That's problematic on its own, as heapam.h contains a lot of low-level details that don't need to be exposed that widely, but becomes more problematic with the upcoming introduction of pluggable table storage - it seems inappropriate for heapam.h to be included that widely afterwards. heapam.h was largely only included in other headers to get the HeapScanDesc typedef (which was defined in heapam.h, even though HeapScanDescData is defined in relscan.h). The better solution here seems to be to just use the underlying struct (forward declared where necessary). Similar for BulkInsertState. Another problem was that LockTupleMode was used in executor.h - parts of the file tried to cope without heapam.h, but due to the fact that it indirectly included it, several subsequent violations of that goal were not not noticed. We could just reuse the approach of declaring parameters as int, but it seems nicer to move LockTupleMode to lockoptions.h - that's not a perfect location, but also doesn't seem bad. As a number of files relied on implicitly included heapam.h, a significant number of files grew an explicit include. It's quite probably that a few external projects will need to do the same. Author: Andres Freund Reviewed-By: Alvaro Herrera Discussion: https://postgr.es/m/20190114000701.y4ttcb74jpskkcfb@alap3.anarazel.de	2019-01-14 16:24:41 -08:00
Bruce Momjian	97c39498e5	Update copyright for 2019 Backpatch-through: certain files through 9.4	2019-01-02 12:44:25 -05:00
Peter Eisentraut	ae4472c619	Remove obsolete IndexIs* macros Remove IndexIsValid(), IndexIsReady(), IndexIsLive() in favor of accessing the index structure directly. These macros haven't been used consistently, and the original reason of maintaining source compatibility with PostgreSQL 9.2 is gone. Discussion: https://www.postgresql.org/message-id/flat/d419147c-09d4-6196-5d9d-0234b230880a%402ndquadrant.com	2018-12-27 10:07:46 +01:00
Andres Freund	578b229718	Remove WITH OIDS support, change oid catalog column visibility. Previously tables declared WITH OIDS, including a significant fraction of the catalog tables, stored the oid column not as a normal column, but as part of the tuple header. This special column was not shown by default, which was somewhat odd, as it's often (consider e.g. pg_class.oid) one of the more important parts of a row. Neither pg_dump nor COPY included the contents of the oid column by default. The fact that the oid column was not an ordinary column necessitated a significant amount of special case code to support oid columns. That already was painful for the existing, but upcoming work aiming to make table storage pluggable, would have required expanding and duplicating that "specialness" significantly. WITH OIDS has been deprecated since 2005 (commit ff02d0a05280e0). Remove it. Removing includes: - CREATE TABLE and ALTER TABLE syntax for declaring the table to be WITH OIDS has been removed (WITH (oids[ = true]) will error out) - pg_dump does not support dumping tables declared WITH OIDS and will issue a warning when dumping one (and ignore the oid column). - restoring an pg_dump archive with pg_restore will warn when restoring a table with oid contents (and ignore the oid column) - COPY will refuse to load binary dump that includes oids. - pg_upgrade will error out when encountering tables declared WITH OIDS, they have to be altered to remove the oid column first. - Functionality to access the oid of the last inserted row (like plpgsql's RESULT_OID, spi's SPI_lastoid, ...) has been removed. The syntax for declaring a table WITHOUT OIDS (or WITH (oids = false) for CREATE TABLE) is still supported. While that requires a bit of support code, it seems unnecessary to break applications / dumps that do not use oids, and are explicit about not using them. The biggest user of WITH OID columns was postgres' catalog. This commit changes all 'magic' oid columns to be columns that are normally declared and stored. To reduce unnecessary query breakage all the newly added columns are still named 'oid', even if a table's column naming scheme would indicate 'reloid' or such. This obviously requires adapting a lot code, mostly replacing oid access via HeapTupleGetOid() with access to the underlying Form_pg_->oid column. The bootstrap process now assigns oids for all oid columns in genbki.pl that do not have an explicit value (starting at the largest oid previously used), only oids assigned later by oids will be above FirstBootstrapObjectId. As the oid column now is a normal column the special bootstrap syntax for oids has been removed. Oids are not automatically assigned during insertion anymore, all backend code explicitly assigns oids with GetNewOidWithIndex(). For the rare case that insertions into the catalog via SQL are called for the new pg_nextoid() function can be used (which only works on catalog tables). The fact that oid columns on system tables are now normal columns means that they will be included in the set of columns expanded by (i.e. SELECT * FROM pg_class will now include the table's oid, previously it did not). It'd not technically be hard to hide oid column by default, but that'd mean confusing behavior would either have to be carried forward forever, or it'd cause breakage down the line. While it's not unlikely that further adjustments are needed, the scope/invasiveness of the patch makes it worthwhile to get merge this now. It's painful to maintain externally, too complicated to commit after the code code freeze, and a dependency of a number of other patches. Catversion bump, for obvious reasons. Author: Andres Freund, with contributions by John Naylor Discussion: https://postgr.es/m/20180930034810.ywp2c7awz7opzcfr@alap3.anarazel.de	2018-11-20 16:00:17 -08:00
Andres Freund	4da597edf1	Make TupleTableSlots extensible, finish split of existing slot type. This commit completes the work prepared in `1a0586de36`, splitting the old TupleTableSlot implementation (which could store buffer, heap, minimal and virtual slots) into four different slot types. As described in the aforementioned commit, this is done with the goal of making tuple table slots extensible, to allow for pluggable table access methods. To achieve runtime extensibility for TupleTableSlots, operations on slots that can differ between types of slots are performed using the TupleTableSlotOps struct provided at slot creation time. That includes information from the size of TupleTableSlot struct to be allocated, initialization, deforming etc. See the struct's definition for more detailed information about callbacks TupleTableSlotOps. I decided to rename TTSOpsBufferTuple to TTSOpsBufferHeapTuple and ExecCopySlotTuple to ExecCopySlotHeapTuple, as that seems more consistent with other naming introduced in recent patches. There's plenty optimization potential in the slot implementation, but according to benchmarking the state after this commit has similar performance characteristics to before this set of changes, which seems sufficient. There's a few changes in execReplication.c that currently need to poke through the slot abstraction, that'll be repaired once the pluggable storage patchset provides the necessary infrastructure. Author: Andres Freund and Ashutosh Bapat, with changes by Amit Khandekar Discussion: https://postgr.es/m/20181105210039.hh4vvi4vwoq5ba2q@alap3.anarazel.de	2018-11-16 16:35:15 -08:00
Andres Freund	1a0586de36	Introduce notion of different types of slots (without implementing them). Upcoming work intends to allow pluggable ways to introduce new ways of storing table data. Accessing those table access methods from the executor requires TupleTableSlots to be carry tuples in the native format of such storage methods; otherwise there'll be a significant conversion overhead. Different access methods will require different data to store tuples efficiently (just like virtual, minimal, heap already require fields in TupleTableSlot). To allow that without requiring additional pointer indirections, we want to have different structs (embedding TupleTableSlot) for different types of slots. Thus different types of slots are needed, which requires adapting creators of slots. The slot that most efficiently can represent a type of tuple in an executor node will often depend on the type of slot a child node uses. Therefore we need to track the type of slot is returned by nodes, so parent slots can create slots based on that. Relatedly, JIT compilation of tuple deforming needs to know which type of slot a certain expression refers to, so it can create an appropriate deforming function for the type of tuple in the slot. But not all nodes will only return one type of slot, e.g. an append node will potentially return different types of slots for each of its subplans. Therefore add function that allows to query the type of a node's result slot, and whether it'll always be the same type (whether it's fixed). This can be queried using ExecGetResultSlotOps(). The scan, result, inner, outer type of slots are automatically inferred from ExecInitScanTupleSlot(), ExecInitResultSlot(), left/right subtrees respectively. If that's not correct for a node, that can be overwritten using new fields in PlanState. This commit does not introduce the actually abstracted implementation of different kind of TupleTableSlots, that will be left for a followup commit. The different types of slots introduced will, for now, still use the same backing implementation. While this already partially invalidates the big comment in tuptable.h, it seems to make more sense to update it later, when the different TupleTableSlot implementations actually exist. Author: Ashutosh Bapat and Andres Freund, with changes by Amit Khandekar Discussion: https://postgr.es/m/20181105210039.hh4vvi4vwoq5ba2q@alap3.anarazel.de	2018-11-15 22:00:30 -08:00

1 2 3 4 5 ...

545 Commits