Commit Graph

134 Commits

Author SHA1 Message Date
Andres Freund dc7420c2c9 snapshot scalability: Don't compute global horizons while building snapshots.
To make GetSnapshotData() more scalable, it cannot not look at at each proc's
xmin: While snapshot contents do not need to change whenever a read-only
transaction commits or a snapshot is released, a proc's xmin is modified in
those cases. The frequency of xmin modifications leads to, particularly on
higher core count systems, many cache misses inside GetSnapshotData(), despite
the data underlying a snapshot not changing. That is the most
significant source of GetSnapshotData() scaling poorly on larger systems.

Without accessing xmins, GetSnapshotData() cannot calculate accurate horizons /
thresholds as it has so far. But we don't really have to: The horizons don't
actually change that much between GetSnapshotData() calls. Nor are the horizons
actually used every time a snapshot is built.

The trick this commit introduces is to delay computation of accurate horizons
until there use and using horizon boundaries to determine whether accurate
horizons need to be computed.

The use of RecentGlobal[Data]Xmin to decide whether a row version could be
removed has been replaces with new GlobalVisTest* functions.  These use two
thresholds to determine whether a row can be pruned:
1) definitely_needed, indicating that rows deleted by XIDs >= definitely_needed
   are definitely still visible.
2) maybe_needed, indicating that rows deleted by XIDs < maybe_needed can
   definitely be removed
GetSnapshotData() updates definitely_needed to be the xmin of the computed
snapshot.

When testing whether a row can be removed (with GlobalVisTestIsRemovableXid())
and the tested XID falls in between the two (i.e. XID >= maybe_needed && XID <
definitely_needed) the boundaries can be recomputed to be more accurate. As it
is not cheap to compute accurate boundaries, we limit the number of times that
happens in short succession.  As the boundaries used by
GlobalVisTestIsRemovableXid() are never reset (with maybe_needed updated by
GetSnapshotData()), it is likely that further test can benefit from an earlier
computation of accurate horizons.

To avoid regressing performance when old_snapshot_threshold is set (as that
requires an accurate horizon to be computed), heap_page_prune_opt() doesn't
unconditionally call TransactionIdLimitedForOldSnapshots() anymore. Both the
computation of the limited horizon, and the triggering of errors (with
SetOldSnapshotThresholdTimestamp()) is now only done when necessary to remove
tuples.

This commit just removes the accesses to PGXACT->xmin from
GetSnapshotData(), but other members of PGXACT residing in the same
cache line are accessed. Therefore this in itself does not result in a
significant improvement. Subsequent commits will take advantage of the
fact that GetSnapshotData() now does not need to access xmins anymore.

Note: This contains a workaround in heap_page_prune_opt() to keep the
snapshot_too_old tests working. While that workaround is ugly, the tests
currently are not meaningful, and it seems best to address them separately.

Author: Andres Freund <andres@anarazel.de>
Reviewed-By: Robert Haas <robertmhaas@gmail.com>
Reviewed-By: Thomas Munro <thomas.munro@gmail.com>
Reviewed-By: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/20200301083601.ews6hz5dduc3w2se@alap3.anarazel.de
2020-08-12 16:03:49 -07:00
Tom Lane 9f9682783b Invent "amadjustmembers" AM method for validating opclass members.
This allows AM-specific knowledge to be applied during creation of
pg_amop and pg_amproc entries.  Specifically, the AM knows better than
core code which entries to consider as required or optional.  Giving
the latter entries the appropriate sort of dependency allows them to
be dropped without taking out the whole opclass or opfamily; which
is something we'd like to have to correct obsolescent entries in
extensions.

This callback also opens the door to performing AM-specific validity
checks during opclass creation, rather than hoping than an opclass
developer will remember to test with "amvalidate".  For the most part
I've not actually added any such checks yet; that can happen in a
follow-on patch.  (Note that we shouldn't remove any tests from
"amvalidate", as those are still needed to cross-check manually
constructed entries in the initdb data.  So adding tests to
"amadjustmembers" will be somewhat duplicative, but it seems like
a good idea anyway.)

Patch by me, reviewed by Alexander Korotkov, Hamid Akhtar, and
Anastasia Lubennikova.

Discussion: https://postgr.es/m/4578.1565195302@sss.pgh.pa.us
2020-08-01 17:12:47 -04:00
Alexander Korotkov a44dd932ff Fix masking of SP-GiST pages during xlog consistency check
spg_mask() didn't take into account that pd_lower equal to SizeOfPageHeaderData
is still valid value.  This commit fixes that.  Backpatch to 11, where
spg_mask() pg_lower check was introduced.

Reported-by: Michael Paquier
Discussion: https://postgr.es/m/20200615131405.GM52676%40paquier.xyz
Backpatch-through: 11
2020-06-20 17:34:51 +03:00
Tom Lane 5cbfce562f Initial pgindent and pgperltidy run for v13.
Includes some manual cleanup of places that pgindent messed up,
most of which weren't per project style anyway.

Notably, it seems some people didn't absorb the style rules of
commit c9d297751, because there were a bunch of new occurrences
of function calls with a newline just after the left paren, all
with faulty expectations about how the rest of the call would get
indented.
2020-05-14 13:06:50 -04:00
Alexander Korotkov 364bdd0b41 Fix missing SP-GiST support in 911e702077
911e702077 misses setting of amoptsprocnum for SP-GiST.  This commit fixes
that.
2020-03-30 23:45:03 +03:00
Alexander Korotkov 911e702077 Implement operator class parameters
PostgreSQL provides set of template index access methods, where opclasses have
much freedom in the semantics of indexing.  These index AMs are GiST, GIN,
SP-GiST and BRIN.  There opclasses define representation of keys, operations on
them and supported search strategies.  So, it's natural that opclasses may be
faced some tradeoffs, which require user-side decision.  This commit implements
opclass parameters allowing users to set some values, which tell opclass how to
index the particular dataset.

This commit doesn't introduce new storage in system catalog.  Instead it uses
pg_attribute.attoptions, which is used for table column storage options but
unused for index attributes.

In order to evade changing signature of each opclass support function, we
implement unified way to pass options to opclass support functions.  Options
are set to fn_expr as the constant bytea expression.  It's possible due to the
fact that opclass support functions are executed outside of expressions, so
fn_expr is unused for them.

This commit comes with some examples of opclass options usage.  We parametrize
signature length in GiST.  That applies to multiple opclasses: tsvector_ops,
gist__intbig_ops, gist_ltree_ops, gist__ltree_ops, gist_trgm_ops and
gist_hstore_ops.  Also we parametrize maximum number of integer ranges for
gist__int_ops.  However, the main future usage of this feature is expected
to be json, where users would be able to specify which way to index particular
json parts.

Catversion is bumped.

Discussion: https://postgr.es/m/d22c3a18-31c7-1879-fc11-4c1ce2f5e5af%40postgrespro.ru
Author: Nikita Glukhov, revised by me
Reviwed-by: Nikolay Shaplov, Robert Haas, Tom Lane, Tomas Vondra, Alvaro Herrera
2020-03-30 19:17:23 +03:00
Alvaro Herrera c9d2977519 Clean up newlines following left parentheses
We used to strategically place newlines after some function call left
parentheses to make pgindent move the argument list a few chars to the
left, so that the whole line would fit under 80 chars.  However,
pgindent no longer does that, so the newlines just made the code
vertically longer for no reason.  Remove those newlines, and reflow some
of those lines for some extra naturality.

Reviewed-by: Michael Paquier, Tom Lane
Discussion: https://postgr.es/m/20200129200401.GA6303@alvherre.pgsql
2020-01-30 13:42:14 -03:00
Amit Kapila 4d8a8d0c73 Introduce IndexAM fields for parallel vacuum.
Introduce new fields amusemaintenanceworkmem and amparallelvacuumoptions
in IndexAmRoutine for parallel vacuum.  The amusemaintenanceworkmem tells
whether a particular IndexAM uses maintenance_work_mem or not.  This will
help in controlling the memory used by individual workers as otherwise,
each worker can consume memory equal to maintenance_work_mem.  The
amparallelvacuumoptions tell whether a particular IndexAM participates in
a parallel vacuum and if so in which phase (bulkdelete, vacuumcleanup) of
vacuum.

Author: Masahiko Sawada and Amit Kapila
Reviewed-by: Dilip Kumar, Amit Kapila, Tomas Vondra and Robert Haas
Discussion:
https://postgr.es/m/CAD21AoDTPMgzSkV4E3SFo1CH_x50bf5PqZFQf4jmqjk-C03BWg@mail.gmail.com
https://postgr.es/m/CAA4eK1LmcD5aPogzwim5Nn58Ki+74a6Edghx4Wd8hAskvHaq5A@mail.gmail.com
2020-01-15 07:24:14 +05:30
Bruce Momjian 7559d8ebfa Update copyrights for 2020
Backpatch-through: update all files in master, backpatch legal files through 9.4
2020-01-01 12:21:45 -05:00
Michael Paquier 7854e07f25 Revert "Rename files and headers related to index AM"
This follows multiple complains from Peter Geoghegan, Andres Freund and
Alvaro Herrera that this issue ought to be dug more before actually
happening, if it happens.

Discussion: https://postgr.es/m/20191226144606.GA5659@alvherre.pgsql
2019-12-27 08:09:00 +09:00
Michael Paquier 8ce3aa9b59 Rename files and headers related to index AM
The following renaming is done so as source files related to index
access methods are more consistent with table access methods (the
original names used for index AMs ware too generic, and could be
confused as including features related to table AMs):
- amapi.h -> indexam.h.
- amapi.c -> indexamapi.c.  Here we have an equivalent with
backend/access/table/tableamapi.c.
- amvalidate.c -> indexamvalidate.c.
- amvalidate.h -> indexamvalidate.h.
- genam.c -> indexgenam.c.
- genam.h -> indexgenam.h.

This has been discussed during the development of v12 when table AM was
worked on, but the renaming never happened.

Author: Michael Paquier
Reviewed-by: Fabien Coelho, Julien Rouhaud
Discussion: https://postgr.es/m/20191223053434.GF34339@paquier.xyz
2019-12-25 10:23:39 +09:00
Michael Paquier 4cb658af70 Refactor reloption handling for index AMs in-core
This reworks the reloption parsing and build of a couple of index AMs by
creating new structures for each index AM's options.  This split was
already done for BRIN, GIN and GiST (which actually has a fillfactor
parameter), but not for hash, B-tree and SPGiST which relied on
StdRdOptions due to an overlap with the default option set.

This saves a couple of bytes for rd_options in each relcache entry with
indexes making use of relation options, and brings more consistency
between all index AMs.  While on it, add a couple of AssertMacro() calls
to make sure that utility macros to grab values of reloptions are used
with the expected index AM.

Author: Nikolay Shaplov
Reviewed-by: Amit Langote, Michael Paquier, Álvaro Herrera, Dent John
Discussion: https://postgr.es/m/4127670.gFlpRb6XCm@x200m
2019-11-25 09:40:53 +09:00
Amit Kapila 14aec03502 Make the order of the header file includes consistent in backend modules.
Similar to commits 7e735035f2 and dddf4cdc33, this commit makes the order
of header file inclusion consistent for backend modules.

In the passing, removed a couple of duplicate inclusions.

Author: Vignesh C
Reviewed-by: Kuntal Ghosh and Amit Kapila
Discussion: https://postgr.es/m/CALDaNm2Sznv8RR6Ex-iJO6xAdsxgWhCoETkaYX=+9DW3q0QCfA@mail.gmail.com
2019-11-12 08:30:16 +05:30
Andres Freund aae50236e4 Pass ItemPointer not HeapTuple to IndexBuildCallback.
Not all AMs use HeapTuples internally, making it inconvenient to pass
a HeapTuple. As the index callbacks really only need the TID, not the
full tuple, modify callback to only take ItemPointer.

Author: Ashwin Agrawal
Reviewed-By: Andres Freund
Discussion: https://postgr.es/m/CALfoeis6=8ehuR=VNtHvj3z16cYfCwPdTcpaxU+sfSUJ5QgR3g@mail.gmail.com
2019-11-08 11:49:29 -08:00
Andres Freund 01368e5d9d Split all OBJS style lines in makefiles into one-line-per-entry style.
When maintaining or merging patches, one of the most common sources
for conflicts are the list of objects in makefiles. Especially when
the split across lines has been changed on both sides, which is
somewhat common due to attempting to stay below 80 columns, those
conflicts are unnecessarily laborious to resolve.

By splitting, and alphabetically sorting, OBJS style lines into one
object per line, conflicts should be less frequent, and easier to
resolve when they still occur.

Author: Andres Freund
Discussion: https://postgr.es/m/20191029200901.vww4idgcxv74cwes@alap3.anarazel.de
2019-11-05 14:41:07 -08:00
Alexander Korotkov 90c0987258 Fix bug in pairingheap_SpGistSearchItem_cmp()
Our item contains only so->numberOfNonNullOrderBys of distances.  Reflect that
in the loop upper bound.

Discussion: https://postgr.es/m/53536807-784c-e029-6e92-6da802ab8d60%40postgrespro.ru
Author: Nikita Glukhov
Backpatch-through: 12
2019-09-25 01:47:36 +03:00
Alexander Korotkov 6cae9d2c10 Improve handling of NULLs in KNN-GiST and KNN-SP-GiST
This commit improves subject in two ways:

 * It removes ugliness of 02f90879e7, which stores distance values and null
   flags in two separate arrays after GISTSearchItem struct.  Instead we pack
   both distance value and null flag in IndexOrderByDistance struct.  Alignment
   overhead should be negligible, because we typically deal with at most few
   "col op const" expressions in ORDER BY clause.
 * It fixes handling of "col op NULL" expression in KNN-SP-GiST.  Now, these
   expression are not passed to support functions, which can't deal with them.
   Instead, NULL result is implicitly assumed.  It future we may decide to
   teach support functions to deal with NULL arguments, but current solution is
   bugfix suitable for backpatch.

Reported-by: Nikita Glukhov
Discussion: https://postgr.es/m/826f57ee-afc7-8977-c44c-6111d18b02ec%40postgrespro.ru
Author: Nikita Glukhov
Reviewed-by: Alexander Korotkov
Backpatch-through: 9.4
2019-09-19 21:48:39 +03:00
Alexander Korotkov 02f90879e7 Fix handling of NULL distances in KNN-GiST
In order to implement NULL LAST semantic GiST previously assumed distance to
the NULL value to be Inf.  However, our distance functions can return Inf and
NaN for non-null values.  In such cases, NULL LAST semantic appears to be
broken.  This commit fixes that by introducing separate array of null flags for
distances.

Backpatch to all supported versions.

Discussion: https://postgr.es/m/CAPpHfdsNvNdA0DBS%2BwMpFrgwT6C3-q50sFVGLSiuWnV3FqOJuQ%40mail.gmail.com
Author: Alexander Korotkov
Backpatch-through: 9.4
2019-09-08 22:08:12 +03:00
Michael Paquier eb43f3d193 Fix inconsistencies and typos in the tree
This is numbered take 8, and addresses again a set of issues with code
comments, variable names and unreferenced variables.

Author: Alexander Lakhin
Discussion: https://postgr.es/m/b137b5eb-9c95-9c2f-586e-38aba7d59788@gmail.com
2019-07-29 12:28:30 +09:00
Michael Paquier 23bccc823d Fix inconsistencies and typos in the tree
This is numbered take 7, and addresses a set of issues with code
comments, variable names and unreferenced variables.

Author: Alexander Lakhin
Discussion: https://postgr.es/m/dff75442-2468-f74f-568c-6006e141062f@gmail.com
2019-07-22 10:01:50 +09:00
Michael Paquier 0896ae561b Fix inconsistencies and typos in the tree
This is numbered take 7, and addresses a set of issues around:
- Fixes for typos and incorrect reference names.
- Removal of unneeded comments.
- Removal of unreferenced functions and structures.
- Fixes regarding variable name consistency.

Author: Alexander Lakhin
Discussion: https://postgr.es/m/10bfd4ac-3e7c-40ab-2b2e-355ed15495e8@gmail.com
2019-07-16 13:23:53 +09:00
Tom Lane be76af171c Initial pgindent run for v12.
This is still using the 2.0 version of pg_bsd_indent.
I thought it would be good to commit this separately,
so as to document the differences between 2.0 and 2.1 behavior.

Discussion: https://postgr.es/m/16296.1558103386@sss.pgh.pa.us
2019-05-22 12:55:34 -04:00
Peter Geoghegan ae7291acbc Standardize ItemIdData terminology.
The term "item pointer" should not be used to refer to ItemIdData
variables, since that is needlessly ambiguous.  Only
ItemPointerData/ItemPointer variables should be called item pointers.

To fix, establish the convention that ItemIdData variables should always
be referred to either as "item identifiers" or "line pointers".  The
term "item identifier" already predominates in docs and translatable
messages, and so should be the preferred alternative there.

Discussion: https://postgr.es/m/CAH2-Wz=c=MZQjUzde3o9+2PLAPuHTpVZPPdYxN=E4ndQ2--8ew@mail.gmail.com
2019-05-13 15:53:39 -07:00
Heikki Linnakangas 9155580fd5 Generate less WAL during GiST, GIN and SP-GiST index build.
Instead of WAL-logging every modification during the build separately,
first build the index without any WAL-logging, and make a separate pass
through the index at the end, to write all pages to the WAL. This
significantly reduces the amount of WAL generated, and is usually also
faster, despite the extra I/O needed for the extra scan through the index.
WAL generated this way is also faster to replay.

For GiST, the LSN-NSN interlock makes this a little tricky. All pages must
be marked with a valid (i.e. non-zero) LSN, so that the parent-child
LSN-NSN interlock works correctly. We now use magic value 1 for that during
index build. Change the fake LSN counter to begin from 1000, so that 1 is
safely smaller than any real or fake LSN. 2 would've been enough for our
purposes, but let's reserve a bigger range, in case we need more special
values in the future.

Author: Anastasia Lubennikova, Andrey V. Lepikhov
Reviewed-by: Heikki Linnakangas, Dmitry Dolgov
2019-04-03 17:03:15 +03:00
Alvaro Herrera ab0dfc961b Report progress of CREATE INDEX operations
This uses the progress reporting infrastructure added by c16dc1aca5,
adding support for CREATE INDEX and CREATE INDEX CONCURRENTLY.

There are two pieces to this: one is index-AM-agnostic, and the other is
AM-specific.  The latter is fairly elaborate for btrees, including
reportage for parallel index builds and the separate phases that btree
index creation uses; other index AMs, which are much simpler in their
building procedures, have simplistic reporting only, but that seems
sufficient, at least for non-concurrent builds.

The index-AM-agnostic part is fairly complete, providing insight into
the CONCURRENTLY wait phases as well as block-based progress during the
index validation table scan.  (The index validation index scan requires
patching each AM, which has not been included here.)

Reviewers: Rahila Syed, Pavan Deolasee, Tatsuro Yamada
Discussion: https://postgr.es/m/20181220220022.mg63bhk26zdpvmcj@alvherre.pgsql
2019-04-02 15:18:08 -03:00
Andres Freund 2a96909a4a tableam: Support for an index build's initial table scan(s).
To support building indexes over tables of different AMs, the scans to
do so need to be routed through the table AM.  While moving a fair
amount of code, nearly all the changes are just moving code to below a
callback.

Currently the range based interface wouldn't make much sense for non
block based table AMs. But that seems aceptable for now.

Author: Andres Freund
Discussion: https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
2019-03-27 19:59:06 -07:00
Peter Eisentraut 5e1963fb76 Collations with nondeterministic comparison
This adds a flag "deterministic" to collations.  If that is false,
such a collation disables various optimizations that assume that
strings are equal only if they are byte-wise equal.  That then allows
use cases such as case-insensitive or accent-insensitive comparisons
or handling of strings with different Unicode normal forms.

This functionality is only supported with the ICU provider.  At least
glibc doesn't appear to have any locales that work in a
nondeterministic way, so it's not worth supporting this for the libc
provider.

The term "deterministic comparison" in this context is from Unicode
Technical Standard #10
(https://unicode.org/reports/tr10/#Deterministic_Comparison).

This patch makes changes in three areas:

- CREATE COLLATION DDL changes and system catalog changes to support
  this new flag.

- Many executor nodes and auxiliary code are extended to track
  collations.  Previously, this code would just throw away collation
  information, because the eventually-called user-defined functions
  didn't use it since they only cared about equality, which didn't
  need collation information.

- String data type functions that do equality comparisons and hashing
  are changed to take the (non-)deterministic flag into account.  For
  comparison, this just means skipping various shortcuts and tie
  breakers that use byte-wise comparison.  For hashing, we first need
  to convert the input string to a canonical "sort key" using the ICU
  analogue of strxfrm().

Reviewed-by: Daniel Verite <daniel@manitou-mail.org>
Reviewed-by: Peter Geoghegan <pg@bowt.ie>
Discussion: https://www.postgresql.org/message-id/flat/1ccc668f-4cbc-0bef-af67-450b47cdfee7@2ndquadrant.com
2019-03-22 12:12:43 +01:00
Andres Freund c2fe139c20 tableam: Add and use scan APIs.
Too allow table accesses to be not directly dependent on heap, several
new abstractions are needed. Specifically:

1) Heap scans need to be generalized into table scans. Do this by
   introducing TableScanDesc, which will be the "base class" for
   individual AMs. This contains the AM independent fields from
   HeapScanDesc.

   The previous heap_{beginscan,rescan,endscan} et al. have been
   replaced with a table_ version.

   There's no direct replacement for heap_getnext(), as that returned
   a HeapTuple, which is undesirable for a other AMs. Instead there's
   table_scan_getnextslot().  But note that heap_getnext() lives on,
   it's still used widely to access catalog tables.

   This is achieved by new scan_begin, scan_end, scan_rescan,
   scan_getnextslot callbacks.

2) The portion of parallel scans that's shared between backends need
   to be able to do so without the user doing per-AM work. To achieve
   that new parallelscan_{estimate, initialize, reinitialize}
   callbacks are introduced, which operate on a new
   ParallelTableScanDesc, which again can be subclassed by AMs.

   As it is likely that several AMs are going to be block oriented,
   block oriented callbacks that can be shared between such AMs are
   provided and used by heap. table_block_parallelscan_{estimate,
   intiialize, reinitialize} as callbacks, and
   table_block_parallelscan_{nextpage, init} for use in AMs. These
   operate on a ParallelBlockTableScanDesc.

3) Index scans need to be able to access tables to return a tuple, and
   there needs to be state across individual accesses to the heap to
   store state like buffers. That's now handled by introducing a
   sort-of-scan IndexFetchTable, which again is intended to be
   subclassed by individual AMs (for heap IndexFetchHeap).

   The relevant callbacks for an AM are index_fetch_{end, begin,
   reset} to create the necessary state, and index_fetch_tuple to
   retrieve an indexed tuple.  Note that index_fetch_tuple
   implementations need to be smarter than just blindly fetching the
   tuples for AMs that have optimizations similar to heap's HOT - the
   currently alive tuple in the update chain needs to be fetched if
   appropriate.

   Similar to table_scan_getnextslot(), it's undesirable to continue
   to return HeapTuples. Thus index_fetch_heap (might want to rename
   that later) now accepts a slot as an argument. Core code doesn't
   have a lot of call sites performing index scans without going
   through the systable_* API (in contrast to loads of heap_getnext
   calls and working directly with HeapTuples).

   Index scans now store the result of a search in
   IndexScanDesc->xs_heaptid, rather than xs_ctup->t_self. As the
   target is not generally a HeapTuple anymore that seems cleaner.

To be able to sensible adapt code to use the above, two further
callbacks have been introduced:

a) slot_callbacks returns a TupleTableSlotOps* suitable for creating
   slots capable of holding a tuple of the AMs
   type. table_slot_callbacks() and table_slot_create() are based
   upon that, but have additional logic to deal with views, foreign
   tables, etc.

   While this change could have been done separately, nearly all the
   call sites that needed to be adapted for the rest of this commit
   also would have been needed to be adapted for
   table_slot_callbacks(), making separation not worthwhile.

b) tuple_satisfies_snapshot checks whether the tuple in a slot is
   currently visible according to a snapshot. That's required as a few
   places now don't have a buffer + HeapTuple around, but a
   slot (which in heap's case internally has that information).

Additionally a few infrastructure changes were needed:

I) SysScanDesc, as used by systable_{beginscan, getnext} et al. now
   internally uses a slot to keep track of tuples. While
   systable_getnext() still returns HeapTuples, and will so for the
   foreseeable future, the index API (see 1) above) now only deals with
   slots.

The remainder, and largest part, of this commit is then adjusting all
scans in postgres to use the new APIs.

Author: Andres Freund, Haribabu Kommi, Alvaro Herrera
Discussion:
    https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
    https://postgr.es/m/20160812231527.GA690404@alvherre.pgsql
2019-03-11 12:46:41 -07:00
Tom Lane f09346a9c6 Refactor planner's header files.
Create a new header optimizer/optimizer.h, which exposes just the
planner functions that can be used "at arm's length", without need
to access Paths or the other planner-internal data structures defined
in nodes/relation.h.  This is intended to provide the whole planner
API seen by most of the rest of the system; although FDWs still need
to use additional stuff, and more thought is also needed about just
what selfuncs.c should rely on.

The main point of doing this now is to limit the amount of new
#include baggage that will be needed by "planner support functions",
which I expect to introduce later, and which will be in relevant
datatype modules rather than anywhere near the planner.

This commit just moves relevant declarations into optimizer.h from
other header files (a couple of which go away because everything
got moved), and adjusts #include lists to match.  There's further
cleanup that could be done if we want to decide that some stuff
being exposed by optimizer.h doesn't belong in the planner at all,
but I'll leave that for another day.

Discussion: https://postgr.es/m/11460.1548706639@sss.pgh.pa.us
2019-01-29 15:48:51 -05:00
Andres Freund c91560defc Move remaining code from tqual.[ch] to heapam.h / heapam_visibility.c.
Given these routines are heap specific, and that there will be more
generic visibility support in via table AM, it makes sense to move the
prototypes to heapam.h (routines like HeapTupleSatisfiesVacuum will
not be exposed in a generic fashion, because they are too storage
specific).

Similarly, the code in tqual.c is specific to heap, so moving it into
access/heap/ makes sense.

Author: Andres Freund
Discussion: https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
2019-01-21 17:07:10 -08:00
Bruce Momjian 97c39498e5 Update copyright for 2019
Backpatch-through: certain files through 9.4
2019-01-02 12:44:25 -05:00
Tom Lane 696b0c5fd0 Fix memory leak in repeated SPGIST index scans.
spgendscan neglected to pfree all the memory allocated by spgbeginscan.
It's possible to get away with that in most normal queries, since the
memory is allocated in the executor's per-query context which is about
to get deleted anyway; but it causes severe memory leakage during
creation or filling of large exclusion-constraint indexes.

Also, document that amendscan is supposed to free what ambeginscan
allocates.  The docs' lack of clarity on that point probably caused this
bug to begin with.  (There is discussion of changing that API spec going
forward, but I don't think it'd be appropriate for the back branches.)

Per report from Bruno Wolff.  It's been like this since the beginning,
so back-patch to all active branches.

In HEAD, also fix an independent leak caused by commit 2a6368343
(allocating memory during spgrescan instead of spgbeginscan, which
might be all right if it got cleaned up, but it didn't).  And do a bit
of code beautification on that commit, too.

Discussion: https://postgr.es/m/20181024012314.GA27428@wolff.to
2018-10-31 17:05:03 -04:00
Alexander Korotkov 4ec90f53f1 Minor formatting cleanup for 2a6368343f 2018-09-27 23:29:50 +03:00
Alexander Korotkov 0f64595894 Remove extra usage of BoxPGetDatum() macro
Author: Mark Dilger
Discussion: https://postgr.es/m/B2AEFCD0-836D-4654-9D59-3DF616E0A6F3%40gmail.com
2018-09-27 23:25:22 +03:00
Alexander Korotkov 2a6368343f Add support for nearest-neighbor (KNN) searches to SP-GiST
Currently, KNN searches were supported only by GiST.  SP-GiST also capable to
support them.  This commit implements that support.  SP-GiST scan stack is
replaced with queue, which serves as stack if no ordering is specified.  KNN
support is provided for three SP-GIST opclasses: quad_point_ops, kd_point_ops
and poly_ops (catversion is bumped).  Some common parts between GiST and SP-GiST
KNNs are extracted into separate functions.

Discussion: https://postgr.es/m/570825e8-47d0-4732-2bf6-88d67d2d51c8%40postgrespro.ru
Author: Nikita Glukhov, Alexander Korotkov based on GSoC work by Vlad Sterzhanov
Review: Andrey Borodin, Alexander Korotkov
2018-09-19 01:54:10 +03:00
Andrew Gierth 500d49794f Repair double-free in SP-GIST rescan (bug #15378)
spgrescan would first reset traversalCxt, and then traverse a
potentially non-empty stack containing pointers to traversalValues
which had been allocated in those contexts, freeing them a second
time. This bug originates in commit ccd6eb49a where traversalValue was
introduced.

Repair by traversing the stack before the context reset; this isn't
ideal, since it means doing retail pfree in a context that's about to
be reset, but the freeing of a stack entry is also done in other
places in the code during the scan so it's not worth trying to
refactor it further. Regression test added.

Backpatch to 9.6 where the problem was introduced.

Per bug #15378; analysis and patch by me, originally from a report on
IRC by user velix; see also PostGIS ticket #4174; review by Alexander
Korotkov.

Discussion: https://postgr.es/m/153663176628.23136.11901365223750051490@wrigleys.postgresql.org
2018-09-11 18:14:19 +01:00
Peter Eisentraut b19495772e doc: Update uses of the word "procedure"
Historically, the term procedure was used as a synonym for function in
Postgres/PostgreSQL.  Now we have procedures as separate objects from
functions, so we need to clean up the documentation to not mix those
terms.

In particular, mentions of "trigger procedures" are changed to "trigger
functions", and access method "support procedures" are changed to
"support functions".  (The latter already used FUNCTION in the SQL
syntax anyway.)  Also, the terminology in the SPI chapter has been
cleaned up.

A few tests, examples, and code comments are also adjusted to be
consistent with documentation changes, but not everything.

Reported-by: Peter Geoghegan <pg@bowt.ie>
Reviewed-by: Jonathan S. Katz <jonathan.katz@excoventures.com>
2018-08-22 14:44:49 +02:00
Alexander Korotkov 2b13702d5c Fix typo in SP-GiST error message
Error message didn't match the actual check.  Fix that.  Compression of leaf
SP-GiST values was introduced in 11.  So, backpatch.

Discussion: https://postgr.es/m/20180810.100742.15469435.horiguchi.kyotaro%40lab.ntt.co.jp
Author: Kyotaro Horiguchi
Backpatch-through: 11
2018-08-10 17:28:48 +03:00
Tom Lane bdf46af748 Post-feature-freeze pgindent run.
Discussion: https://postgr.es/m/15719.1523984266@sss.pgh.pa.us
2018-04-26 14:47:16 -04:00
Tom Lane b15e8f71db Fix broken collation-aware searches in SP-GiST text opclass.
spg_text_leaf_consistent() supposed that it should compare only
Min(querylen, entrylen) bytes of the two strings, and then deal with
any excess bytes in one string or the other by assuming the longer
string is greater if the prefixes are equal.  Quite aside from the
fact that that's just wrong in some locales (e.g., 'ch' is not less
than 'd' in cs_CZ), it also risked passing incomplete multibyte
characters to strcoll(), with ensuing bad results.

Instead, just pass the full strings to varstr_cmp, and let it decide
what to do about unequal-length strings.

Fortunately, this error doesn't imply any index corruption, it's just
that searches might return the wrong set of entries.

Per report from Emre Hasegeli, though this is not his patch.
Thanks to Peter Geoghegan for review and discussion.

This code was born broken, so back-patch to all supported branches.
In HEAD, I failed to resist the temptation to do a bit of cosmetic
cleanup/pgindent'ing on 710d90da1, too.

Discussion: https://postgr.es/m/CAE2gYzzb6K51VnTq5i5p52z+j9p2duEa-K1T3RrC_GQEynAKEg@mail.gmail.com
2018-04-16 16:06:58 -04:00
Teodor Sigaev 8224de4f42 Indexes with INCLUDE columns and their support in B-tree
This patch introduces INCLUDE clause to index definition.  This clause
specifies a list of columns which will be included as a non-key part in
the index.  The INCLUDE columns exist solely to allow more queries to
benefit from index-only scans.  Also, such columns don't need to have
appropriate operator classes.  Expressions are not supported as INCLUDE
columns since they cannot be used in index-only scans.

Index access methods supporting INCLUDE are indicated by amcaninclude flag
in IndexAmRoutine.  For now, only B-tree indexes support INCLUDE clause.

In B-tree indexes INCLUDE columns are truncated from pivot index tuples
(tuples located in non-leaf pages and high keys).  Therefore, B-tree indexes
now might have variable number of attributes.  This patch also provides
generic facility to support that: pivot tuples contain number of their
attributes in t_tid.ip_posid.  Free 13th bit of t_info is used for indicating
that.  This facility will simplify further support of index suffix truncation.
The changes of above are backward-compatible, pg_upgrade doesn't need special
handling of B-tree indexes for that.

Bump catalog version

Author: Anastasia Lubennikova with contribition by Alexander Korotkov and me
Reviewed by: Peter Geoghegan, Tomas Vondra, Antonin Houska, Jeff Janes,
			 David Rowley, Alexander Korotkov
Discussion: https://www.postgresql.org/message-id/flat/56168952.4010101@postgrespro.ru
2018-04-07 23:00:39 +03:00
Teodor Sigaev 710d90da1f Add prefix operator for TEXT type.
The prefix operator along with SP-GiST indexes can be used as an alternative
for LIKE 'word%' commands  and it doesn't have a limitation of string/prefix
length as B-Tree has.

Bump catalog version

Author: Ildus Kurbangaliev with some editorization by me
Review by: Arthur Zakirov, Alexander Korotkov, and me
Discussion: https://www.postgresql.org/message-id/flat/20180202180327.222b04b3@wp.localdomain
2018-04-03 19:46:45 +03:00
Tom Lane c79f6df75d Do index FSM vacuuming sooner.
In btree and SP-GiST indexes, move the responsibility for calling
IndexFreeSpaceMapVacuum from the vacuumcleanup phase to the bulkdelete
phase, and do it if and only if we found some pages that could be put into
FSM.  As in commit 851a26e26, the idea is to make free pages visible to FSM
searchers sooner when vacuuming very large tables (large enough to need
multiple bulkdelete scans).  This adds more redundant work than that commit
did, since we have to scan the entire index FSM each time rather than being
able to localize what needs to be updated; but it still seems worthwhile.
However, we can buy something back by not touching the FSM at all when
there are no pages that can be put in it.  That will result in slower
recovery from corrupt upper FSM pages in such a scenario, but it doesn't
seem like that's a case we need to optimize for.

Hash indexes don't use FSM at all.  GIN, GiST, and bloom indexes update
FSM during the vacuumcleanup phase not bulkdelete, so that doing something
comparable to this would be a much more invasive change, and it's not clear
it's worth it.  BRIN indexes do things sufficiently differently that this
change doesn't apply to them, either.

Claudio Freire, reviewed by Masahiko Sawada and Jing Wang, some additional
tweaks by me

Discussion: https://postgr.es/m/CAGTBQpYR0uJCNTt3M5GOzBRHo+-GccNO1nCaQ8yEJmZKSW5q1A@mail.gmail.com
2018-03-30 11:48:20 -04:00
Tom Lane 649f179250 Fix tuple counting in SP-GiST index build.
Count the number of tuples in the index honestly, instead of assuming
that it's the same as the number of tuples in the heap.  (It might be
different if the index is partial.)

Back-patch to all supported versions.

Tomas Vondra

Discussion: https://postgr.es/m/3b3d8eac-c709-0d25-088e-b98339a1b28a@2ndquadrant.com
2018-03-22 13:24:05 -04:00
Tom Lane 467963c3e9 Prevent query-lifespan memory leakage of SP-GiST traversal values.
The original coding of the SP-GiST scan traversalValue feature (commit
ccd6eb49a) arranged for traversal values to be stored in the query's main
executor context.  That's fine if there's only one index scan per query,
but if there are many, we have a memory leak as successive scans create
new traversal values.  Fix it by creating a separate memory context for
traversal values, which we can reset during spgrescan().  Back-patch
to 9.6 where this code was introduced.

In principle, adding the traversalCxt field to SpGistScanOpaqueData
creates an ABI break in the back branches.  But I (tgl) have little
sympathy for extensions including spgist_private.h, so I'm not very
worried about that.  Alternatively we could stick the new field at the
end of the struct in back branches, but that has its own downsides.

Anton Dignös, reviewed by Alexander Kuzmenkov

Discussion: https://postgr.es/m/CALNdv1jb6y2Te-m8xHLxLX12RsBmZJ1f4hESX7J0HjgyOhA9eA@mail.gmail.com
2018-03-19 23:59:30 -04:00
Robert Haas 9da0cc3528 Support parallel btree index builds.
To make this work, tuplesort.c and logtape.c must also support
parallelism, so this patch adds that infrastructure and then applies
it to the particular case of parallel btree index builds.  Testing
to date shows that this can often be 2-3x faster than a serial
index build.

The model for deciding how many workers to use is fairly primitive
at present, but it's better than not having the feature.  We can
refine it as we get more experience.

Peter Geoghegan with some help from Rushabh Lathia.  While Heikki
Linnakangas is not an author of this patch, he wrote other patches
without which this feature would not have been possible, and
therefore the release notes should possibly credit him as an author
of this feature.  Reviewed by Claudio Freire, Heikki Linnakangas,
Thomas Munro, Tels, Amit Kapila, me.

Discussion: http://postgr.es/m/CAM3SWZQKM=Pzc=CAHzRixKjp2eO5Q0Jg1SoFQqeXFQ647JiwqQ@mail.gmail.com
Discussion: http://postgr.es/m/CAH2-Wz=AxWqDoVvGU7dq856S4r6sJAj6DBn7VMtigkB33N5eyg@mail.gmail.com
2018-02-02 13:32:44 -05:00
Bruce Momjian 9d4649ca49 Update copyright for 2018
Backpatch-through: certain files through 9.3
2018-01-02 23:30:12 -05:00
Teodor Sigaev 854823fa33 Add optional compression method to SP-GiST
Patch allows to have different types of column and value stored in leaf tuples
of SP-GiST. The main application of feature is to transform complex column type
to simple indexed type or for truncating too long value, transformation could
be lossy.  Simple example: polygons are converted to their bounding boxes,
this opclass follows.

Authors: me, Heikki Linnakangas, Alexander Korotkov, Nikita Glukhov
Reviewed-By: all authors + Darafei Praliaskouski
Discussions:
https://www.postgresql.org/message-id/5447B3FF.2080406@sigaev.ru
https://www.postgresql.org/message-id/flat/54907069.1030506@sigaev.ru#54907069.1030506@sigaev.ru
2017-12-22 13:33:16 +03:00
Peter Eisentraut 2eb4a831e5 Change TRUE/FALSE to true/false
The lower case spellings are C and C++ standard and are used in most
parts of the PostgreSQL sources.  The upper case spellings are only used
in some files/modules.  So standardize on the standard spellings.

The APIs for ICU, Perl, and Windows define their own TRUE and FALSE, so
those are left as is when using those APIs.

In code comments, we use the lower-case spelling for the C concepts and
keep the upper-case spelling for the SQL concepts.

Reviewed-by: Michael Paquier <michael.paquier@gmail.com>
2017-11-08 11:37:28 -05:00
Tom Lane 81e334ce4e Set the metapage's pd_lower correctly in brin, gin, and spgist indexes.
Previously, these index types left the pd_lower field set to the default
SizeOfPageHeaderData, which is really a lie because it ought to point past
whatever space is being used for metadata.  The coding accidentally failed
to fail because we never told xlog.c that the metapage is of standard
format --- but that's not very good, because it impedes WAL consistency
checking, and in some cases prevents compression of full-page images.

To fix, ensure that we set pd_lower correctly, not only when creating a
metapage but whenever we write it out (these apparently redundant steps are
needed to cope with pg_upgrade'd indexes that don't yet contain the right
value).  This allows telling xlog.c that the page is of standard format.

The WAL consistency check mask functions are made to mask only if pd_lower
appears valid, which I think is likely unnecessary complication, since
any metapage appearing in a v11 WAL stream should contain valid pd_lower.
But it doesn't cost much to be paranoid.

Amit Langote, reviewed by Michael Paquier and Amit Kapila

Discussion: https://postgr.es/m/0d273805-0e9e-ec1a-cb84-d4da400b8f85@lab.ntt.co.jp
2017-11-02 17:22:08 -04:00