Commit Graph

98 Commits

Author SHA1 Message Date
Bruce Momjian 27b77ecf9f Update copyright for 2022
Backpatch-through: 10
2022-01-07 19:04:57 -05:00
Bruce Momjian 845ac7f847 C comments: improve description of GiST NSN and GistBuildLSN
GiST indexes are complex, so adding more details in the code might help
someone.

Discussion: https://postgr.es/m/20210302164021.GA364@momjian.us
2021-03-10 17:03:10 -05:00
Bruce Momjian 8facf1ea00 README/C-comment: document GiST's NSN value 2021-02-13 13:50:49 -05:00
Bruce Momjian ca3b37487b Update copyright for 2021
Backpatch-through: 9.5
2021-01-02 13:06:25 -05:00
Michael Paquier 788dd0b839 Fix some typos
Author: Daniel Gustafsson
Discussion: https://postgr.es/m/C36ADFDF-D09A-4EE5-B186-CB46C3653F4C@yesql.se
2020-11-14 11:43:10 +09:00
Heikki Linnakangas 16fa9b2b30 Add support for building GiST index by sorting.
This adds a new optional support function to the GiST access method:
sortsupport. If it is defined, the GiST index is built by sorting all data
to the order defined by the sortsupport's comparator function, and packing
the tuples in that order to GiST pages. This is similar to how B-tree
index build works, and is much faster than inserting the tuples one by
one. The resulting index is smaller too, because the pages are packed more
tightly, upto 'fillfactor'. The normal build method works by splitting
pages, which tends to lead to more wasted space.

The quality of the resulting index depends on how good the opclass-defined
sort order is. A good order preserves locality of the input data.

As the first user of this facility, add 'sortsupport' function to the
point_ops opclass. It sorts the points in Z-order (aka Morton Code), by
interleaving the bits of the X and Y coordinates.

Author: Andrey Borodin
Reviewed-by: Pavel Borisov, Thomas Munro
Discussion: https://www.postgresql.org/message-id/1A36620E-CAD8-4267-9067-FB31385E7C0D%40yandex-team.ru
2020-09-17 11:33:40 +03:00
Alexander Korotkov 911e702077 Implement operator class parameters
PostgreSQL provides set of template index access methods, where opclasses have
much freedom in the semantics of indexing.  These index AMs are GiST, GIN,
SP-GiST and BRIN.  There opclasses define representation of keys, operations on
them and supported search strategies.  So, it's natural that opclasses may be
faced some tradeoffs, which require user-side decision.  This commit implements
opclass parameters allowing users to set some values, which tell opclass how to
index the particular dataset.

This commit doesn't introduce new storage in system catalog.  Instead it uses
pg_attribute.attoptions, which is used for table column storage options but
unused for index attributes.

In order to evade changing signature of each opclass support function, we
implement unified way to pass options to opclass support functions.  Options
are set to fn_expr as the constant bytea expression.  It's possible due to the
fact that opclass support functions are executed outside of expressions, so
fn_expr is unused for them.

This commit comes with some examples of opclass options usage.  We parametrize
signature length in GiST.  That applies to multiple opclasses: tsvector_ops,
gist__intbig_ops, gist_ltree_ops, gist__ltree_ops, gist_trgm_ops and
gist_hstore_ops.  Also we parametrize maximum number of integer ranges for
gist__int_ops.  However, the main future usage of this feature is expected
to be json, where users would be able to specify which way to index particular
json parts.

Catversion is bumped.

Discussion: https://postgr.es/m/d22c3a18-31c7-1879-fc11-4c1ce2f5e5af%40postgrespro.ru
Author: Nikita Glukhov, revised by me
Reviwed-by: Nikolay Shaplov, Robert Haas, Tom Lane, Tomas Vondra, Alvaro Herrera
2020-03-30 19:17:23 +03:00
Bruce Momjian 7559d8ebfa Update copyrights for 2020
Backpatch-through: update all files in master, backpatch legal files through 9.4
2020-01-01 12:21:45 -05:00
Heikki Linnakangas 6655a7299d Use full 64-bit XID for checking if a deleted GiST page is old enough.
Otherwise, after a deleted page gets even older, it becomes unrecyclable
again. B-tree has the same problem, and has had since time immemorial,
but let's at least fix this in GiST, where this is new.

Backpatch to v12, where GiST page deletion was introduced.

Reviewed-by: Andrey Borodin
Discussion: https://www.postgresql.org/message-id/835A15A5-F1B4-4446-A711-BF48357EB602%40yandex-team.ru
2019-07-24 20:24:07 +03:00
Heikki Linnakangas 9155580fd5 Generate less WAL during GiST, GIN and SP-GiST index build.
Instead of WAL-logging every modification during the build separately,
first build the index without any WAL-logging, and make a separate pass
through the index at the end, to write all pages to the WAL. This
significantly reduces the amount of WAL generated, and is usually also
faster, despite the extra I/O needed for the extra scan through the index.
WAL generated this way is also faster to replay.

For GiST, the LSN-NSN interlock makes this a little tricky. All pages must
be marked with a valid (i.e. non-zero) LSN, so that the parent-child
LSN-NSN interlock works correctly. We now use magic value 1 for that during
index build. Change the fake LSN counter to begin from 1000, so that 1 is
safely smaller than any real or fake LSN. 2 would've been enough for our
purposes, but let's reserve a bigger range, in case we need more special
values in the future.

Author: Anastasia Lubennikova, Andrey V. Lepikhov
Reviewed-by: Heikki Linnakangas, Dmitry Dolgov
2019-04-03 17:03:15 +03:00
Heikki Linnakangas 7df159a620 Delete empty pages during GiST VACUUM.
To do this, we scan GiST two times. In the first pass we make note of
empty leaf pages and internal pages. At second pass we scan through
internal pages, looking for downlinks to the empty pages.

Deleting internal pages is still not supported, like in nbtree, the last
child of an internal page is never deleted. That means that if you have a
workload where new keys are always inserted to different area than where
old keys are removed, the index will still grow without bound. But the rate
of growth will be an order of magnitude slower than before.

Author: Andrey Borodin
Discussion: https://www.postgresql.org/message-id/B1E4DF12-6CD3-4706-BDBD-BF3283328F60@yandex-team.ru
2019-03-22 13:21:45 +02:00
Bruce Momjian 97c39498e5 Update copyright for 2019
Backpatch-through: certain files through 9.4
2019-01-02 12:44:25 -05:00
Bruce Momjian 9d4649ca49 Update copyright for 2018
Backpatch-through: certain files through 9.3
2018-01-02 23:30:12 -05:00
Tom Lane c7b8998ebb Phase 2 of pgindent updates.
Change pg_bsd_indent to follow upstream rules for placement of comments
to the right of code, and remove pgindent hack that caused comments
following #endif to not obey the general rule.

Commit e3860ffa4d wasn't actually using
the published version of pg_bsd_indent, but a hacked-up version that
tried to minimize the amount of movement of comments to the right of
code.  The situation of interest is where such a comment has to be
moved to the right of its default placement at column 33 because there's
code there.  BSD indent has always moved right in units of tab stops
in such cases --- but in the previous incarnation, indent was working
in 8-space tab stops, while now it knows we use 4-space tabs.  So the
net result is that in about half the cases, such comments are placed
one tab stop left of before.  This is better all around: it leaves
more room on the line for comment text, and it means that in such
cases the comment uniformly starts at the next 4-space tab stop after
the code, rather than sometimes one and sometimes two tabs after.

Also, ensure that comments following #endif are indented the same
as comments following other preprocessor commands such as #else.
That inconsistency turns out to have been self-inflicted damage
from a poorly-thought-through post-indent "fixup" in pgindent.

This patch is much less interesting than the first round of indent
changes, but also bulkier, so I thought it best to separate the effects.

Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org
Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
2017-06-21 15:19:25 -04:00
Bruce Momjian 1d25779284 Update copyright via script for 2017 2017-01-03 13:48:53 -05:00
Bruce Momjian ee94300446 Update copyright for 2016
Backpatch certain files through 9.1
2016-01-02 13:33:40 -05:00
Teodor Sigaev 013ebc0a7b Microvacuum for GIST
Mark index tuple as dead if it's pointed by kill_prior_tuple during
ordinary (search) scan and remove it during insert process if there is no
enough space for new tuple to insert. This improves select performance
because index will not return tuple marked as dead and improves insert
performance because it reduces number of page split.

Anastasia Lubennikova <a.lubennikova@postgrespro.ru> with
 minor editorialization by me
2015-09-09 18:43:37 +03:00
Alvaro Herrera 26df7066cc Move strategy numbers to include/access/stratnum.h
For upcoming BRIN opclasses, it's convenient to have strategy numbers
defined in a single place.  Since there's nothing appropriate, create
it.  The StrategyNumber typedef now lives there, as well as existing
strategy numbers for B-trees (from skey.h) and R-tree-and-friends (from
gist.h).  skey.h is forced to include stratnum.h because of the
StrategyNumber typedef, but gist.h is not; extensions that currently
rely on gist.h for rtree strategy numbers might need to add a new

A few .c files can stop including skey.h and/or gist.h, which is a nice
side benefit.

Per discussion:
https://www.postgresql.org/message-id/20150514232132.GZ2523@alvh.no-ip.org

Authored by Emre Hasegeli and Álvaro.

(It's not clear to me why bootscanner.l has any #include lines at all.)
2015-05-15 17:03:16 -03:00
Heikki Linnakangas d04c8ed904 Add support for index-only scans in GiST.
This adds a new GiST opclass method, 'fetch', which is used to reconstruct
the original Datum from the value stored in the index. Also, the 'canreturn'
index AM interface function gains a new 'attno' argument. That makes it
possible to use index-only scans on a multi-column index where some of the
opclasses support index-only scans but some do not.

This patch adds support in the box and point opclasses. Other opclasses
can added later as follow-on patches (btree_gist would be particularly
interesting).

Anastasia Lubennikova, with additional fixes and modifications by me.
2015-03-26 19:12:00 +02:00
Bruce Momjian 4baaf863ec Update copyright for 2015
Backpatch certain files through 9.0
2015-01-06 11:43:47 -05:00
Heikki Linnakangas 2effb72e68 Remove obsolete cases from GiST update redo code.
The code that generated a record to clear the F_TUPLES_DELETED flag hasn't
existed since we got rid of old-style VACUUM FULL. I kept the code that sets
the flag, although it's not used for anything anymore, because it might
still be interesting information for debugging purposes that some tuples
have been deleted from a page.

Likewise, the code to turn the root page from non-leaf to leaf page was
removed when we got rid of old-style VACUUM FULL. Remove the code to replay
that action, too.
2014-11-07 15:13:02 +02:00
Bruce Momjian 0a78320057 pgindent run for 9.4
This includes removing tabs after periods in C comments, which was
applied to back branches, so this change should not effect backpatching.
2014-05-06 12:12:18 -04:00
Bruce Momjian 7e04792a1c Update copyright for 2014
Update all files in head, and files COPYRIGHT and legal.sgml in all back
branches.
2014-01-07 16:05:30 -05:00
Bruce Momjian 9af4159fce pgindent run for release 9.3
This is the first run of the Perl-based pgindent script.  Also update
pgindent instructions.
2013-05-29 16:58:43 -04:00
Tom Lane 0fd0f3688b Document and clean up gistsplit.c.
Improve comments, rename some variables and functions, slightly simplify
a couple of APIs, in an attempt to make this code readable by people other
than its original author.

Even though this is essentially just cosmetic, back-patch to all active
branches, because otherwise it's going to make back-patching future fixes
in this file very painful.
2013-02-10 11:58:15 -05:00
Heikki Linnakangas 9ee4d06f3f Make GiST indexes on-disk compatible with 9.2 again.
The patch that turned XLogRecPtr into a uint64 inadvertently changed the
on-disk format of GiST indexes, because the NSN field in the GiST page
opaque is an XLogRecPtr. That breaks pg_upgrade. Revert the format of that
field back to the two-field struct that XLogRecPtr was before. This is the
same we did to LSNs in the page header to avoid changing on-disk format.

Bump catversion, as this invalidates any existing GiST indexes built on
9.3devel.
2013-01-17 16:46:16 +02:00
Bruce Momjian bd61a623ac Update copyrights for 2013
Fully update git head, and update back branches in ./COPYRIGHT and
legal.sgml files.
2013-01-01 17:15:01 -05:00
Tom Lane ad10853b30 Assorted comment fixes, mostly just typos, but some obsolete statements.
YAMAMOTO Takashi
2012-01-29 19:23:56 -05:00
Bruce Momjian e126958c2e Update copyright notices for year 2012. 2012-01-01 18:01:58 -05:00
Peter Eisentraut dbbba5279f Start using flexible array members
Flexible array members are a C99 feature that avoids "cheating" in the
declaration of variable-length arrays at the end of structs.  With
Autoconf support, this should be transparent for older compilers.

We start with one use in gist.h because gcc 4.6 started to raise a
warning there.  Over time, it can be expanded to other places in the
source, but they will likely need some review of sizeof and offsetof
usage.  The current change in gist.h appears to be safe in this
regard.
2011-06-16 22:45:38 +03:00
Bruce Momjian bf50caf105 pgindent run before PG 9.1 beta 1. 2011-04-10 11:42:00 -04:00
Bruce Momjian 5d950e3b0c Stamp copyrights for year 2011. 2011-01-01 13:18:15 -05:00
Heikki Linnakangas 9de3aa65f0 Rewrite the GiST insertion logic so that we don't need the post-recovery
cleanup stage to finish incomplete inserts or splits anymore. There was two
reasons for the cleanup step:

1. When a new tuple was inserted to a leaf page, the downlink in the parent
needed to be updated to contain (ie. to be consistent with) the new key.
Updating the parent in turn might require recursively updating the parent of
the parent. We now handle that by updating the parent while traversing down
the tree, so that when we insert the leaf tuple, all the parents are already
consistent with the new key, and the tree is consistent at every step.

2. When a page is split, we need to insert the downlink for the new right
page(s), and update the downlink for the original page to not include keys
that moved to the right page(s). We now handle that by setting a new flag,
F_FOLLOW_RIGHT, on the non-rightmost pages in the split. When that flag is
set, scans always follow the rightlink, regardless of the NSN mechanism used
to detect concurrent page splits. That way the tree is consistent right after
split, even though the downlink is still missing. This is very similar to the
way B-tree splits are handled. When the downlink is inserted in the parent,
the flag is cleared. To keep the insertion algorithm simple, when an
insertion sees an incomplete split, indicated by the F_FOLLOW_RIGHT flag, it
finishes the split before doing anything else.

These changes allow removing the whole "invalid tuple" mechanism, but I
retained the scan code to still follow invalid tuples correctly. While we
don't create any such tuples anymore, we want to handle them gracefully in
case you pg_upgrade a GiST index that has them. If we encounter any on an
insert, though, we just throw an error saying that you need to REINDEX.

The issue that got me into doing this is that if you did a checkpoint while
an insert or split was in progress, and the checkpoint finishes quickly so
that there is no WAL record related to the insert between RedoRecPtr and the
checkpoint record, recovery from that checkpoint would not know to finish
the incomplete insert. IOW, we have the same issue we solved with the
rm_safe_restartpoint mechanism during normal operation too. It's highly
unlikely to happen in practice, and this fix is far too large to backpatch,
so we're just going to live with in previous versions, but this refactoring
fixes it going forward.

With this patch, you don't get the annoying
'index "FOO" needs VACUUM or REINDEX to finish crash recovery' notices
anymore if you crash at an unfortunate moment.
2010-12-23 16:21:47 +02:00
Tom Lane 554506871b KNNGIST, otherwise known as order-by-operator support for GIST.
This commit represents a rather heavily editorialized version of
Teodor's builtin_knngist_itself-0.8.2 and builtin_knngist_proc-0.8.1
patches.  I redid the opclass API to add a separate Distance method
instead of turning the Consistent method into an illogical mess,
fixed some bit-rot in the rbtree interfaces, and generally worked over
the code style and comments.

There's still no non-code documentation to speak of, but I'll work on
that separately.  Some contrib-module changes are also yet to come
(right now, point <-> point is the only KNN-ified operator).

Teodor Sigaev and Tom Lane
2010-12-03 20:53:29 -05:00
Magnus Hagander 9f2e211386 Remove cvs keywords from all files. 2010-09-20 22:08:53 +02:00
Bruce Momjian 0239800893 Update copyright for the year 2010. 2010-01-02 16:58:17 +00:00
Bruce Momjian 511db38ace Update copyright for 2009. 2009-01-01 17:24:05 +00:00
Alvaro Herrera a3540b0f65 Improve our #include situation by moving pointer types away from the
corresponding struct definitions.  This allows other headers to avoid including
certain highly-loaded headers such as rel.h and relscan.h, instead using just
relcache.h, heapam.h or genam.h, which are more lightweight and thus cause less
unnecessary dependencies.
2008-06-19 00:46:06 +00:00
Alvaro Herrera f8c4d7db60 Restructure some header files a bit, in particular heapam.h, by removing some
unnecessary #include lines in it.  Also, move some tuple routine prototypes and
macros to htup.h, which allows removal of heapam.h inclusion from some .c
files.

For this to work, a new header file access/sysattr.h needed to be created,
initially containing attribute numbers of system columns, for pg_dump usage.

While at it, make contrib ltree, intarray and hstore header files more
consistent with our header style.
2008-05-12 00:00:54 +00:00
Bruce Momjian 9098ab9e32 Update copyrights in source tree to 2008. 2008-01-01 19:46:01 +00:00
Tom Lane 56218fbc48 Minor tweaking of index special-space definitions so that the various
index types can be reliably distinguished by examining the special space
on an index page.  Per my earlier proposal, plus the realization that
there's no need for btree's vacuum cycle ID to cycle through every possible
16-bit value.  Restricting its range a little costs nearly nothing and
eliminates the possibility of collisions.
Memo to self: remember to make bitmap indexes play along with this scheme,
assuming that patch ever gets accepted.
2007-04-09 22:04:08 +00:00
Bruce Momjian 29dccf5fe0 Update CVS HEAD for 2007 copyright. Back branches are typically not
back-stamped for this.
2007-01-05 22:20:05 +00:00
Bruce Momjian f99a569a2e pgindent run for 8.2. 2006-10-04 00:30:14 +00:00
Tom Lane ba920e1c91 Rename contains/contained-by operators to @> and <@, per discussion that
agreed these symbols are less easily confused.  I made new pg_operator
entries (with new OIDs) for the old names, so as to provide backward
compatibility while making it pretty easy to remove the old names in
some future release cycle.  This commit only touches the core datatypes,
contrib will be fixed separately.
2006-09-10 00:29:35 +00:00
Teodor Sigaev 1f7ef548ec Changes
* new split algorithm (as proposed in http://archives.postgresql.org/pgsql-hackers/2006-06/msg00254.php)
  * possible call pickSplit() for second and below columns
  * add spl_(l|r)datum_exists to GIST_SPLITVEC -
    pickSplit should check its values to use already defined
    spl_(l|r)datum for splitting. pickSplit should set
    spl_(l|r)datum_exists to 'false' (if they was 'true') to
    signal to caller about using spl_(l|r)datum.
  * support for old pickSplit(): not very optimal
    but correct split
* remove 'bytes' field from GISTENTRY: in any case size of
  value is defined by it's type.
* split GIST_SPLITVEC to two structures: one for using in picksplit
  and second - for internal use.
* some code refactoring
* support of subsplit to rtree opclasses

TODO: add support of subsplit to contrib modules
2006-06-28 12:00:14 +00:00
Bruce Momjian 199f8f2858 Fix GEVHDRSZ for Win32.
Magnus Hagander
2006-06-25 01:02:12 +00:00
Bruce Momjian f2f5b05655 Update copyright for 2006. Update scripts. 2006-03-05 15:59:11 +00:00
Tom Lane 2a8d3d83ef R-tree is dead ... long live GiST. 2005-11-07 17:36:47 +00:00
Bruce Momjian 1dc3498251 Standard pgindent run for 8.1. 2005-10-15 02:49:52 +00:00
Teodor Sigaev 898a7bd13b Bug fixes for GiST crash recovery.
- add forgotten check of lsn for insert completion
- remove level of pages: hard to check in recovery
- some cleanups
2005-06-30 17:52:14 +00:00