Commit Graph

217 Commits

Author SHA1 Message Date
Tomas Vondra 7577dd8480 Properly detoast data in brin_form_tuple
brin_form_tuple failed to consider the values may be toasted, inserting
the toast pointer into the index. This may easily result in index
corruption, as the toast data may be deleted and cleaned up by vacuum.
The cleanup however does not care about indexes, leaving invalid toast
pointers behind, which triggers errors like this:

  ERROR:  missing chunk number 0 for toast value 16433 in pg_toast_16426

A less severe consequence are inconsistent failures due to the index row
being too large, depending on whether brin_form_tuple operated on plain
or toasted version of the row. For example

    CREATE TABLE t (val TEXT);
    INSERT INTO t VALUES ('... long value ...')
    CREATE INDEX idx ON t USING brin (val);

would likely succeed, as the row would likely include toast pointer.
Switching the order of INSERT and CREATE INDEX would likely fail:

    ERROR:  index row size 8712 exceeds maximum 8152 for index "idx"

because this happens before the row values are toasted.

The bug exists since PostgreSQL 9.5 where BRIN indexes were introduced.
So backpatch all the way back.

Author: Tomas Vondra
Reviewed-by: Alvaro Herrera
Backpatch-through: 9.5
Discussion: https://postgr.es/m/20201001184133.oq5uq75sb45pu3aw@development
Discussion: https://postgr.es/m/20201104010544.zexj52mlldagzowv%40development
2020-11-07 00:39:19 +01:00
Tom Lane 38a2d70329 Remove some more useless assignments.
Found with clang's scan-build tool.  It also whines about a lot of
other dead stores that we should *not* change IMO, either as a matter
of style or future-proofing.  But these places seem like clear
oversights.

Discussion: https://postgr.es/m/CAEudQAo1+AcGppxDSg8k+zF4+Kv+eJyqzEDdbpDg58-=MQcerQ@mail.gmail.com
2020-09-04 14:32:19 -04:00
Alvaro Herrera 1f42d35a1d
BRIN: Handle concurrent desummarization properly
If a page range is desummarized at just the right time concurrently with
an index walk, BRIN would raise an error indicating index corruption.
This is scary and unhelpful; silently returning that the page range is
not summarized is sufficient reaction.

This bug was introduced by commit 975ad4e602 as additional protection
against a bug whose actual fix was elsewhere.  Backpatch equally.

Reported-By: Anastasia Lubennikova <a.lubennikova@postgrespro.ru>
Diagnosed-By: Alexander Lakhin <exclusion@gmail.com>
Discussion: https://postgr.es/m/2588667e-d07d-7e10-74e2-7e1e46194491@postgrespro.ru
Backpatch: 9.5 - master
2020-08-12 15:33:36 -04:00
Tom Lane 9f9682783b Invent "amadjustmembers" AM method for validating opclass members.
This allows AM-specific knowledge to be applied during creation of
pg_amop and pg_amproc entries.  Specifically, the AM knows better than
core code which entries to consider as required or optional.  Giving
the latter entries the appropriate sort of dependency allows them to
be dropped without taking out the whole opclass or opfamily; which
is something we'd like to have to correct obsolescent entries in
extensions.

This callback also opens the door to performing AM-specific validity
checks during opclass creation, rather than hoping than an opclass
developer will remember to test with "amvalidate".  For the most part
I've not actually added any such checks yet; that can happen in a
follow-on patch.  (Note that we shouldn't remove any tests from
"amvalidate", as those are still needed to cross-check manually
constructed entries in the initdb data.  So adding tests to
"amadjustmembers" will be somewhat duplicative, but it seems like
a good idea anyway.)

Patch by me, reviewed by Alexander Korotkov, Hamid Akhtar, and
Anastasia Lubennikova.

Discussion: https://postgr.es/m/4578.1565195302@sss.pgh.pa.us
2020-08-01 17:12:47 -04:00
Alvaro Herrera 986529ce40
Remove WARNING message from brin_desummarize_range
This message was being emitted on the grounds that only crashed
summarization could cause it, but in reality even an aborted vacuum
could do it ... which makes it way too noisy, particularly since it
shows up in regression tests and makes them die.

Reported by Tom Lane.
Discussion: https://postgr.es/m/489091.1593534251@sss.pgh.pa.us
2020-07-09 20:13:25 -04:00
Alexander Korotkov 911e702077 Implement operator class parameters
PostgreSQL provides set of template index access methods, where opclasses have
much freedom in the semantics of indexing.  These index AMs are GiST, GIN,
SP-GiST and BRIN.  There opclasses define representation of keys, operations on
them and supported search strategies.  So, it's natural that opclasses may be
faced some tradeoffs, which require user-side decision.  This commit implements
opclass parameters allowing users to set some values, which tell opclass how to
index the particular dataset.

This commit doesn't introduce new storage in system catalog.  Instead it uses
pg_attribute.attoptions, which is used for table column storage options but
unused for index attributes.

In order to evade changing signature of each opclass support function, we
implement unified way to pass options to opclass support functions.  Options
are set to fn_expr as the constant bytea expression.  It's possible due to the
fact that opclass support functions are executed outside of expressions, so
fn_expr is unused for them.

This commit comes with some examples of opclass options usage.  We parametrize
signature length in GiST.  That applies to multiple opclasses: tsvector_ops,
gist__intbig_ops, gist_ltree_ops, gist__ltree_ops, gist_trgm_ops and
gist_hstore_ops.  Also we parametrize maximum number of integer ranges for
gist__int_ops.  However, the main future usage of this feature is expected
to be json, where users would be able to specify which way to index particular
json parts.

Catversion is bumped.

Discussion: https://postgr.es/m/d22c3a18-31c7-1879-fc11-4c1ce2f5e5af%40postgrespro.ru
Author: Nikita Glukhov, revised by me
Reviwed-by: Nikolay Shaplov, Robert Haas, Tom Lane, Tomas Vondra, Alvaro Herrera
2020-03-30 19:17:23 +03:00
Peter Eisentraut 3c173a53a8 Remove utils/acl.h from catalog/objectaddress.h
The need for this was removed by
8b9e9644dc.

A number of files now need to include utils/acl.h or
parser/parse_node.h explicitly where they previously got it indirectly
somehow.

Since parser/parse_node.h already includes nodes/parsenodes.h, the
latter is then removed where the former was added.  Also, remove
nodes/pg_list.h from objectaddress.h, since that's included via
nodes/parsenodes.h.

Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Alvaro Herrera <alvherre@2ndquadrant.com>
Discussion: https://www.postgresql.org/message-id/flat/7601e258-26b2-8481-36d0-dc9dca6f28f1%402ndquadrant.com
2020-03-10 10:27:00 +01:00
Heikki Linnakangas 4c87010981 Fix crash in BRIN inclusion op functions, due to missing datum copy.
The BRIN add_value() and union() functions need to make a longer-lived
copy of the argument, if they want to store it in the BrinValues struct
also passed as argument. The functions for the "inclusion operator
classes" used with box, range and inet types didn't take into account
that the union helper function might return its argument as is, without
making a copy. Check for that case, and make a copy if necessary. That
case arises at least with the range_union() function, when one of the
arguments is an 'empty' range:

CREATE TABLE brintest (n numrange);
CREATE INDEX brinidx ON brintest USING brin (n);
INSERT INTO brintest VALUES ('empty');
INSERT INTO brintest VALUES (numrange(0, 2^1000::numeric));
INSERT INTO brintest VALUES ('(-1, 0)');

SELECT brin_desummarize_range('brinidx', 0);
SELECT brin_summarize_range('brinidx', 0);

Backpatch down to 9.5, where BRIN was introduced.

Discussion: https://www.postgresql.org/message-id/e6e1d6eb-0a67-36aa-e779-bcca59167c14%40iki.fi
Reviewed-by: Emre Hasegeli, Tom Lane, Alvaro Herrera
2020-01-20 10:36:35 +02:00
Amit Kapila 4d8a8d0c73 Introduce IndexAM fields for parallel vacuum.
Introduce new fields amusemaintenanceworkmem and amparallelvacuumoptions
in IndexAmRoutine for parallel vacuum.  The amusemaintenanceworkmem tells
whether a particular IndexAM uses maintenance_work_mem or not.  This will
help in controlling the memory used by individual workers as otherwise,
each worker can consume memory equal to maintenance_work_mem.  The
amparallelvacuumoptions tell whether a particular IndexAM participates in
a parallel vacuum and if so in which phase (bulkdelete, vacuumcleanup) of
vacuum.

Author: Masahiko Sawada and Amit Kapila
Reviewed-by: Dilip Kumar, Amit Kapila, Tomas Vondra and Robert Haas
Discussion:
https://postgr.es/m/CAD21AoDTPMgzSkV4E3SFo1CH_x50bf5PqZFQf4jmqjk-C03BWg@mail.gmail.com
https://postgr.es/m/CAA4eK1LmcD5aPogzwim5Nn58Ki+74a6Edghx4Wd8hAskvHaq5A@mail.gmail.com
2020-01-15 07:24:14 +05:30
Bruce Momjian 7559d8ebfa Update copyrights for 2020
Backpatch-through: update all files in master, backpatch legal files through 9.4
2020-01-01 12:21:45 -05:00
Michael Paquier 7854e07f25 Revert "Rename files and headers related to index AM"
This follows multiple complains from Peter Geoghegan, Andres Freund and
Alvaro Herrera that this issue ought to be dug more before actually
happening, if it happens.

Discussion: https://postgr.es/m/20191226144606.GA5659@alvherre.pgsql
2019-12-27 08:09:00 +09:00
Michael Paquier 8ce3aa9b59 Rename files and headers related to index AM
The following renaming is done so as source files related to index
access methods are more consistent with table access methods (the
original names used for index AMs ware too generic, and could be
confused as including features related to table AMs):
- amapi.h -> indexam.h.
- amapi.c -> indexamapi.c.  Here we have an equivalent with
backend/access/table/tableamapi.c.
- amvalidate.c -> indexamvalidate.c.
- amvalidate.h -> indexamvalidate.h.
- genam.c -> indexgenam.c.
- genam.h -> indexgenam.h.

This has been discussed during the development of v12 when table AM was
worked on, but the renaming never happened.

Author: Michael Paquier
Reviewed-by: Fabien Coelho, Julien Rouhaud
Discussion: https://postgr.es/m/20191223053434.GF34339@paquier.xyz
2019-12-25 10:23:39 +09:00
Amit Kapila 14aec03502 Make the order of the header file includes consistent in backend modules.
Similar to commits 7e735035f2 and dddf4cdc33, this commit makes the order
of header file inclusion consistent for backend modules.

In the passing, removed a couple of duplicate inclusions.

Author: Vignesh C
Reviewed-by: Kuntal Ghosh and Amit Kapila
Discussion: https://postgr.es/m/CALDaNm2Sznv8RR6Ex-iJO6xAdsxgWhCoETkaYX=+9DW3q0QCfA@mail.gmail.com
2019-11-12 08:30:16 +05:30
Andres Freund aae50236e4 Pass ItemPointer not HeapTuple to IndexBuildCallback.
Not all AMs use HeapTuples internally, making it inconvenient to pass
a HeapTuple. As the index callbacks really only need the TID, not the
full tuple, modify callback to only take ItemPointer.

Author: Ashwin Agrawal
Reviewed-By: Andres Freund
Discussion: https://postgr.es/m/CALfoeis6=8ehuR=VNtHvj3z16cYfCwPdTcpaxU+sfSUJ5QgR3g@mail.gmail.com
2019-11-08 11:49:29 -08:00
Andres Freund 01368e5d9d Split all OBJS style lines in makefiles into one-line-per-entry style.
When maintaining or merging patches, one of the most common sources
for conflicts are the list of objects in makefiles. Especially when
the split across lines has been changed on both sides, which is
somewhat common due to attempting to stay below 80 columns, those
conflicts are unnecessarily laborious to resolve.

By splitting, and alphabetically sorting, OBJS style lines into one
object per line, conflicts should be less frequent, and easier to
resolve when they still occur.

Author: Andres Freund
Discussion: https://postgr.es/m/20191029200901.vww4idgcxv74cwes@alap3.anarazel.de
2019-11-05 14:41:07 -08:00
Michael Paquier 3534fa2233 Refactor code building relation options
Historically, the code to build relation options has been shaped the
same way in multiple code paths by using a set of datums in input with
the options parsed with a static table which is then filled with the
option values.  This introduces a new common routine in reloptions.c to
do most of the legwork for the in-core code paths.

Author: Amit Langote
Reviewed-by: Michael Paquier
Discussion: https://postgr.es/m/CA+HiwqGsoSn_uTPPYT19WrtR7oYpYtv4CdS0xuedTKiHHWuk_g@mail.gmail.com
2019-11-05 09:17:05 +09:00
Michael Paquier 66bde49d96 Fix inconsistencies and typos in the tree, take 10
This addresses some issues with unnecessary code comments, fixes various
typos in docs and comments, and removes some orphaned structures and
definitions.

Author: Alexander Lakhin
Discussion: https://postgr.es/m/9aabc775-5494-b372-8bcb-4dfc0bd37c68@gmail.com
2019-08-13 13:53:41 +09:00
Michael Paquier c74d49d41c Fix many typos and inconsistencies
Author: Alexander Lakhin
Discussion: https://postgr.es/m/af27d1b3-a128-9d62-46e0-88f424397f44@gmail.com
2019-07-01 10:00:23 +09:00
Noah Misch 31d250e049 Update stale comments, and fix comment typos. 2019-06-08 10:12:26 -07:00
Tom Lane 8255c7a5ee Phase 2 pgindent run for v12.
Switch to 2.1 version of pg_bsd_indent.  This formats
multiline function declarations "correctly", that is with
additional lines of parameter declarations indented to match
where the first line's left parenthesis is.

Discussion: https://postgr.es/m/CAEepm=0P3FeTXRcU5B2W3jv3PgRVZ-kGUXLGfd42FFhUROO3ug@mail.gmail.com
2019-05-22 13:04:48 -04:00
Amit Kapila 7db0cde6b5 Revert "Avoid the creation of the free space map for small heap relations".
This feature was using a process local map to track the first few blocks
in the relation.  The map was reset each time we get the block with enough
freespace.  It was discussed that it would be better to track this map on
a per-relation basis in relcache and then invalidate the same whenever
vacuum frees up some space in the page or when FSM is created.  The new
design would be better both in terms of API design and performance.

List of commits reverted, in reverse chronological order:

06c8a5090e  Improve code comments in b0eaa4c51b.
13e8643bfc  During pg_upgrade, conditionally skip transfer of FSMs.
6f918159a9  Add more tests for FSM.
9c32e4c350  Clear the local map when not used.
29d108cdec  Update the documentation for FSM behavior..
08ecdfe7e5  Make FSM test portable.
b0eaa4c51b  Avoid creation of the free space map for small heap relations.

Discussion: https://postgr.es/m/20190416180452.3pm6uegx54iitbt5@alap3.anarazel.de
2019-05-07 09:30:24 +05:30
Alvaro Herrera ab0dfc961b Report progress of CREATE INDEX operations
This uses the progress reporting infrastructure added by c16dc1aca5,
adding support for CREATE INDEX and CREATE INDEX CONCURRENTLY.

There are two pieces to this: one is index-AM-agnostic, and the other is
AM-specific.  The latter is fairly elaborate for btrees, including
reportage for parallel index builds and the separate phases that btree
index creation uses; other index AMs, which are much simpler in their
building procedures, have simplistic reporting only, but that seems
sufficient, at least for non-concurrent builds.

The index-AM-agnostic part is fairly complete, providing insight into
the CONCURRENTLY wait phases as well as block-based progress during the
index validation table scan.  (The index validation index scan requires
patching each AM, which has not been included here.)

Reviewers: Rahila Syed, Pavan Deolasee, Tatsuro Yamada
Discussion: https://postgr.es/m/20181220220022.mg63bhk26zdpvmcj@alvherre.pgsql
2019-04-02 15:18:08 -03:00
Andres Freund 2a96909a4a tableam: Support for an index build's initial table scan(s).
To support building indexes over tables of different AMs, the scans to
do so need to be routed through the table AM.  While moving a fair
amount of code, nearly all the changes are just moving code to below a
callback.

Currently the range based interface wouldn't make much sense for non
block based table AMs. But that seems aceptable for now.

Author: Andres Freund
Discussion: https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
2019-03-27 19:59:06 -07:00
Peter Eisentraut 37d9916020 More unconstify use
Replace casts whose only purpose is to cast away const with the
unconstify() macro.

Discussion: https://www.postgresql.org/message-id/flat/53a28052-f9f3-1808-fed9-460fd43035ab%402ndquadrant.com
2019-02-13 11:50:16 +01:00
Amit Kapila b0eaa4c51b Avoid creation of the free space map for small heap relations, take 2.
Previously, all heaps had FSMs. For very small tables, this means that the
FSM took up more space than the heap did. This is wasteful, so now we
refrain from creating the FSM for heaps with 4 pages or fewer. If the last
known target block has insufficient space, we still try to insert into some
other page before giving up and extending the relation, since doing
otherwise leads to table bloat. Testing showed that trying every page
penalized performance slightly, so we compromise and try every other page.
This way, we visit at most two pages. Any pages with wasted free space
become visible at next relation extension, so we still control table bloat.
As a bonus, directly attempting one or two pages can even be faster than
consulting the FSM would have been.

Once the FSM is created for a heap we don't remove it even if somebody
deletes all the rows from the corresponding relation.  We don't think it is
a useful optimization as it is quite likely that relation will again grow
to the same size.

Author: John Naylor, Amit Kapila
Reviewed-by: Amit Kapila
Tested-by: Mithun C Y
Discussion: https://www.postgresql.org/message-id/CAJVSVGWvB13PzpbLEecFuGFc5V2fsO736BsdTakPiPAcdMM5tQ@mail.gmail.com
2019-02-04 07:49:15 +05:30
Amit Kapila a23676503b Revert "Avoid creation of the free space map for small heap relations."
This reverts commit ac88d2962a.
2019-01-28 11:31:44 +05:30
Amit Kapila ac88d2962a Avoid creation of the free space map for small heap relations.
Previously, all heaps had FSMs. For very small tables, this means that the
FSM took up more space than the heap did. This is wasteful, so now we
refrain from creating the FSM for heaps with 4 pages or fewer. If the last
known target block has insufficient space, we still try to insert into some
other page before giving up and extending the relation, since doing
otherwise leads to table bloat. Testing showed that trying every page
penalized performance slightly, so we compromise and try every other page.
This way, we visit at most two pages. Any pages with wasted free space
become visible at next relation extension, so we still control table bloat.
As a bonus, directly attempting one or two pages can even be faster than
consulting the FSM would have been.

Once the FSM is created for a heap we don't remove it even if somebody
deletes all the rows from the corresponding relation.  We don't think it is
a useful optimization as it is quite likely that relation will again grow
to the same size.

Author: John Naylor with design inputs and some code contribution by Amit Kapila
Reviewed-by: Amit Kapila
Tested-by: Mithun C Y
Discussion: https://www.postgresql.org/message-id/CAJVSVGWvB13PzpbLEecFuGFc5V2fsO736BsdTakPiPAcdMM5tQ@mail.gmail.com
2019-01-28 08:14:06 +05:30
Andres Freund e0c4ec0728 Replace uses of heap_open et al with the corresponding table_* function.
Author: Andres Freund
Discussion: https://postgr.es/m/20190111000539.xbv7s6w7ilcvm7dp@alap3.anarazel.de
2019-01-21 10:51:37 -08:00
Andres Freund 111944c5ee Replace heapam.h includes with {table, relation}.h where applicable.
A lot of files only included heapam.h for relation_open, heap_open etc
- replace the heapam.h include in those files with the narrower
header.

Author: Andres Freund
Discussion: https://postgr.es/m/20190111000539.xbv7s6w7ilcvm7dp@alap3.anarazel.de
2019-01-21 10:51:37 -08:00
Andres Freund 4c850ecec6 Don't include heapam.h from others headers.
heapam.h previously was included in a number of widely used
headers (e.g. execnodes.h, indirectly in executor.h, ...). That's
problematic on its own, as heapam.h contains a lot of low-level
details that don't need to be exposed that widely, but becomes more
problematic with the upcoming introduction of pluggable table storage
- it seems inappropriate for heapam.h to be included that widely
afterwards.

heapam.h was largely only included in other headers to get the
HeapScanDesc typedef (which was defined in heapam.h, even though
HeapScanDescData is defined in relscan.h). The better solution here
seems to be to just use the underlying struct (forward declared where
necessary). Similar for BulkInsertState.

Another problem was that LockTupleMode was used in executor.h - parts
of the file tried to cope without heapam.h, but due to the fact that
it indirectly included it, several subsequent violations of that goal
were not not noticed. We could just reuse the approach of declaring
parameters as int, but it seems nicer to move LockTupleMode to
lockoptions.h - that's not a perfect location, but also doesn't seem
bad.

As a number of files relied on implicitly included heapam.h, a
significant number of files grew an explicit include. It's quite
probably that a few external projects will need to do the same.

Author: Andres Freund
Reviewed-By: Alvaro Herrera
Discussion: https://postgr.es/m/20190114000701.y4ttcb74jpskkcfb@alap3.anarazel.de
2019-01-14 16:24:41 -08:00
Bruce Momjian 97c39498e5 Update copyright for 2019
Backpatch-through: certain files through 9.4
2019-01-02 12:44:25 -05:00
Andres Freund 578b229718 Remove WITH OIDS support, change oid catalog column visibility.
Previously tables declared WITH OIDS, including a significant fraction
of the catalog tables, stored the oid column not as a normal column,
but as part of the tuple header.

This special column was not shown by default, which was somewhat odd,
as it's often (consider e.g. pg_class.oid) one of the more important
parts of a row.  Neither pg_dump nor COPY included the contents of the
oid column by default.

The fact that the oid column was not an ordinary column necessitated a
significant amount of special case code to support oid columns. That
already was painful for the existing, but upcoming work aiming to make
table storage pluggable, would have required expanding and duplicating
that "specialness" significantly.

WITH OIDS has been deprecated since 2005 (commit ff02d0a05280e0).
Remove it.

Removing includes:
- CREATE TABLE and ALTER TABLE syntax for declaring the table to be
  WITH OIDS has been removed (WITH (oids[ = true]) will error out)
- pg_dump does not support dumping tables declared WITH OIDS and will
  issue a warning when dumping one (and ignore the oid column).
- restoring an pg_dump archive with pg_restore will warn when
  restoring a table with oid contents (and ignore the oid column)
- COPY will refuse to load binary dump that includes oids.
- pg_upgrade will error out when encountering tables declared WITH
  OIDS, they have to be altered to remove the oid column first.
- Functionality to access the oid of the last inserted row (like
  plpgsql's RESULT_OID, spi's SPI_lastoid, ...) has been removed.

The syntax for declaring a table WITHOUT OIDS (or WITH (oids = false)
for CREATE TABLE) is still supported. While that requires a bit of
support code, it seems unnecessary to break applications / dumps that
do not use oids, and are explicit about not using them.

The biggest user of WITH OID columns was postgres' catalog. This
commit changes all 'magic' oid columns to be columns that are normally
declared and stored. To reduce unnecessary query breakage all the
newly added columns are still named 'oid', even if a table's column
naming scheme would indicate 'reloid' or such.  This obviously
requires adapting a lot code, mostly replacing oid access via
HeapTupleGetOid() with access to the underlying Form_pg_*->oid column.

The bootstrap process now assigns oids for all oid columns in
genbki.pl that do not have an explicit value (starting at the largest
oid previously used), only oids assigned later by oids will be above
FirstBootstrapObjectId. As the oid column now is a normal column the
special bootstrap syntax for oids has been removed.

Oids are not automatically assigned during insertion anymore, all
backend code explicitly assigns oids with GetNewOidWithIndex(). For
the rare case that insertions into the catalog via SQL are called for
the new pg_nextoid() function can be used (which only works on catalog
tables).

The fact that oid columns on system tables are now normal columns
means that they will be included in the set of columns expanded
by * (i.e. SELECT * FROM pg_class will now include the table's oid,
previously it did not). It'd not technically be hard to hide oid
column by default, but that'd mean confusing behavior would either
have to be carried forward forever, or it'd cause breakage down the
line.

While it's not unlikely that further adjustments are needed, the
scope/invasiveness of the patch makes it worthwhile to get merge this
now. It's painful to maintain externally, too complicated to commit
after the code code freeze, and a dependency of a number of other
patches.

Catversion bump, for obvious reasons.

Author: Andres Freund, with contributions by John Naylor
Discussion: https://postgr.es/m/20180930034810.ywp2c7awz7opzcfr@alap3.anarazel.de
2018-11-20 16:00:17 -08:00
Alvaro Herrera 74da7cda31 Fail BRIN control functions during recovery explicitly
They already fail anyway, but prior to this patch they raise an ugly
error message about a lock that cannot be acquired.  This just improves
the message.

Author: Masahiko Sawada
Reported-by: Masahiko Sawada
Discussion: https://postgr.es/m/CAD21AoBZau4g4_NUf3BKNd=CdYK+xaPdtJCzvOC1TxGdTiJx_Q@mail.gmail.com
Reviewed-by: Kuntal Ghosh, Alexander Korotkov, Simon Riggs, Michaël Paquier, Álvaro Herrera
2018-06-14 12:51:32 -04:00
Tom Lane bdf46af748 Post-feature-freeze pgindent run.
Discussion: https://postgr.es/m/15719.1523984266@sss.pgh.pa.us
2018-04-26 14:47:16 -04:00
Teodor Sigaev 8224de4f42 Indexes with INCLUDE columns and their support in B-tree
This patch introduces INCLUDE clause to index definition.  This clause
specifies a list of columns which will be included as a non-key part in
the index.  The INCLUDE columns exist solely to allow more queries to
benefit from index-only scans.  Also, such columns don't need to have
appropriate operator classes.  Expressions are not supported as INCLUDE
columns since they cannot be used in index-only scans.

Index access methods supporting INCLUDE are indicated by amcaninclude flag
in IndexAmRoutine.  For now, only B-tree indexes support INCLUDE clause.

In B-tree indexes INCLUDE columns are truncated from pivot index tuples
(tuples located in non-leaf pages and high keys).  Therefore, B-tree indexes
now might have variable number of attributes.  This patch also provides
generic facility to support that: pivot tuples contain number of their
attributes in t_tid.ip_posid.  Free 13th bit of t_info is used for indicating
that.  This facility will simplify further support of index suffix truncation.
The changes of above are backward-compatible, pg_upgrade doesn't need special
handling of B-tree indexes for that.

Bump catalog version

Author: Anastasia Lubennikova with contribition by Alexander Korotkov and me
Reviewed by: Peter Geoghegan, Tomas Vondra, Antonin Houska, Jeff Janes,
			 David Rowley, Alexander Korotkov
Discussion: https://www.postgresql.org/message-id/flat/56168952.4010101@postgrespro.ru
2018-04-07 23:00:39 +03:00
Tom Lane 1383e2a1a9 Improve FSM management for BRIN indexes.
BRIN indexes like to propagate additions of free space into the upper pages
of their free space maps as soon as the new space is known, even when it's
just on one individual index page.  Previously this required calling
FreeSpaceMapVacuum, which is quite an expensive thing if the map is large.
Use the FreeSpaceMapVacuumRange function recently added by commit c79f6df75
to reduce the amount of work done for this purpose.

Fix a couple of places that neglected to do the upper-page vacuuming at all
after recording new free space.  If the policy is to be that BRIN should do
that, it should do it everywhere.

Do RecordPageWithFreeSpace unconditionally in brin_page_cleanup, and do
FreeSpaceMapVacuum unconditionally in brin_vacuum_scan.  Because of the
FSM's imprecise storage of free space, the old complications here seldom
bought anything, they just slowed things down.  This approach also
provides a predictable path for FSM corruption to be repaired.

Remove premature RecordPageWithFreeSpace call in brin_getinsertbuffer
where it's about to return an extended page to the caller.  The caller
should do that, instead, after it's inserted its new tuple.  Fix the
one caller that forgot to do so.

Simplify logic in brin_doupdate's same-page-update case by postponing
brin_initialize_empty_new_buffer to after the critical section; I see
little point in doing it before.

Avoid repeat calls of RelationGetNumberOfBlocks in brin_vacuum_scan.
Avoid duplicate BufferGetBlockNumber and BufferGetPage calls in
a couple of places where we already had the right values.

Move a BRIN_elog debug logging call out of a critical section; that's
pretty unsafe and I don't think it buys us anything to not wait till
after the critical section.

Move the "*extended = false" step in brin_getinsertbuffer into the
routine's main loop.  There's no actual bug there, since the loop can't
iterate with *extended still true, but it doesn't seem very future-proof
as coded; and it's certainly not documented as a loop invariant.

This is all from follow-on investigation inspired by commit c79f6df75.

Discussion: https://postgr.es/m/5801.1522429460@sss.pgh.pa.us
2018-04-04 14:26:04 -04:00
Alvaro Herrera 530bcf7581 Fix thinko in comment
The listed numbers disagreed with the ones being used in the symbols;
but instead of just fixing the numbers in the comment, use the symbolic
name instead, which seems clearer.

This has been wrong all along, so apply back to 9.5 where BRIN was
introduced.

Reported-by: Tomas Vondra
Discussion: https://postgr.es/m/5ff514f2-8b1e-6366-b11c-8e2ed442562d@2ndquadrant.com
2018-03-26 12:03:42 -03:00
Alvaro Herrera 484a4a08ab Log when a BRIN autosummarization request fails
Autovacuum's 'workitem' request queue is of limited size, so requests
can fail if they arrive more quickly than autovacuum can process them.
Emit a log message when this happens, to provide better visibility of
this.

Backpatch to 10.  While this represents an API change for
AutoVacuumRequestWork, that function is not yet prepared to deal with
external modules calling it, so there doesn't seem to be any risk (other
than log spam, that is.)

Author: Masahiko Sawada
Reviewed-by: Fabrízio Mello, Ildar Musin, Álvaro Herrera
Discussion: https://postgr.es/m/CAD21AoB1HrQhp6_4rTyHN5kWEJCEsG8YzsjZNt-ctoXSn5Uisw@mail.gmail.com
2018-03-14 11:59:40 -03:00
Robert Haas 9da0cc3528 Support parallel btree index builds.
To make this work, tuplesort.c and logtape.c must also support
parallelism, so this patch adds that infrastructure and then applies
it to the particular case of parallel btree index builds.  Testing
to date shows that this can often be 2-3x faster than a serial
index build.

The model for deciding how many workers to use is fairly primitive
at present, but it's better than not having the feature.  We can
refine it as we get more experience.

Peter Geoghegan with some help from Rushabh Lathia.  While Heikki
Linnakangas is not an author of this patch, he wrote other patches
without which this feature would not have been possible, and
therefore the release notes should possibly credit him as an author
of this feature.  Reviewed by Claudio Freire, Heikki Linnakangas,
Thomas Munro, Tels, Amit Kapila, me.

Discussion: http://postgr.es/m/CAM3SWZQKM=Pzc=CAHzRixKjp2eO5Q0Jg1SoFQqeXFQ647JiwqQ@mail.gmail.com
Discussion: http://postgr.es/m/CAH2-Wz=AxWqDoVvGU7dq856S4r6sJAj6DBn7VMtigkB33N5eyg@mail.gmail.com
2018-02-02 13:32:44 -05:00
Peter Eisentraut 8b9e9644dc Replace AclObjectKind with ObjectType
AclObjectKind was basically just another enumeration for object types,
and we already have a preferred one for that.  It's only used in
aclcheck_error.  By using ObjectType instead, we can also give some more
precise error messages, for example "index" instead of "relation".

Reviewed-by: Michael Paquier <michael.paquier@gmail.com>
2018-01-19 14:01:15 -05:00
Bruce Momjian 9d4649ca49 Update copyright for 2018
Backpatch-through: certain files through 9.3
2018-01-02 23:30:12 -05:00
Robert Haas eaedf0df71 Update typedefs.list and re-run pgindent
Discussion: http://postgr.es/m/CA+TgmoaA9=1RWKtBWpDaj+sF3Stgc8sHgf5z=KGtbjwPLQVDMA@mail.gmail.com
2017-11-29 09:24:24 -05:00
Peter Eisentraut 2eb4a831e5 Change TRUE/FALSE to true/false
The lower case spellings are C and C++ standard and are used in most
parts of the PostgreSQL sources.  The upper case spellings are only used
in some files/modules.  So standardize on the standard spellings.

The APIs for ICU, Perl, and Windows define their own TRUE and FALSE, so
those are left as is when using those APIs.

In code comments, we use the lower-case spelling for the C concepts and
keep the upper-case spelling for the SQL concepts.

Reviewed-by: Michael Paquier <michael.paquier@gmail.com>
2017-11-08 11:37:28 -05:00
Alvaro Herrera 1b890562b8 Fix thinkos in BRIN summarization
The previous commit contained a thinko that made a single-range
summarization request process from there to end of table.  Fix by
setting the correct end range point.  Per buildfarm.
2017-11-03 20:45:36 +01:00
Alvaro Herrera ec42a1dcb3 Fix BRIN summarization concurrent with extension
If a process is extending a table concurrently with some BRIN
summarization process, it is possible for the latter to miss pages added
by the former because the number of pages is computed ahead of time.

Fix by determining a fresh relation size after inserting the placeholder
tuple: any process that further extends the table concurrently will
update the placeholder tuple, while previous pages will be processed by
the heap scan.

Reported-by: Tomas Vondra
Reviewed-by: Tom Lane
Author: Álvaro Herrera
Discussion: https://postgr.es/m/083d996a-4a8a-0e13-800a-851dd09ad8cc@2ndquadrant.com
Backpatch-to: 9.5
2017-11-03 17:23:13 +01:00
Tom Lane 81e334ce4e Set the metapage's pd_lower correctly in brin, gin, and spgist indexes.
Previously, these index types left the pd_lower field set to the default
SizeOfPageHeaderData, which is really a lie because it ought to point past
whatever space is being used for metadata.  The coding accidentally failed
to fail because we never told xlog.c that the metapage is of standard
format --- but that's not very good, because it impedes WAL consistency
checking, and in some cases prevents compression of full-page images.

To fix, ensure that we set pd_lower correctly, not only when creating a
metapage but whenever we write it out (these apparently redundant steps are
needed to cope with pg_upgrade'd indexes that don't yet contain the right
value).  This allows telling xlog.c that the page is of standard format.

The WAL consistency check mask functions are made to mask only if pd_lower
appears valid, which I think is likely unnecessary complication, since
any metapage appearing in a v11 WAL stream should contain valid pd_lower.
But it doesn't cost much to be paranoid.

Amit Langote, reviewed by Michael Paquier and Amit Kapila

Discussion: https://postgr.es/m/0d273805-0e9e-ec1a-cb84-d4da400b8f85@lab.ntt.co.jp
2017-11-02 17:22:08 -04:00
Tom Lane 62a16572d5 Fix corner-case errors in brin_doupdate().
In some cases the BRIN code releases lock on an index page, and later
re-acquires lock and tries to check that the tuple it was working on is
still there.  That check was a couple bricks shy of a load.  It didn't
consider that the page might have turned into a "revmap" page.  (The
samepage code path doesn't call brin_getinsertbuffer(), so it isn't
protected by the checks for revmap status there.)  It also didn't check
whether the tuple offset was now off the end of the linepointer array.
Since commit 24992c6db the latter case is pretty common, but at least
in principle it could have occurred before that.  The net result is
that concurrent updates of a BRIN index could fail with errors like
"invalid index offnum" or "inconsistent range map".

Per report from Tomas Vondra.  Back-patch to 9.5, since this code is
substantially the same in all versions containing BRIN.

Discussion: https://postgr.es/m/10d2b9f9-f427-03b8-8ad9-6af4ecacbee9@2ndquadrant.com
2017-11-02 12:54:55 -04:00
Robert Haas 6a2fa09c0c For wal_consistency_checking, mask page checksum as well as page LSN.
If the LSN is different, the checksum will be different, too.

Ashwin Agrawal, reviewed by Michael Paquier and Kuntal Ghosh

Discussion: http://postgr.es/m/CALfoeis5iqrAU-+JAN+ZzXkpPr7+-0OAGv7QUHwFn=-wDy4o4Q@mail.gmail.com
2017-09-22 14:28:22 -04:00
Andres Freund 2cd7084524 Change tupledesc->attrs[n] to TupleDescAttr(tupledesc, n).
This is a mechanical change in preparation for a later commit that
will change the layout of TupleDesc.  Introducing a macro to abstract
the details of where attributes are stored will allow us to change
that in separate step and revise it in future.

Author: Thomas Munro, editorialized by Andres Freund
Reviewed-By: Andres Freund
Discussion: https://postgr.es/m/CAEepm=0ZtQ-SpsgCyzzYpsXS6e=kZWqk3g5Ygn3MDV7A8dabUA@mail.gmail.com
2017-08-20 11:19:07 -07:00
Tom Lane 382ceffdf7 Phase 3 of pgindent updates.
Don't move parenthesized lines to the left, even if that means they
flow past the right margin.

By default, BSD indent lines up statement continuation lines that are
within parentheses so that they start just to the right of the preceding
left parenthesis.  However, traditionally, if that resulted in the
continuation line extending to the right of the desired right margin,
then indent would push it left just far enough to not overrun the margin,
if it could do so without making the continuation line start to the left of
the current statement indent.  That makes for a weird mix of indentations
unless one has been completely rigid about never violating the 80-column
limit.

This behavior has been pretty universally panned by Postgres developers.
Hence, disable it with indent's new -lpl switch, so that parenthesized
lines are always lined up with the preceding left paren.

This patch is much less interesting than the first round of indent
changes, but also bulkier, so I thought it best to separate the effects.

Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org
Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
2017-06-21 15:35:54 -04:00
Tom Lane c7b8998ebb Phase 2 of pgindent updates.
Change pg_bsd_indent to follow upstream rules for placement of comments
to the right of code, and remove pgindent hack that caused comments
following #endif to not obey the general rule.

Commit e3860ffa4d wasn't actually using
the published version of pg_bsd_indent, but a hacked-up version that
tried to minimize the amount of movement of comments to the right of
code.  The situation of interest is where such a comment has to be
moved to the right of its default placement at column 33 because there's
code there.  BSD indent has always moved right in units of tab stops
in such cases --- but in the previous incarnation, indent was working
in 8-space tab stops, while now it knows we use 4-space tabs.  So the
net result is that in about half the cases, such comments are placed
one tab stop left of before.  This is better all around: it leaves
more room on the line for comment text, and it means that in such
cases the comment uniformly starts at the next 4-space tab stop after
the code, rather than sometimes one and sometimes two tabs after.

Also, ensure that comments following #endif are indented the same
as comments following other preprocessor commands such as #else.
That inconsistency turns out to have been self-inflicted damage
from a poorly-thought-through post-indent "fixup" in pgindent.

This patch is much less interesting than the first round of indent
changes, but also bulkier, so I thought it best to separate the effects.

Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org
Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
2017-06-21 15:19:25 -04:00
Tom Lane 651902deb1 Re-run pgindent.
This is just to have a clean base state for testing of Piotr Stefaniak's
latest version of FreeBSD indent.  I fixed up a couple of places where
pgindent would have changed format not-nicely.  perltidy not included.

Discussion: https://postgr.es/m/VI1PR03MB119959F4B65F000CA7CD9F6BF2CC0@VI1PR03MB1199.eurprd03.prod.outlook.com
2017-06-13 13:05:59 -04:00
Alvaro Herrera 55a70a023c Assorted translatable string fixes
Mark our rusage reportage string translatable; remove quotes from type
names; unify formatting of very similar messages.
2017-06-04 11:41:16 -04:00
Alvaro Herrera b4da9d0e1e brin: Don't crash on auto-summarization
We were trying to free a pointer into a shared buffer, which never
works; and we were failing to release the buffer lock appropriately.
Fix those omissions.

While at it, improve documentation for brinGetTupleForHeapBlock, the
inadequacy of which evidently caused these bugs in the first place.

Reported independently by Zhou Digoal (bug #14668) and Alexander Sosna.

Discussion: https://postgr.es/m/8c31c11b-6adb-228d-22c2-4ace89fc9209@credativ.de
Discussion: https://postgr.es/m/20170524063323.29941.46339@wrigleys.postgresql.org
2017-05-30 18:17:09 -04:00
Alvaro Herrera e6785a5ca1 Fix wording in amvalidate error messages
Remove some gratuituous message differences by making the AM name
previously embedded in each message be a %s instead.  While at it, get
rid of terminology that's unclear and unnecessary in one message.

Discussion: https://postgr.es/m/20170523001557.bq2hbq7hxyvyw62q@alvherre.pgsql
2017-05-30 15:45:42 -04:00
Bruce Momjian a6fd7b7a5f Post-PG 10 beta1 pgindent run
perltidy run not included.
2017-05-17 16:31:56 -04:00
Alvaro Herrera 8bf74967da Reduce the number of pallocs() in BRIN
Instead of allocating memory in brin_deform_tuple and brin_copy_tuple
over and over during a scan, allow reuse of previously allocated memory.
This is said to make for a measurable performance improvement.

Author: Jinyu Zhang, Álvaro Herrera
Reviewed by: Tomas Vondra
Discussion: https://postgr.es/m/495deb78.4186.1500dacaa63.Coremail.beijing_pg@163.com
2017-04-07 19:08:43 -03:00
Alvaro Herrera 817cb10013 Fix new BRIN desummarize WAL record
The WAL-writing piece was forgetting to set the pages-per-range value.
Also, fix the declared type of struct member heapBlk, which I mistakenly
set as OffsetNumber rather than BlockNumber.

Problem was introduced by commit c655899ba9 (April 1st).  Any system
that tries to replay the new WAL record written before this fix is
likely to die on replay and require pg_resetwal.

Reported by Tom Lane.
Discussion: https://postgr.es/m/20191.1491524824@sss.pgh.pa.us
2017-04-07 17:11:56 -03:00
Alvaro Herrera 7e534adcdc Fix BRIN cost estimation
The original code was overly optimistic about the cost of scanning a
BRIN index, leading to BRIN indexes being selected when they'd be a
worse choice than some other index.  This complete rewrite should be
more accurate.

Author: David Rowley, based on an earlier patch by Emre Hasegeli
Reviewed-by: Emre Hasegeli
Discussion: https://postgr.es/m/CAKJS1f9n-Wapop5Xz1dtGdpdqmzeGqQK4sV2MK-zZugfC14Xtw@mail.gmail.com
2017-04-06 17:51:53 -03:00
Alvaro Herrera c655899ba9 BRIN de-summarization
When the BRIN summary tuple for a page range becomes too "wide" for the
values actually stored in the table (because the tuples that were
present originally are no longer present due to updates or deletes), it
can be useful to remove the outdated summary tuple, so that a future
summarization can install a tighter summary.

This commit introduces a SQL-callable interface to do so.

Author: Álvaro Herrera
Reviewed-by: Eiji Seki
Discussion: https://postgr.es/m/20170228045643.n2ri74ara4fhhfxf@alvherre.pgsql
2017-04-01 16:10:04 -03:00
Alvaro Herrera 7526e10224 BRIN auto-summarization
Previously, only VACUUM would cause a page range to get initially
summarized by BRIN indexes, which for some use cases takes too much time
since the inserts occur.  To avoid the delay, have brininsert request a
summarization run for the previous range as soon as the first tuple is
inserted into the first page of the next range.  Autovacuum is in charge
of processing these requests, after doing all the regular vacuuming/
analyzing work on tables.

This doesn't impose any new tasks on autovacuum, because autovacuum was
already in charge of doing summarizations.  The only actual effect is to
change the timing, i.e. that it occurs earlier.  For this reason, we
don't go any great lengths to record these requests very robustly; if
they are lost because of a server crash or restart, they will happen at
a later time anyway.

Most of the new code here is in autovacuum, which can now be told about
"work items" to process.  This can be used for other things such as GIN
pending list cleaning, perhaps visibility map bit setting, both of which
are currently invoked during vacuum, but do not really depend on vacuum
taking place.

The requests are at the page range level, a granularity for which we did
not have SQL-level access; we only had index-level summarization
requests via brin_summarize_new_values().  It seems reasonable to add
SQL-level access to range-level summarization too, so add a function
brin_summarize_range() to do that.

Authors: Álvaro Herrera, based on sketch from Simon Riggs.
Reviewed-by: Thomas Munro.
Discussion: https://postgr.es/m/20170301045823.vneqdqkmsd4as4ds@alvherre.pgsql
2017-04-01 14:00:53 -03:00
Robert Haas 5262f7a4fc Add optimizer and executor support for parallel index scans.
In combination with 569174f1be, which
taught the btree AM how to perform parallel index scans, this allows
parallel index scan plans on btree indexes.  This infrastructure
should be general enough to support parallel index scans for other
index AMs as well, if someone updates them to support parallel
scans.

Amit Kapila, reviewed and tested by Anastasia Lubennikova, Tushar
Ahuja, and Haribabu Kommi, and me.
2017-02-15 13:53:24 -05:00
Tom Lane 86d911ec0f Allow index AMs to cache data across aminsert calls within a SQL command.
It's always been possible for index AMs to cache data across successive
amgettuple calls within a single SQL command: the IndexScanDesc.opaque
field is meant for precisely that.  However, no comparable facility
exists for amortizing setup work across successive aminsert calls.
This patch adds such a feature and teaches GIN, GIST, and BRIN to use it
to amortize catalog lookups they'd previously been doing on every call.
(The other standard index AMs keep everything they need in the relcache,
so there's little to improve there.)

For GIN, the overall improvement in a statement that inserts many rows
can be as much as 10%, though it seems a bit less for the other two.
In addition, this makes a really significant difference in runtime
for CLOBBER_CACHE_ALWAYS tests, since in those builds the repeated
catalog lookups are vastly more expensive.

The reason this has been hard up to now is that the aminsert function is
not passed any useful place to cache per-statement data.  What I chose to
do is to add suitable fields to struct IndexInfo and pass that to aminsert.
That's not widening the index AM API very much because IndexInfo is already
within the ken of ambuild; in fact, by passing the same info to aminsert
as to ambuild, this is really removing an inconsistency in the AM API.

Discussion: https://postgr.es/m/27568.1486508680@sss.pgh.pa.us
2017-02-09 11:52:12 -05:00
Robert Haas a507b86900 Add WAL consistency checking facility.
When the new GUC wal_consistency_checking is set to a non-empty value,
it triggers recording of additional full-page images, which are
compared on the standby against the results of applying the WAL record
(without regard to those full-page images).  Allowable differences
such as hints are masked out, and the resulting pages are compared;
any difference results in a FATAL error on the standby.

Kuntal Ghosh, based on earlier patches by Michael Paquier and Heikki
Linnakangas.  Extensively reviewed and revised by Michael Paquier and
by me, with additional reviews and comments from Amit Kapila, Álvaro
Herrera, Simon Riggs, and Peter Eisentraut.
2017-02-08 15:45:30 -05:00
Robert Haas 7b4ac19982 Extend index AM API for parallel index scans.
This patch doesn't actually make any index AM parallel-aware, but it
provides the necessary functions at the AM layer to do so.

Rahila Syed, Amit Kapila, Robert Haas
2017-01-24 16:42:58 -05:00
Peter Eisentraut f21a563d25 Move some things from builtins.h to new header files
This avoids that builtins.h has to include additional header files.
2017-01-20 20:29:53 -05:00
Peter Eisentraut 352a24a1f9 Generate fmgr prototypes automatically
Gen_fmgrtab.pl creates a new file fmgrprotos.h, which contains
prototypes for all functions registered in pg_proc.h.  This avoids
having to manually maintain these prototypes across a random variety of
header files.  It also automatically enforces a correct function
signature, and since there are warnings about missing prototypes, it
will detect functions that are defined but not registered in
pg_proc.h (or otherwise used).

Reviewed-by: Pavel Stehule <pavel.stehule@gmail.com>
2017-01-17 14:06:07 -05:00
Alvaro Herrera 7403561c0f BRIN revmap pages are not standard pages ...
... and therefore we ought not to tell XLogRegisterBuffer the opposite,
when writing XLog for a brin update that moves the index tuple to a
different page.  Otherwise, xlog insertion would try to "compress the
hole" when producing a full-page image for it; but since we don't update
pd_lower/upper, the hole covers the whole page.  On WAL replay, the
revmap page becomes empty and so the entire portion of the index is
useless and needs to be recomputed.

This is low-probability: a BRIN update only moves an index tuple to a
different page when the summary tuple is larger than the existing one,
which doesn't happen with fixed-width datatypes.  Also, the revmap
page must be first after a checkpoint.

Report and patch: Kuntal Ghosh
Bug is alleged to have detected by a WAL-consistency-checking tool.
Discussion: https://postgr.es/m/CAGz5QCJ=00UQjScSEFbV=0qO5ShTZB9WWz_Fm7+Wd83zPs9Geg@mail.gmail.com

I posted a test case demonstrating the problem, but I'm refraining from
adding it to the test suite; if the WAL consistency tool makes it in,
that will be a better way to catch this from regressing.  (We should
definitely have someting that causes not-same-page updates, though.)
2017-01-09 18:19:29 -03:00
Bruce Momjian 1d25779284 Update copyright via script for 2017 2017-01-03 13:48:53 -05:00
Tom Lane 24992c6db9 Rewrite PageIndexDeleteNoCompact into a form that only deletes 1 tuple.
The full generality of deleting an arbitrary number of tuples is no longer
needed, so let's save some code and cycles by replacing the original coding
with an implementation based on PageIndexTupleDelete.

We can always get back the old code from git if we need it again for new
callers (though I don't care for its willingness to mess with line pointers
it wasn't told to mess with).

Discussion: <552.1473445163@sss.pgh.pa.us>
2016-09-09 19:00:59 -04:00
Tom Lane b1328d78f8 Invent PageIndexTupleOverwrite, and teach BRIN and GiST to use it.
PageIndexTupleOverwrite performs approximately the same function as
PageIndexTupleDelete (or PageIndexDeleteNoCompact) followed by PageAddItem
targeting the same item pointer offset.  But in the case where the new
tuple is the same size as the old, it avoids shuffling other data around on
the page, because the new tuple is placed where the old one was rather than
being appended to the end of the page.  This has been shown to provide a
substantial speedup for some GiST use-cases.

Also, this change allows some API simplifications: we can get rid of
the rather klugy and error-prone PAI_ALLOW_FAR_OFFSET flag for
PageAddItemExtended, since that was used only to cover a corner case
for BRIN that's better expressed by using PageIndexTupleOverwrite.

Note that this patch causes a rather subtle WAL incompatibility: the
physical page content change represented by certain WAL records is now
different than it was before, because while the tuples have the same
itempointer line numbers, the tuples themselves are in different places.
I have not bumped the WAL version number because I think it doesn't matter
unless you are trying to do bitwise comparisons of original and replayed
pages, and in any case we're early in a devel cycle and there will probably
be more WAL changes before v10 gets out the door.

There is probably room to make use of PageIndexTupleOverwrite in SP-GiST
and GIN too, but that is left for a future patch.

Andrey Borodin, reviewed by Anastasia Lubennikova, whacked around a bit
by me

Discussion: <CAJEAwVGQjGGOj6mMSgMwGvtFd5Kwe6VFAxY=uEPZWMDjzbn4VQ@mail.gmail.com>
2016-09-09 18:02:36 -04:00
Fujii Masao bd082231ed Fix typos in comments. 2016-08-29 16:06:40 +09:00
Tom Lane ea268cdc9a Add macros to make AllocSetContextCreate() calls simpler and safer.
I found that half a dozen (nearly 5%) of our AllocSetContextCreate calls
had typos in the context-sizing parameters.  While none of these led to
especially significant problems, they did create minor inefficiencies,
and it's now clear that expecting people to copy-and-paste those calls
accurately is not a great idea.  Let's reduce the risk of future errors
by introducing single macros that encapsulate the common use-cases.
Three such macros are enough to cover all but two special-purpose contexts;
those two calls can be left as-is, I think.

While this patch doesn't in itself improve matters for third-party
extensions, it doesn't break anything for them either, and they can
gradually adopt the simplified notation over time.

In passing, change TopMemoryContext to use the default allocation
parameters.  Formerly it could only be extended 8K at a time.  That was
probably reasonable when this code was written; but nowadays we create
many more contexts than we did then, so that it's not unusual to have a
couple hundred K in TopMemoryContext, even without considering various
dubious code that sticks other things there.  There seems no good reason
not to let it use growing blocks like most other contexts.

Back-patch to 9.6, mostly because that's still close enough to HEAD that
it's easy to do so, and keeping the branches in sync can be expected to
avoid some future back-patching pain.  The bugs fixed by these changes
don't seem to be significant enough to justify fixing them further back.

Discussion: <21072.1472321324@sss.pgh.pa.us>
2016-08-27 17:50:38 -04:00
Tom Lane b5bce6c1ec Final pgindent + perltidy run for 9.6. 2016-08-15 13:42:51 -04:00
Tom Lane ed0097e4f9 Add SQL-accessible functions for inspecting index AM properties.
Per discussion, we should provide such functions to replace the lost
ability to discover AM properties by inspecting pg_am (cf commit
65c5fcd35).  The added functionality is also meant to displace any code
that was looking directly at pg_index.indoption, since we'd rather not
believe that the bit meanings in that field are part of any client API
contract.

As future-proofing, define the SQL API to not assume that properties that
are currently AM-wide or index-wide will remain so unless they logically
must be; instead, expose them only when inquiring about a specific index
or even specific index column.  Also provide the ability for an index
AM to override the behavior.

In passing, document pg_am.amtype, overlooked in commit 473b93287.

Andrew Gierth, with kibitzing by me and others

Discussion: <87mvl5on7n.fsf@news-spur.riddles.org.uk>
2016-08-13 18:31:14 -04:00
Peter Eisentraut ef5d4a3cfa Message style improvements 2016-07-28 16:34:44 -04:00
Robert Haas 4bc424b968 pgindent run for 9.6 2016-06-09 18:02:36 -04:00
Alvaro Herrera 975ad4e602 Fix PageAddItem BRIN bug
BRIN was relying on the ability to remove a tuple from an index page,
then putting another tuple in the same line pointer.  But PageAddItem
refuses to add a tuple beyond the first free item past the last used
item, and in particular, it rejects an attempt to add an item to an
empty page anywhere other than the first line pointer.  PageAddItem
issues a WARNING and indicates to the caller that it failed, which in
turn causes the BRIN calling code to issue a PANIC, so the whole
sequence looks like this:
	WARNING:  specified item offset is too large
	PANIC:  failed to add BRIN tuple

To fix, create a new function PageAddItemExtended which is like
PageAddItem except that the two boolean arguments become a flags bitmap;
the "overwrite" and "is_heap" boolean flags in PageAddItem become
PAI_OVERWITE and PAI_IS_HEAP flags in the new function, and a new flag
PAI_ALLOW_FAR_OFFSET enables the behavior required by BRIN.
PageAddItem() retains its original signature, for compatibility with
third-party modules (other callers in core code are not modified,
either).

Also, in the belt-and-suspenders spirit, I added a new sanity check in
brinGetTupleForHeapBlock to raise an error if an TID found in the revmap
is not marked as live by the page header.  This causes it to react with
"ERROR: corrupted BRIN index" to the bug at hand, rather than a hard
crash.

Backpatch to 9.5.

Bug reported by Andreas Seltenreich as detected by his handy sqlsmith
fuzzer.
Discussion: https://www.postgresql.org/message-id/87mvni77jh.fsf@elite.ansel.ydns.eu
2016-05-30 14:47:22 -04:00
Kevin Grittner a343e223a5 Revert no-op changes to BufferGetPage()
The reverted changes were intended to force a choice of whether any
newly-added BufferGetPage() calls needed to be accompanied by a
test of the snapshot age, to support the "snapshot too old"
feature.  Such an accompanying test is needed in about 7% of the
cases, where the page is being used as part of a scan rather than
positioning for other purposes (such as DML or vacuuming).  The
additional effort required for back-patching, and the doubt whether
the intended benefit would really be there, have indicated it is
best just to rely on developers to do the right thing based on
comments and existing usage, as we do with many other conventions.

This change should have little or no effect on generated executable
code.

Motivated by the back-patching pain of Tom Lane and Robert Haas
2016-04-20 08:31:19 -05:00
Kevin Grittner 848ef42bb8 Add the "snapshot too old" feature
This feature is controlled by a new old_snapshot_threshold GUC.  A
value of -1 disables the feature, and that is the default.  The
value of 0 is just intended for testing.  Above that it is the
number of minutes a snapshot can reach before pruning and vacuum
are allowed to remove dead tuples which the snapshot would
otherwise protect.  The xmin associated with a transaction ID does
still protect dead tuples.  A connection which is using an "old"
snapshot does not get an error unless it accesses a page modified
recently enough that it might not be able to produce accurate
results.

This is similar to the Oracle feature, and we use the same SQLSTATE
and error message for compatibility.
2016-04-08 14:36:30 -05:00
Kevin Grittner 8b65cf4c5e Modify BufferGetPage() to prepare for "snapshot too old" feature
This patch is a no-op patch which is intended to reduce the chances
of failures of omission once the functional part of the "snapshot
too old" patch goes in.  It adds parameters for snapshot, relation,
and an enum to specify whether the snapshot age check needs to be
done for the page at this point.  This initial patch passes NULL
for the first two new parameters and BGP_NO_SNAPSHOT_TEST for the
third.  The follow-on patch will change the places where the test
needs to be made.
2016-04-08 14:30:10 -05:00
Teodor Sigaev 8b99edefca Revert CREATE INDEX ... INCLUDING ...
It's not ready yet, revert two commits
690c543550 - unstable test output
386e3d7609 - patch itself
2016-04-08 21:52:13 +03:00
Teodor Sigaev 386e3d7609 CREATE INDEX ... INCLUDING (column[, ...])
Now indexes (but only B-tree for now) can contain "extra" column(s) which
doesn't participate in index structure, they are just stored in leaf
tuples. It allows to use index only scan by using single index instead
of two or more indexes.

Author: Anastasia Lubennikova with minor editorializing by me
Reviewers: David Rowley, Peter Geoghegan, Jeff Janes
2016-04-08 19:45:59 +03:00
Tom Lane be44ed27b8 Improve index AMs' opclass validation procedures.
The amvalidate functions added in commit 65c5fcd353 were on the
crude side.  Improve them in a few ways:

* Perform signature checking for operators and support functions.

* Apply more thorough checks for missing operators and functions,
where possible.

* Instead of reporting problems as ERRORs, report most problems as INFO
messages and make the amvalidate function return FALSE.  This allows
more than one problem to be discovered per run.

* Report object names rather than OIDs, and work a bit harder on making
the messages understandable.

Also, remove a few more opr_sanity regression test queries that are
now superseded by the amvalidate checks.
2016-01-21 19:47:15 -05:00
Tom Lane 65c5fcd353 Restructure index access method API to hide most of it at the C level.
This patch reduces pg_am to just two columns, a name and a handler
function.  All the data formerly obtained from pg_am is now provided
in a C struct returned by the handler function.  This is similar to
the designs we've adopted for FDWs and tablesample methods.  There
are multiple advantages.  For one, the index AM's support functions
are now simple C functions, making them faster to call and much less
error-prone, since the C compiler can now check function signatures.
For another, this will make it far more practical to define index access
methods in installable extensions.

A disadvantage is that SQL-level code can no longer see attributes
of index AMs; in particular, some of the crosschecks in the opr_sanity
regression test are no longer possible from SQL.  We've addressed that
by adding a facility for the index AM to perform such checks instead.
(Much more could be done in that line, but for now we're content if the
amvalidate functions more or less replace what opr_sanity used to do.)
We might also want to expose some sort of reporting functionality, but
this patch doesn't do that.

Alexander Korotkov, reviewed by Petr Jelínek, and rather heavily
editorialized on by me.
2016-01-17 19:36:59 -05:00
Bruce Momjian ee94300446 Update copyright for 2016
Backpatch certain files through 9.1
2016-01-02 13:33:40 -05:00
Tom Lane 3d2b31e30e Fix brin_summarize_new_values() to check index type and ownership.
brin_summarize_new_values() did not check that the passed OID was for
an index at all, much less that it was a BRIN index, and would fail in
obscure ways if it wasn't (possibly damaging data first?).  It also
lacked any permissions test; by analogy to VACUUM, we should only allow
the table's owner to summarize.

Noted by Jeff Janes, fix by Michael Paquier and me
2015-12-26 12:56:09 -05:00
Alvaro Herrera 21a4e4a4c9 Fix BRIN free space computations
A bug in the original free space computation made it possible to
return a page which wasn't actually able to fit the item.  Since the
insertion code isn't prepared to deal with PageAddItem failing, a PANIC
resulted ("failed to add BRIN tuple [to new page]").  Add a macro to
encapsulate the correct computation, and use it in
brin_getinsertbuffer's callers before calling that routine, to raise an
early error.

I became aware of the possiblity of a problem in this area while working
on ccc4c07499.  There's no archived discussion about it, but it's
easy to reproduce a problem in the unpatched code with something like

CREATE TABLE t (a text);
CREATE INDEX ti ON t USING brin (a) WITH (pages_per_range=1);

for length in `seq 8000 8196`
do
	psql -f - <<EOF
TRUNCATE TABLE t;
INSERT INTO t VALUES ('z'), (repeat('a', $length));
EOF
done

Backpatch to 9.5, where BRIN was introduced.
2015-10-27 18:17:55 -03:00
Alvaro Herrera 5cd6538345 Add missing ReleaseBuffer call in BRIN revmap code
I think this particular branch is actually dead, but the analysis to
prove that is not trivial, so instead take the weasel way.

Reported by Jinyu Zhang
Backpatch to 9.5, where BRIN was introduced.
2015-09-11 15:29:46 -03:00
Heikki Linnakangas c80b5f66c6 Fix misc typos.
Oskari Saarenmaa. Backpatch to stable branches where applicable.
2015-09-05 11:35:49 +03:00
Tatsuo Ishii c39f5674df Fix brin index summarizing while vacuuming.
If the number of heap blocks is not multiples of pages per range, the
summarizing produces wrong summary information for the last brin index
tuple while vacuuming.

Problem reported by Tatsuo Ishii and fixed by Amit Langote.

Discussion at "[HACKERS] BRIN INDEX value (message id :20150903.174935.1946402199422994347.t-ishii@sraoss.co.jp)
Backpatched to 9.5 in which brin index was added.
2015-09-05 09:19:25 +09:00
Alvaro Herrera fcbf455842 Fix unitialized variables
As complained by clang, reported by Andres Freund.  Brown paper bag bug
in ccc4c07499.

Add some comments, too.

Backpatch to 9.5, like that one.
2015-08-13 00:12:07 -03:00
Alvaro Herrera ccc4c07499 Close some holes in BRIN page assignment
In some corner cases, it is possible for the BRIN index relation to be
extended by brin_getinsertbuffer but the new page not be used
immediately for anything by its callers; when this happens, the page is
initialized and the FSM is updated (by brin_getinsertbuffer) with the
info about that page, but these actions are not WAL-logged.  A later
index insert/update can use the page, but since the page is already
initialized, the initialization itself is not WAL-logged then either.
Replay of this sequence of events causes recovery to fail altogether.

There is a related corner case within brin_getinsertbuffer itself, in
which we extend the relation to put a new index tuple there, but later
find out that we cannot do so, and do not return the buffer; the page
obtained from extension is not even initialized.  The resulting page is
lost forever.

To fix, shuffle the code so that initialization is not the
responsibility of brin_getinsertbuffer anymore, in normal cases;
instead, the initialization is done by its callers (brin_doinsert and
brin_doupdate) once they're certain that the page is going to be used.
When either those functions determine that the new page cannot be used,
before bailing out they initialize the page as an empty regular page,
enter it in FSM and WAL-log all this.  This way, the page is usable for
future index insertions, and WAL replay doesn't find trying to insert
tuples in pages whose initialization didn't make it to the WAL.  The
same strategy is used in brin_getinsertbuffer when it cannot return the
new page.

Additionally, add a new step to vacuuming so that all pages of the index
are scanned; whenever an uninitialized page is found, it is initialized
as empty and WAL-logged.  This closes the hole that the relation is
extended but the system crashes before anything is WAL-logged about it.
We also take this opportunity to update the FSM, in case it has gotten
out of date.

Thanks to Heikki Linnakangas for finding the problem that kicked some
additional analysis of BRIN page assignment code.

Backpatch to 9.5, where BRIN was introduced.

Discussion: https://www.postgresql.org/message-id/20150723204810.GY5596@postgresql.org
2015-08-12 14:20:38 -03:00
Alvaro Herrera 2834855cb9 Fix BRIN to use SnapshotAny during summarization
For correctness of summarization results, it is critical that the
snapshot used during the summarization scan is able to see all tuples
that are live to all transactions -- including tuples inserted or
deleted by in-progress transactions.  Otherwise, it would be possible
for a transaction to insert a tuple, then idle for a long time while a
concurrent transaction executes summarization of the range: this would
result in the inserted value not being considered in the summary.
Previously we were trying to use a MVCC snapshot in conjunction with
adding a "placeholder" tuple in the index: the snapshot would see all
committed tuples, and the placeholder tuple would catch insertions by
any new inserters.  The hole is that prior insertions by transactions
that are still in progress by the time the MVCC snapshot was taken were
ignored.

Kevin Grittner reported this as a bogus error message during vacuum with
default transaction isolation mode set to repeatable read (because the
error report mentioned a function name not being invoked during), but
the problem is larger than that.

To fix, tweak IndexBuildHeapRangeScan to have a new mode that behaves
the way we need using SnapshotAny visibility rules.  This change
simplifies the BRIN code a bit, mainly by removing large comments that
were mistaken.  Instead, rely on the SnapshotAny semantics to provide
what it needs.  (The business about a placeholder tuple needs to remain:
that covers the case that a transaction inserts a a tuple in a page that
summarization already scanned.)

Discussion: https://www.postgresql.org/message-id/20150731175700.GX2441@postgresql.org

In passing, remove a couple of unused declarations from brin.h and
reword a comment to be proper English.  This part submitted by Kevin
Grittner.

Backpatch to 9.5, where BRIN was introduced.
2015-08-05 16:20:50 -03:00
Alvaro Herrera c81276241b Fix broken assertion in BRIN code
The code was assuming that any NULL value in scan keys was due to IS
NULL or IS NOT NULL, but it turns out to be possible to get them with
other operators too, if they are used in contrived-enough ways.  Easiest
way out of the problem seems to check explicitely for the IS NOT NULL
flag, instead of assuming it must be set if the IS NULL flag is not set,
when a null scan key is found; if neither flag is set, follow the lead
of other index AMs and assume that all indexable operators must be
strict, and thus the query is never satisfiable.

Also, add a comment to try and lure some future hacker into improving
analysis of scan keys in brin.

Per report from Andreas Seltenreich; diagnosis by Tom Lane.
Backpatch to 9.5.

Discussion: http://www.postgresql.org/message-id/20646.1437919632@sss.pgh.pa.us
2015-07-30 15:07:19 -03:00
Alvaro Herrera 8d90736924 Improve BRIN documentation somewhat
This removes some info about support procedures being used, which was
obsoleted by commit db5f98ab4f, as well as add some more documentation
on how to create new opclasses using the Minmax infrastructure.
(Hopefully we can get something similar for Inclusion as well.)

In passing, fix some obsolete mentions of "mmtuples" in source code
comments.

Backpatch to 9.5, where BRIN was introduced.
2015-07-20 12:16:40 +02:00
Alvaro Herrera 4028222468 Fix BRIN xlog replay
There was a confusion about which block number to use when storing an
item's pointer in the revmap -- the revmap page's blkno was being used,
not the data page's blkno.

Spotted-by: Jeff Janes
2015-06-26 18:13:05 -03:00
Tom Lane 79f2b5d583 Fix valgrind's "unaddressable bytes" whining about BRIN code.
brin_form_tuple calculated an exact tuple size, then palloc'd and
filled just that much.  Later, brin_doinsert or brin_doupdate would
MAXALIGN the tuple size and tell PageAddItem that that was the size
of the tuple to insert.  If the original tuple size wasn't a multiple
of MAXALIGN, the net result would be that PageAddItem would memcpy
a few more bytes than the palloc request had been for.

AFAICS, this is totally harmless in the real world: the error is a
read overrun not a write overrun, and palloc would certainly have
rounded the request up to a MAXALIGN multiple internally, so there's
no chance of the memcpy fetching off the end of memory.  Valgrind,
however, is picky to the byte level not the MAXALIGN level.

Fix it by pushing the MAXALIGN step back to brin_form_tuple.  (The other
possible source of tuples in this code, brin_form_placeholder_tuple,
was already producing a MAXALIGN'd result.)

In passing, be a bit more paranoid about internal allocations in
brin_form_tuple.
2015-05-25 21:56:19 -04:00
Bruce Momjian 807b9e0dff pgindent run for 9.5 2015-05-23 21:35:49 -04:00
Heikki Linnakangas fa60fb63e5 Fix more typos in comments.
Patch by CharSyam, plus a few more I spotted with grep.
2015-05-20 19:45:43 +03:00
Heikki Linnakangas 4fc72cc7bb Collection of typo fixes.
Use "a" and "an" correctly, mostly in comments. Two error messages were
also fixed (they were just elogs, so no translation work required). Two
function comments in pg_proc.h were also fixed. Etsuro Fujita reported one
of these, but I found a lot more with grep.

Also fix a few other typos spotted while grepping for the a/an typos.
For example, "consists out of ..." -> "consists of ...". Plus a "though"/
"through" mixup reported by Euler Taveira.

Many of these typos were in old code, which would be nice to backpatch to
make future backpatching easier. But much of the code was new, and I didn't
feel like crafting separate patches for each branch. So no backpatching.
2015-05-20 16:56:22 +03:00
Alvaro Herrera b0b7be6133 Add BRIN infrastructure for "inclusion" opclasses
This lets BRIN be used with R-Tree-like indexing strategies.

Also provided are operator classes for range types, box and inet/cidr.
The infrastructure provided here should be sufficient to create operator
classes for similar datatypes; for instance, opclasses for PostGIS
geometries should be doable, though we didn't try to implement one.

(A box/point opclass was also submitted, but we ripped it out before
commit because the handling of floating point comparisons in existing
code is inconsistent and would generate corrupt indexes.)

Author: Emre Hasegeli.  Cosmetic changes by me
Review: Andreas Karlsson
2015-05-15 18:05:22 -03:00
Alvaro Herrera 26df7066cc Move strategy numbers to include/access/stratnum.h
For upcoming BRIN opclasses, it's convenient to have strategy numbers
defined in a single place.  Since there's nothing appropriate, create
it.  The StrategyNumber typedef now lives there, as well as existing
strategy numbers for B-trees (from skey.h) and R-tree-and-friends (from
gist.h).  skey.h is forced to include stratnum.h because of the
StrategyNumber typedef, but gist.h is not; extensions that currently
rely on gist.h for rtree strategy numbers might need to add a new

A few .c files can stop including skey.h and/or gist.h, which is a nice
side benefit.

Per discussion:
https://www.postgresql.org/message-id/20150514232132.GZ2523@alvh.no-ip.org

Authored by Emre Hasegeli and Álvaro.

(It's not clear to me why bootscanner.l has any #include lines at all.)
2015-05-15 17:03:16 -03:00
Tom Lane c594c75078 Add missing "static" marker.
Per buildfarm member pademelon.
2015-05-09 23:39:36 -04:00
Alvaro Herrera db5f98ab4f Improve BRIN infra, minmax opclass and regression test
The minmax opclass was using the wrong support functions when
cross-datatypes queries were run.  Instead of trying to fix the
pg_amproc definitions (which apparently is not possible), use the
already correct pg_amop entries instead.  This requires jumping through
more hoops (read: extra syscache lookups) to obtain the underlying
functions to execute, but it is necessary for correctness.

Author: Emre Hasegeli, tweaked by Álvaro
Review: Andreas Karlsson

Also change BrinOpcInfo to record each stored type's typecache entry
instead of just the OID.  Turns out that the full type cache is
necessary in brin_deform_tuple: the original code used the indexed
type's byval and typlen properties to extract the stored tuple, which is
correct in Minmax; but in other implementations that want to store
something different, that's wrong.  The realization that this is a bug
comes from Emre also, but I did not use his patch.

I also adopted Emre's regression test code (with smallish changes),
which is more complete.
2015-05-07 13:02:22 -03:00
Andres Freund 6aab1f45ac Fix various typos and grammar errors in comments.
Author: Dmitriy Olshevskiy
Discussion: 553D00A6.4090205@bk.ru
2015-04-26 18:42:31 +02:00
Alvaro Herrera e491bd2ee3 Move BRIN page type to page's last two bytes
... which is the usual convention among AMs, so that pg_filedump and
similar utilities can tell apart pages of different AMs.  It was also
the intent of the original code, but I failed to realize that alignment
considerations would move the whole thing to the previous-to-last word
in the page.

The new definition of the associated macro makes surrounding code a bit
leaner, too.

Per note from Heikki at
http://www.postgresql.org/message-id/546A16EF.9070005@vmware.com
2015-03-10 12:27:15 -03:00
Alvaro Herrera 972bf7d6f1 Tweak BRIN minmax operator class
In the union support proc, we were not checking the hasnulls flag of
value A early enough, so it could be skipped if the "allnulls" flag in
value B is set.  Also, a check on the allnulls flag of value "B" was
redundant, so remove it.

Also change inet_minmax_ops to not be the default opclass for type inet,
as a future inclusion operator class would be more useful and it's
pretty difficult to change default opclass for a datatype later on.
(There is no catversion bump for this catalog change; this shouldn't be
a problem.)

Extracted from a larger patch to add an "inclusion" operator class.

Author: Emre Hasegeli
2015-01-22 17:01:09 -03:00
Robert Haas 9d54b93239 BRIN typo fix.
Amit Langote
2015-01-19 08:34:29 -05:00
Bruce Momjian 4baaf863ec Update copyright for 2015
Backpatch certain files through 9.0
2015-01-06 11:43:47 -05:00
Tom Lane 1511521a36 Minor cleanup of function declarations for BRIN.
Get rid of PG_FUNCTION_INFO_V1() macros, which are quite inappropriate
for built-in functions (possibly leftovers from testing as a loadable
module?).  Also, fix gratuitous inconsistency between SQL-level and
C-level names of the minmax support functions.
2014-12-02 14:07:54 -05:00
Heikki Linnakangas 2c03216d83 Revamp the WAL record format.
Each WAL record now carries information about the modified relation and
block(s) in a standardized format. That makes it easier to write tools that
need that information, like pg_rewind, prefetching the blocks to speed up
recovery, etc.

There's a whole new API for building WAL records, replacing the XLogRecData
chains used previously. The new API consists of XLogRegister* functions,
which are called for each buffer and chunk of data that is added to the
record. The new API also gives more control over when a full-page image is
written, by passing flags to the XLogRegisterBuffer function.

This also simplifies the XLogReadBufferForRedo() calls. The function can dig
the relation and block number from the WAL record, so they no longer need to
be passed as arguments.

For the convenience of redo routines, XLogReader now disects each WAL record
after reading it, copying the main data part and the per-block data into
MAXALIGNed buffers. The data chunks are not aligned within the WAL record,
but the redo routines can assume that the pointers returned by XLogRecGet*
functions are. Redo routines are now passed the XLogReaderState, which
contains the record in the already-disected format, instead of the plain
XLogRecord.

The new record format also makes the fixed size XLogRecord header smaller,
by removing the xl_len field. The length of the "main data" portion is now
stored at the end of the WAL record, and there's a separate header after
XLogRecord for it. The alignment padding at the end of XLogRecord is also
removed. This compansates for the fact that the new format would otherwise
be more bulky than the old format.

Reviewed by Andres Freund, Amit Kapila, Michael Paquier, Alvaro Herrera,
Fujii Masao.
2014-11-20 18:46:41 +02:00
Alvaro Herrera a590f266e4 BRIN: fix bug in xlog backup block counting
The code that generates the BRIN_XLOG_UPDATE removes the buffer
reference when the page that's target for the updated tuple is freshly
initialized.  This is a pretty usual optimization, but was breaking the
case where the revmap buffer, which is referenced in the same WAL
record, is getting a backup block: the replay code was using backup
block index 1, which is not valid when the update target buffer gets
pruned; the revmap buffer gets assigned 0 instead.  Make sure to use the
correct backup block index for revmap when replaying.

Bug reported by Fujii Masao.
2014-11-10 18:13:49 -03:00
Alvaro Herrera 1e0b4365c2 Further code and wording tweaks in BRIN
Besides a couple of typo fixes, per David Rowley, Thom Brown, and Amit
Langote, and mentions of BRIN in the general CREATE INDEX page again per
David, this includes silencing MSVC compiler warnings (thanks Microsoft)
and an additional variable initialization per Coverity scanner.
2014-11-10 15:56:08 -03:00
Kevin Grittner 96a73fcdac Fix compiler warning for non-assert builds.
Reported by Peter Geoghegan
David Rowley
2014-11-10 09:42:46 -06:00
Alvaro Herrera b89ee54e20 Fix some coding issues in BRIN
Reported by David Rowley: variadic macros are a problem.  Get rid of
them using a trick suggested by Tom Lane: add extra parentheses where
needed.  In the future we might decide we don't need the calls at all
and remove them, but it seems appropriate to keep them while this code
is still new.

Also from David Rowley: brininsert() was trying to use a variable before
initializing it.  Fix by moving the brin_form_tuple call (which
initializes the variable) to within the locked section.

Reported by Peter Eisentraut: can't use "new" as a struct member name,
because C++ compilers will choke on it, as reported by cpluspluscheck.
2014-11-08 00:31:03 -03:00
Alvaro Herrera 7516f52594 BRIN: Block Range Indexes
BRIN is a new index access method intended to accelerate scans of very
large tables, without the maintenance overhead of btrees or other
traditional indexes.  They work by maintaining "summary" data about
block ranges.  Bitmap index scans work by reading each summary tuple and
comparing them with the query quals; all pages in the range are returned
in a lossy TID bitmap if the quals are consistent with the values in the
summary tuple, otherwise not.  Normal index scans are not supported
because these indexes do not store TIDs.

As new tuples are added into the index, the summary information is
updated (if the block range in which the tuple is added is already
summarized) or not; in the latter case, a subsequent pass of VACUUM or
the brin_summarize_new_values() function will create the summary
information.

For data types with natural 1-D sort orders, the summary info consists
of the maximum and the minimum values of each indexed column within each
page range.  This type of operator class we call "Minmax", and we
supply a bunch of them for most data types with B-tree opclasses.
Since the BRIN code is generalized, other approaches are possible for
things such as arrays, geometric types, ranges, etc; even for things
such as enum types we could do something different than minmax with
better results.  In this commit I only include minmax.

Catalog version bumped due to new builtin catalog entries.

There's more that could be done here, but this is a good step forwards.

Loosely based on ideas from Simon Riggs; code mostly by Álvaro Herrera,
with contribution by Heikki Linnakangas.

Patch reviewed by: Amit Kapila, Heikki Linnakangas, Robert Haas.
Testing help from Jeff Janes, Erik Rijkers, Emanuel Calvo.

PS:
  The research leading to these results has received funding from the
  European Union's Seventh Framework Programme (FP7/2007-2013) under
  grant agreement n° 318633.
2014-11-07 16:38:14 -03:00