Commit Graph

903 Commits

Author SHA1 Message Date
Tomas Vondra c1ec02be1d Reuse BrinDesc and BrinRevmap in brininsert
The brininsert code used to initialize (and destroy) BrinDesc and
BrinRevmap for each tuple, which is not free. This patch initializes
these structures only once, and reuses them for all inserts in the same
command. The data is passed through indexInfo->ii_AmCache.

This also introduces an optional AM callback "aminsertcleanup" that
allows performing custom cleanup in case simply pfree-ing ii_AmCache is
not sufficient (which is the case when the cache contains TupleDesc,
Buffers, and so on).

Author: Soumyadeep Chakraborty
Reviewed-by: Alvaro Herrera, Matthias van de Meent, Tomas Vondra
Discussion: https://postgr.es/m/CAE-ML%2B9r2%3DaO1wwji1sBN9gvPz2xRAtFUGfnffpd0ZqyuzjamA%40mail.gmail.com
2023-11-25 20:27:28 +01:00
Peter Eisentraut 611806cd72 Add trailing commas to enum definitions
Since C99, there can be a trailing comma after the last value in an
enum definition.  A lot of new code has been introducing this style on
the fly.  Some new patches are now taking an inconsistent approach to
this.  Some add the last comma on the fly if they add a new last
value, some are trying to preserve the existing style in each place,
some are even dropping the last comma if there was one.  We could
nudge this all in a consistent direction if we just add the trailing
commas everywhere once.

I omitted a few places where there was a fixed "last" value that will
always stay last.  I also skipped the header files of libpq and ecpg,
in case people want to use those with older compilers.  There were
also a small number of cases where the enum type wasn't used anywhere
(but the enum values were), which ended up confusing pgindent a bit,
so I left those alone.

Discussion: https://www.postgresql.org/message-id/flat/386f8c45-c8ac-4681-8add-e3b0852c1620%40eisentraut.org
2023-10-26 09:20:54 +02:00
Amit Kapila 7cc2f59dd5 Remove duplicate words in docs and code comments.
Additionally, add a missing "the" in a couple of places.

Author: Vignesh C, Dagfinn Ilmari Mannsåker
Discussion: http://postgr.es/m/CALDaNm28t+wWyPfuyqEaARS810Je=dRFkaPertaLAEJYY2cWYQ@mail.gmail.com
2023-10-09 09:18:47 +05:30
Alexander Korotkov 82a7132f53 Fix another typo in e0b1ee17dc
Reported-by: Richard Guo
Discussion: https://postgr.es/m/CAMbWs4_kHMJDak75y1kBTirv-drS1-knT-7Mpg5LprAjqRJDVA%40mail.gmail.com
2023-10-07 20:36:47 +03:00
Alexander Korotkov e8c334c47a Fix typos in e0b1ee17dc
Reported-by: Alexander Lakhin
2023-10-07 11:55:55 +03:00
Alexander Korotkov e0b1ee17dc Skip checking of scan keys required for directional scan in B-tree
Currently, B-tree code matches every scan key to every item on the page.
Imagine the ordered B-tree scan for the query like this.

SELECT * FROM tbl WHERE col > 'a' AND col < 'b' ORDER BY col;

The (col > 'a') scan key will be always matched once we find the location to
start the scan.  The (col < 'b') scan key will match every item on the page
as long as it matches the last item on the page.

This patch implements prechecking of the scan keys required for directional
scan on beginning of page scan.  If precheck is successful we can skip this
scan keys check for the items on the page.  That could lead to significant
acceleration especially if the comparison operator is expensive.

Idea from patch by Konstantin Knizhnik.

Discussion: https://postgr.es/m/079c3f8e-3371-abe2-e93c-fc8a0ae3f571%40garret.ru
Reviewed-by: Peter Geoghegan, Pavel Borisov
2023-10-06 10:40:51 +03:00
Peter Geoghegan 714780dcdd Fix btmarkpos/btrestrpos array key wraparound bug.
nbtree's mark/restore processing failed to correctly handle an edge case
involving array key advancement and related search-type scan key state.
Scans with ScalarArrayScalarArrayOpExpr quals requiring mark/restore
processing (for a merge join) could incorrectly conclude that an
affected array/scan key must not have advanced during the time between
marking and restoring the scan's position.

As a result of all this, array key handling within btrestrpos could skip
a required call to _bt_preprocess_keys().  This confusion allowed later
primitive index scans to overlook tuples matching the true current array
keys.  The scan's search-type scan keys would still have spurious values
corresponding to the final array element(s) -- not values matching the
first/now-current array element(s).

To fix, remember that "array key wraparound" has taken place during the
ongoing btrescan in a flag variable stored in the scan's state, and use
that information at the point where btrestrpos decides if another call
to _bt_preprocess_keys is required.

Oversight in commit 70bc5833, which taught nbtree to handle array keys
during mark/restore processing, but missed this subtlety.  That commit
was itself a bug fix for an issue in commit 9e8da0f7, which taught
nbtree to handle ScalarArrayOpExpr quals natively.

Author: Peter Geoghegan <pg@bowt.ie>
Discussion: https://postgr.es/m/CAH2-WzkgP3DDRJxw6DgjCxo-cu-DKrvjEv_ArkP2ctBJatDCYg@mail.gmail.com
Backpatch: 11- (all supported branches).
2023-09-28 16:29:37 -07:00
Thomas Munro 9f0602539d Remove some more "snapshot too old" vestiges.
Commit f691f5b8 removed the logic, but left behind some now-useless
Snapshot arguments to various AM-internal functions, and missed a couple
of comments.

Reported-by: Peter Geoghegan <pg@bowt.ie>
Discussion: https://postgr.es/m/CAH2-Wznj9qSNXZ1P1uWTUD_FeaTezbUazb416EPwi4Qr_jR_6A%40mail.gmail.com
2023-09-08 17:12:12 +12:00
Thomas Munro f691f5b80a Remove the "snapshot too old" feature.
Remove the old_snapshot_threshold setting and mechanism for producing
the error "snapshot too old", originally added by commit 848ef42b.
Unfortunately it had a number of known problems in terms of correctness
and performance, mostly reported by Andres in the course of his work on
snapshot scalability.  We agreed to remove it, after a long period
without an active plan to fix it.

This is certainly a desirable feature, and someone might propose a new
or improved implementation in the future.

Reported-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CACG%3DezYV%2BEvO135fLRdVn-ZusfVsTY6cH1OZqWtezuEYH6ciQA%40mail.gmail.com
Discussion: https://postgr.es/m/20200401064008.qob7bfnnbu4w5cw4%40alap3.anarazel.de
Discussion: https://postgr.es/m/CA%2BTgmoY%3Daqf0zjTD%2B3dUWYkgMiNDegDLFjo%2B6ze%3DWtpik%2B3XqA%40mail.gmail.com
2023-09-05 19:53:43 +12:00
Heikki Linnakangas bf8bf6d0bd Fix _bt_allequalimage() call within critical section.
_bt_allequalimage() does complicated things, so it's not OK to call it
in a critical section. Per buildfarm failure on 'prion', which uses
-DRELCACHE_FORCE_RELEASE -DCATCACHE_FORCE_RELEASE options.

Discussion: https://www.postgresql.org/message-id/6e5bbc08-cdfc-b2b3-9e23-1a914b9850a9@iki.fi
Backpatch-through: 16, like commit ccadf73163 that introduced this
2023-08-23 18:12:41 +03:00
Heikki Linnakangas ccadf73163 Use the buffer cache when initializing an unlogged index.
Some of the ambuildempty functions used smgrwrite() directly, followed
by smgrimmedsync(). A few small problems with that:

Firstly, one is supposed to use smgrextend() when extending a
relation, not smgrwrite(). It doesn't make much difference in
production builds. smgrextend() updates the relation size cache, so
you miss that, but that's harmless because we never use the cached
relation size of an init fork. But if you compile with
CHECK_WRITE_VS_EXTEND, you get an assertion failure.

Secondly, the smgrwrite() calls were performed before WAL-logging, so
the page image written to disk had 0/0 as the LSN, not the LSN of the
WAL record. That's also harmless in practice, but seems sloppy.

Thirdly, it's better to use the buffer cache, because then you don't
need to smgrimmedsync() the relation to disk, which adds latency.
Bypassing the cache makes sense for bulk operations like index
creation, but not when you're just initializing an empty index.
Creation of unlogged tables is hardly performance bottleneck in any
real world applications, but nevertheless.

Backpatch to v16, but no further. These issues should be harmless in
practice, so better to not rock the boat in older branches.

Reviewed-by: Robert Haas
Discussion: https://www.postgresql.org/message-id/6e5bbc08-cdfc-b2b3-9e23-1a914b9850a9@iki.fi
2023-08-23 17:21:31 +03:00
Thomas Munro 7114791158 ExtendBufferedWhat -> BufferManagerRelation.
Commit 31966b15 invented a way for functions dealing with relation
extension to accept a Relation in online code and an SMgrRelation in
recovery code.  It seems highly likely that future bufmgr.c interfaces
will face the same problem, and need to do something similar.
Generalize the names so that each interface doesn't have to re-invent
the wheel.

Back-patch to 16.  Since extension AM authors might start using the
constructor macros once 16 ships, we agreed to do the rename in 16
rather than waiting for 17.

Reviewed-by: Peter Geoghegan <pg@bowt.ie>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CA%2BhUKG%2B6tLD2BhpRWycEoti6LVLyQq457UL4ticP5xd8LqHySA%40mail.gmail.com
2023-08-23 12:31:23 +12:00
Thomas Munro f9b7fc651a Fix race in SSI interaction with empty btrees.
When predicate-locking btrees, we have a special case for completely
empty btrees, since there is no page to lock.  This was racy, because,
without buffer lock held, a matching key could be inserted between the
_bt_search() and the PredicateLockRelation() calls.

Fix, by rechecking _bt_search() after taking the relation-level SIREAD
lock, if using SERIALIZABLE isolation and an empty btree is discovered.

Back-patch to all supported releases.  Fixes one aspect of bug #17949.

Reported-by: Artem Anisimov <artem.anisimov.255@gmail.com>
Reviewed-by: Dmitry Dolgov <9erthalion6@gmail.com>
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Discussion: https://postgr.es/m/17949-a0f17035294a55e2%40postgresql.org
2023-07-04 09:07:31 +12:00
Peter Geoghegan 5f0762f147 nbtree VACUUM: cope with topparent inconsistencies.
Avoid "right sibling %u of block %u is not next child" errors when
vacuuming a corrupt nbtree index.  Just LOG the issue and press on.
That way VACUUM will have a decent chance of finishing off all required
processing for the index (and for the table as a whole).

This is similar to recent work from commit 5abff197, as well as work
from commit 5b861baa (later backpatched as commit 43e409ce), which
taught nbtree VACUUM to keep going when its "re-find" check fails.  The
hardening added by this commit takes place directly after the "re-find"
check, right before the critical section for the first stage of page
deletion.

Author: Peter Geoghegan <pg@bowt.ie>
Discussion: https://postgr.es/m/CAH2-Wz=dayg0vjs4+er84TS9ami=csdzjpuiCGbEw=idhwqhzQ@mail.gmail.com
Backpatch: 11- (all supported versions).
2023-06-21 17:41:58 -07:00
Peter Geoghegan d088ba5a5a nbtree: Allocate new pages in separate function.
Split nbtree's _bt_getbuf function is two: code that read locks or write
locks existing pages remains in _bt_getbuf, while code that deals with
allocating new pages is moved to a new, dedicated function called
_bt_allocbuf.  This simplifies most _bt_getbuf callers, since it is no
longer necessary for them to pass a heaprel argument.  Many of the
changes to nbtree from commit 61b313e4 can be reverted.  This minimizes
the divergence between HEAD/PostgreSQL 16 and earlier release branches.

_bt_allocbuf replaces the previous nbtree idiom of passing P_NEW to
_bt_getbuf.  There are only 3 affected call sites, all of which continue
to pass a heaprel for recovery conflict purposes.  Note that nbtree's
use of P_NEW was superficial; nbtree never actually relied on the P_NEW
code paths in bufmgr.c, so this change is strictly mechanical.

GiST already took the same approach; it has a dedicated function for
allocating new pages called gistNewBuffer().  That factor allowed commit
61b313e4 to make much more targeted changes to GiST.

Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Heikki Linnakangas <hlinnaka@iki.fi>
Discussion: https://postgr.es/m/CAH2-Wz=8Z9qY58bjm_7TAHgtW6RzZ5Ke62q5emdCEy9BAzwhmg@mail.gmail.com
2023-06-10 14:08:25 -07:00
Nathan Bossart 9aee26a491 Fix missing word in nbtree/README.
Reported-by: Daniel Westermann
Author: Gurjeet Singh
Reviewed-by: Richard Guo
Discussion: https://postgr.es/m/ZR0P278MB0427F0E0CE4ED140F52D1923D250A%40ZR0P278MB0427.CHEP278.PROD.OUTLOOK.COM
2023-06-08 21:20:24 -07:00
Peter Geoghegan 5abff197cc nbtree VACUUM: cope with right sibling link corruption.
Avoid "right sibling's left-link doesn't match" errors when vacuuming a
corrupt nbtree index.  Just LOG the issue and press on.  That way VACUUM
will have a decent chance of finishing off all required processing for
the index (and for the table as a whole).

This error was seen in the field from time to time (it's more than a
theoretical risk), so giving VACUUM the ability to press on like this
has real value.  Nothing short of a REINDEX is expected to fix the
underlying index corruption, so giving up (by throwing an error) risks
making a bad situation far worse.  Anything that blocks forward progress
by VACUUM like this might go unnoticed for a long time.  This could
eventually lead to a wraparound/xidStopLimit outage.

Note that _bt_unlink_halfdead_page() has always been able to bail on
page deletion when the target page's left sibling page was in an
inconsistent state.  It now does the same thing (returns false to back
out of the second phase of deletion) when it notices sibling link
corruption in the target page's right sibling page.

This is similar to the work from commit 5b861baa (later backpatched as
commit 43e409ce), which taught nbtree to press on with vacuuming an
index when page deletion fails to "re-find" a downlink in the target
page's parent page.  The "re-find" check seems to make VACUUM bail on
page deletion more often in practice, but there is no reason to take any
chances here.

Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Heikki Linnakangas <hlinnaka@iki.fi>
Discussion: https://postgr.es/m/CAH2-Wzko2q2kP1+UvgJyP9g0mF4hopK0NtQZcxwvMv9_ytGhkQ@mail.gmail.com
Backpatch: 11- (all supported versions).
2023-05-25 15:33:00 -07:00
Tom Lane 0245f8db36 Pre-beta mechanical code beautification.
Run pgindent, pgperltidy, and reformat-dat-files.

This set of diffs is a bit larger than typical.  We've updated to
pg_bsd_indent 2.1.2, which properly indents variable declarations that
have multi-line initialization expressions (the continuation lines are
now indented one tab stop).  We've also updated to perltidy version
20230309 and changed some of its settings, which reduces its desire to
add whitespace to lines to make assignments etc. line up.  Going
forward, that should make for fewer random-seeming changes to existing
code.

Discussion: https://postgr.es/m/20230428092545.qfb3y5wcu4cm75ur@alvherre.pgsql
2023-05-19 17:24:48 -04:00
Michael Paquier 8961cb9a03 Fix typos in comments
The changes done in this commit impact comments with no direct
user-visible changes, with fixes for incorrect function, variable or
structure names.

Author: Alexander Lakhin
Discussion: https://postgr.es/m/e8c38840-596a-83d6-bd8d-cebc51111572@gmail.com
2023-05-02 12:23:08 +09:00
Peter Geoghegan 06e0652750 Remove useless argument from nbtree dedup function.
_bt_dedup_pass()'s heapRel argument hasn't been needed or used since
commit cf2acaf4dc made deleting any existing LP_DEAD index tuples the
caller's responsibility.
2023-04-18 10:33:15 -07:00
Andres Freund 26669757b6 Handle logical slot conflicts on standby
During WAL replay on the standby, when a conflict with a logical slot is
identified, invalidate such slots. There are two sources of conflicts:
1) Using the information added in 6af1793954, logical slots are invalidated if
   required rows are removed
2) wal_level on the primary server is reduced to below logical

Uses the infrastructure introduced in the prior commit. FIXME: add commit
reference.

Change InvalidatePossiblyObsoleteSlot() to use a recovery conflict to
interrupt use of a slot, if called in the startup process. The new recovery
conflict is added to pg_stat_database_conflicts, as confl_active_logicalslot.

See 6af1793954 for an overall design of logical decoding on a standby.

Bumps catversion for the addition of the pg_stat_database_conflicts column.
Bumps PGSTAT_FILE_FORMAT_ID for the same reason.

Author: "Drouvot, Bertrand" <bertranddrouvot.pg@gmail.com>
Author: Andres Freund <andres@anarazel.de>
Author: Amit Khandekar <amitdkhan.pg@gmail.com> (in an older version)
Reviewed-by: "Drouvot, Bertrand" <bertranddrouvot.pg@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Fabrízio de Royes Mello <fabriziomello@gmail.com>
Reviewed-by: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://postgr.es/m/20230407075009.igg7be27ha2htkbt@awork3.anarazel.de
2023-04-08 00:05:44 -07:00
Thomas Munro faeedbcefd Introduce PG_IO_ALIGN_SIZE and align all I/O buffers.
In order to have the option to use O_DIRECT/FILE_FLAG_NO_BUFFERING in a
later commit, we need the addresses of user space buffers to be well
aligned.  The exact requirements vary by OS and file system (typically
sectors and/or memory pages).  The address alignment size is set to
4096, which is enough for currently known systems: it matches modern
sectors and common memory page size.  There is no standard governing
O_DIRECT's requirements so we might eventually have to reconsider this
with more information from the field or future systems.

Aligning I/O buffers on memory pages is also known to improve regular
buffered I/O performance.

Three classes of I/O buffers for regular data pages are adjusted:
(1) Heap buffers are now allocated with the new palloc_aligned() or
MemoryContextAllocAligned() functions introduced by commit 439f6175.
(2) Stack buffers now use a new struct PGIOAlignedBlock to respect
PG_IO_ALIGN_SIZE, if possible with this compiler.  (3) The buffer
pool is also aligned in shared memory.

WAL buffers were already aligned on XLOG_BLCKSZ.  It's possible for
XLOG_BLCKSZ to be configured smaller than PG_IO_ALIGNED_SIZE and thus
for O_DIRECT WAL writes to fail to be well aligned, but that's a
pre-existing condition and will be addressed by a later commit.

BufFiles are not yet addressed (there's no current plan to use O_DIRECT
for those, but they could potentially get some incidental speedup even
in plain buffered I/O operations through better alignment).

If we can't align stack objects suitably using the compiler extensions
we know about, we disable the use of O_DIRECT by setting PG_O_DIRECT to
0.  This avoids the need to consider systems that have O_DIRECT but
can't align stack objects the way we want; such systems could in theory
be supported with more work but we don't currently know of any such
machines, so it's easier to pretend there is no O_DIRECT support
instead.  That's an existing and tested class of system.

Add assertions that all buffers passed into smgrread(), smgrwrite() and
smgrextend() are correctly aligned, unless PG_O_DIRECT is 0 (= stack
alignment tricks may be unavailable) or the block size has been set too
small to allow arrays of buffers to be all aligned.

Author: Thomas Munro <thomas.munro@gmail.com>
Author: Andres Freund <andres@anarazel.de>
Reviewed-by: Justin Pryzby <pryzby@telsasoft.com>
Discussion: https://postgr.es/m/CA+hUKGK1X532hYqJ_MzFWt0n1zt8trz980D79WbjwnT-yYLZpg@mail.gmail.com
2023-04-08 16:34:50 +12:00
Andres Freund acab1b0914 Convert many uses of ReadBuffer[Extended](P_NEW) to ExtendBufferedRel()
A few places are not converted. Some because they are tackled in later
commits (e.g. hio.c, xlogutils.c), some because they are more
complicated (e.g. brin_pageops.c).  Having a few users of ReadBuffer(P_NEW) is
good anyway, to ensure the backward compat path stays working.

Discussion: https://postgr.es/m/20221029025420.eplyow6k7tgu6he3@awork3.anarazel.de
2023-04-05 18:57:29 -07:00
Peter Geoghegan e48c817395 Recycle deleted nbtree pages more aggressively.
Commit 61b313e4 made nbtree consistently pass down a heaprel to low
level routines like _bt_getbuf().  Although this was primarily intended
as preparation for logical decoding on standbys, it also made it easy to
correct an old deficiency in how nbtree VACUUM determines whether or not
it's now safe to recycle deleted pages.

Pass the heaprel to GlobalVisTestFor() in nbtree routines that deal with
recycle safety.  nbtree now makes less pessimistic assumptions about
recycle safety within non-catalog relations.  This enhancement
complements the recycling enhancement added by commit 9dd963ae25.

nbtree remains just as pessimistic as ever when it comes to recycle
safety within indexes on catalog relations.  There is no fundamental
reason why we need to treat catalog relations differently, though.  The
behavioral inconsistency is a consequence of the way that nbtree uses
nextXID values to implement what Lanin and Shasha call "the drain
technique".  Note in particular that it has nothing to do with whether
or not index tuples might still be required for an older MVCC snapshot.

Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://postgr.es/m/CAH2-WzkaiDxCje0yPuH=3Uh2p1V_2pFGY==xfbZoZu7Ax_NB8g@mail.gmail.com
2023-04-03 11:31:43 -07:00
Andres Freund 6af1793954 Add info in WAL records in preparation for logical slot conflict handling
This commit only implements one prerequisite part for allowing logical
decoding. The commit message contains an explanation of the overall design,
which later commits will refer back to.

Overall design:

1. We want to enable logical decoding on standbys, but replay of WAL
from the primary might remove data that is needed by logical decoding,
causing error(s) on the standby. To prevent those errors, a new replication
conflict scenario needs to be addressed (as much as hot standby does).

2. Our chosen strategy for dealing with this type of replication slot
is to invalidate logical slots for which needed data has been removed.

3. To do this we need the latestRemovedXid for each change, just as we
do for physical replication conflicts, but we also need to know
whether any particular change was to data that logical replication
might access. That way, during WAL replay, we know when there is a risk of
conflict and, if so, if there is a conflict.

4. We can't rely on the standby's relcache entries for this purpose in
any way, because the startup process can't access catalog contents.

5. Therefore every WAL record that potentially removes data from the
index or heap must carry a flag indicating whether or not it is one
that might be accessed during logical decoding.

Why do we need this for logical decoding on standby?

First, let's forget about logical decoding on standby and recall that
on a primary database, any catalog rows that may be needed by a logical
decoding replication slot are not removed.

This is done thanks to the catalog_xmin associated with the logical
replication slot.

But, with logical decoding on standby, in the following cases:

- hot_standby_feedback is off
- hot_standby_feedback is on but there is no a physical slot between
  the primary and the standby. Then, hot_standby_feedback will work,
  but only while the connection is alive (for example a node restart
  would break it)

Then, the primary may delete system catalog rows that could be needed
by the logical decoding on the standby (as it does not know about the
catalog_xmin on the standby).

So, it’s mandatory to identify those rows and invalidate the slots
that may need them if any. Identifying those rows is the purpose of
this commit.

Implementation:

When a WAL replay on standby indicates that a catalog table tuple is
to be deleted by an xid that is greater than a logical slot's
catalog_xmin, then that means the slot's catalog_xmin conflicts with
the xid, and we need to handle the conflict. While subsequent commits
will do the actual conflict handling, this commit adds a new field
isCatalogRel in such WAL records (and a new bit set in the
xl_heap_visible flags field), that is true for catalog tables, so as to
arrange for conflict handling.

The affected WAL records are the ones that already contain the
snapshotConflictHorizon field, namely:

- gistxlogDelete
- gistxlogPageReuse
- xl_hash_vacuum_one_page
- xl_heap_prune
- xl_heap_freeze_page
- xl_heap_visible
- xl_btree_reuse_page
- xl_btree_delete
- spgxlogVacuumRedirect

Due to this new field being added, xl_hash_vacuum_one_page and
gistxlogDelete do now contain the offsets to be deleted as a
FLEXIBLE_ARRAY_MEMBER. This is needed to ensure correct alignment.
It's not needed on the others struct where isCatalogRel has
been added.

This commit just introduces the WAL format changes mentioned above. Handling
the actual conflicts will follow in future commits.

Bumps XLOG_PAGE_MAGIC as the several WAL records are changed.

Author: "Drouvot, Bertrand" <bertranddrouvot.pg@gmail.com>
Author: Andres Freund <andres@anarazel.de> (in an older version)
Author: Amit Khandekar <amitdkhan.pg@gmail.com>  (in an older version)
Reviewed-by: "Drouvot, Bertrand" <bertranddrouvot.pg@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Fabrízio de Royes Mello <fabriziomello@gmail.com>
Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
2023-04-02 12:32:19 -07:00
Andres Freund 61b313e47e Pass down table relation into more index relation functions
This is done in preparation for logical decoding on standby, which needs to
include whether visibility affecting WAL records are about a (user) catalog
table. Which is only known for the table, not the indexes.

It's also nice to be able to pass the heap relation to GlobalVisTestFor() in
vacuumRedirectAndPlaceholder().

Author: "Drouvot, Bertrand" <bertranddrouvot.pg@gmail.com>
Discussion: https://postgr.es/m/21b700c3-eecf-2e05-a699-f8c78dd31ec7@gmail.com
2023-04-01 20:18:29 -07:00
Tomas Vondra 19d8e2308b Ignore BRIN indexes when checking for HOT updates
When determining whether an index update may be skipped by using HOT, we
can ignore attributes indexed by block summarizing indexes without
references to individual tuples that need to be cleaned up.

A new type TU_UpdateIndexes provides a signal to the executor to
determine which indexes to update - no indexes, all indexes, or only the
summarizing indexes.

This also removes rd_indexattr list, and replaces it with rd_attrsvalid
flag. The list was not used anywhere, and a simple flag is sufficient.

This was originally committed as 5753d4ee32, but then got reverted by
e3fcca0d0d because of correctness issues.

Original patch by Josef Simanek, various fixes and improvements by Tomas
Vondra and me.

Authors: Matthias van de Meent, Josef Simanek, Tomas Vondra
Reviewed-by: Tomas Vondra, Alvaro Herrera
Discussion: https://postgr.es/m/05ebcb44-f383-86e3-4f31-0a97a55634cf@enterprisedb.com
Discussion: https://postgr.es/m/CAFp7QwpMRGcDAQumN7onN9HjrJ3u4X3ZRXdGFT0K5G2JWvnbWg%40mail.gmail.com
2023-03-20 11:02:42 +01:00
Peter Eisentraut b9f0e54bc9 Update types in smgr API
Change data buffer to void *, from char *, and add const where
appropriate.  This makes it match the File API (see also
2d4f1ba6cf) and stdio.

Discussion: https://www.postgresql.org/message-id/flat/11dda853-bb5b-59ba-a746-e168b1ce4bdb%40enterprisedb.com
2023-02-27 07:47:46 +01:00
Peter Eisentraut aa69541046 Remove useless casts to (void *) in arguments of some system functions
The affected functions are: bsearch, memcmp, memcpy, memset, memmove,
qsort, repalloc

Reviewed-by: Corey Huinker <corey.huinker@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/fd9adf5d-b1aa-e82f-e4c7-263c30145807%40enterprisedb.com
2023-02-07 06:57:59 +01:00
Bruce Momjian c8e1ba736b Update copyright for 2023
Backpatch-through: 11
2023-01-02 15:00:37 -05:00
Andrew Dunstan 8284cf5f74 Add copyright notices to meson files
Discussion: https://postgr.es/m/222b43a5-2fb3-2c1b-9cd0-375d376c8246@dunslane.net
2022-12-20 07:54:39 -05:00
Peter Eisentraut 75f49221c2 Static assertions cleanup
Because we added StaticAssertStmt() first before StaticAssertDecl(),
some uses as well as the instructions in c.h are now a bit backwards
from the "native" way static assertions are meant to be used in C.
This updates the guidance and moves some static assertions to better
places.

Specifically, since the addition of StaticAssertDecl(), we can put
static assertions at the file level.  This moves a number of static
assertions out of function bodies, where they might have been stuck
out of necessity, to perhaps better places at the file level or in
header files.

Also, when the static assertion appears in a position where a
declaration is allowed, then using StaticAssertDecl() is more native
than StaticAssertStmt().

Reviewed-by: John Naylor <john.naylor@enterprisedb.com>
Discussion: https://www.postgresql.org/message-id/flat/941a04e7-dd6f-c0e4-8cdf-a33b3338cbda%40enterprisedb.com
2022-12-15 10:10:32 +01:00
Peter Geoghegan 1489b1ce72 Standardize rmgrdesc recovery conflict XID output.
Standardize on the name snapshotConflictHorizon for all XID fields from
WAL records that generate recovery conflicts when in hot standby mode.
This supersedes the previous latestRemovedXid naming convention.

The new naming convention places emphasis on how the values are actually
used by REDO routines.  How the values are generated during original
execution (details of which vary by record type) is deemphasized.  Users
of tools like pg_waldump can now grep for snapshotConflictHorizon to see
all potential sources of recovery conflicts in a standardized way,
without necessarily having to consider which specific record types might
be involved.

Also bring a couple of WAL record types that didn't follow any kind of
naming convention into line.  These are heapam's VISIBLE record type and
SP-GiST's VACUUM_REDIRECT record type.  Now every WAL record whose REDO
routine calls ResolveRecoveryConflictWithSnapshot() passes through the
snapshotConflictHorizon field from its WAL record.  This is follow-up
work to the refactoring from commit 9e540599 that made FREEZE_PAGE WAL
records use a standard snapshotConflictHorizon style XID cutoff.

No bump in XLOG_PAGE_MAGIC, since the underlying format of affected WAL
records doesn't change.

Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CAH2-Wzm2CQUmViUq7Opgk=McVREHSOorYaAjR1ZpLYkRN7_dPw@mail.gmail.com
2022-11-17 14:55:08 -08:00
Peter Eisentraut b1099eca8f Remove AssertArg and AssertState
These don't offer anything over plain Assert, and their usage had
already been declared obsolescent.

Author: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://www.postgresql.org/message-id/20221009210148.GA900071@nathanxps13
2022-10-28 09:19:06 +02:00
Andres Freund e6927270cd meson: Add initial version of meson based build system
Autoconf is showing its age, fewer and fewer contributors know how to wrangle
it. Recursive make has a lot of hard to resolve dependency issues and slow
incremental rebuilds. Our home-grown MSVC build system is hard to maintain for
developers not using Windows and runs tests serially. While these and other
issues could individually be addressed with incremental improvements, together
they seem best addressed by moving to a more modern build system.

After evaluating different build system choices, we chose to use meson, to a
good degree based on the adoption by other open source projects.

We decided that it's more realistic to commit a relatively early version of
the new build system and mature it in tree.

This commit adds an initial version of a meson based build system. It supports
building postgres on at least AIX, FreeBSD, Linux, macOS, NetBSD, OpenBSD,
Solaris and Windows (however only gcc is supported on aix, solaris). For
Windows/MSVC postgres can now be built with ninja (faster, particularly for
incremental builds) and msbuild (supporting the visual studio GUI, but
building slower).

Several aspects (e.g. Windows rc file generation, PGXS compatibility, LLVM
bitcode generation, documentation adjustments) are done in subsequent commits
requiring further review. Other aspects (e.g. not installing test-only
extensions) are not yet addressed.

When building on Windows with msbuild, builds are slower when using a visual
studio version older than 2019, because those versions do not support
MultiToolTask, required by meson for intra-target parallelism.

The plan is to remove the MSVC specific build system in src/tools/msvc soon
after reaching feature parity. However, we're not planning to remove the
autoconf/make build system in the near future. Likely we're going to keep at
least the parts required for PGXS to keep working around until all supported
versions build with meson.

Some initial help for postgres developers is at
https://wiki.postgresql.org/wiki/Meson

With contributions from Thomas Munro, John Naylor, Stone Tickle and others.

Author: Andres Freund <andres@anarazel.de>
Author: Nazir Bilal Yavuz <byavuz81@gmail.com>
Author: Peter Eisentraut <peter@eisentraut.org>
Reviewed-By: Peter Eisentraut <peter.eisentraut@enterprisedb.com>
Discussion: https://postgr.es/m/20211012083721.hvixq4pnh2pixr3j@alap3.anarazel.de
2022-09-21 22:37:17 -07:00
Peter Geoghegan f68faf4c75 Fix comments about deduplication updating page.
nbtree deduplication passes add tuples from the original/target page to
a temp page, merging as necessary.  The temp page is copied back to the
target permanent page in the critical section.  This is similar to the
approach taken by nbtree page splits.

Adjust comments that referred to updating the original page in-place as
tuples were merged.  These were left over from earlier versions of the
deduplication patch that didn't yet use a temp page.
2022-08-05 14:25:49 -07:00
Peter Geoghegan b2fe783aec Add missing parenthesis to max item size macro.
Oversight in commit 92f37505, per buildfarm.
2022-08-05 13:06:19 -07:00
Robert Haas b0a55e4329 Change internal RelFileNode references to RelFileNumber or RelFileLocator.
We have been using the term RelFileNode to refer to either (1) the
integer that is used to name the sequence of files for a certain relation
within the directory set aside for that tablespace/database combination;
or (2) that value plus the OIDs of the tablespace and database; or
occasionally (3) the whole series of files created for a relation
based on those values. Using the same name for more than one thing is
confusing.

Replace RelFileNode with RelFileNumber when we're talking about just the
single number, i.e. (1) from above, and with RelFileLocator when we're
talking about all the things that are needed to locate a relation's files
on disk, i.e. (2) from above. In the places where we refer to (3) as
a relfilenode, instead refer to "relation storage".

Since there is a ton of SQL code in the world that knows about
pg_class.relfilenode, don't change the name of that column, or of other
SQL-facing things that derive their name from it.

On the other hand, do adjust closely-related internal terminology. For
example, the structure member names dbNode and spcNode appear to be
derived from the fact that the structure itself was called RelFileNode,
so change those to dbOid and spcOid. Likewise, various variables with
names like rnode and relnode get renamed appropriately, according to
how they're being used in context.

Hopefully, this is clearer than before. It is also preparation for
future patches that intend to widen the relfilenumber fields from its
current width of 32 bits. Variables that store a relfilenumber are now
declared as type RelFileNumber rather than type Oid; right now, these
are the same, but that can now more easily be changed.

Dilip Kumar, per an idea from me. Reviewed also by Andres Freund.
I fixed some whitespace issues, changed a couple of words in a
comment, and made one other minor correction.

Discussion: http://postgr.es/m/CA+TgmoamOtXbVAQf9hWFzonUo6bhhjS6toZQd7HZ-pmojtAmag@mail.gmail.com
Discussion: http://postgr.es/m/CA+Tgmobp7+7kmi4gkq7Y+4AM9fTvL+O1oQ4-5gFTT+6Ng-dQ=g@mail.gmail.com
Discussion: http://postgr.es/m/CAFiTN-vTe79M8uDH1yprOU64MNFE+R3ODRuA+JWf27JbhY4hJw@mail.gmail.com
2022-07-06 11:39:09 -04:00
Tomas Vondra e3fcca0d0d Revert changes in HOT handling of BRIN indexes
This reverts commits 5753d4ee32 and fe60b67250 that modified HOT to
ignore BRIN indexes. The commit message for 5753d4ee32 claims that:

    When determining whether an index update may be skipped by using
    HOT, we can ignore attributes indexed only by BRIN indexes. There
    are no index pointers to individual tuples in BRIN, and the page
    range summary will be updated anyway as it relies on visibility
    info.

This is partially incorrect - it's true BRIN indexes don't point to
individual tuples, so HOT chains are not an issue, but the visibitlity
info is not sufficient to keep the index up to date. This can easily
result in corrupted indexes, as demonstrated in the hackers thread.

This does not mean relaxing the HOT restrictions for BRIN is a lost
cause, but it needs to handle the two aspects (allowing HOT chains and
updating the page range summaries) as separate. But that requires a
major changes, and it's too late for that in the current dev cycle.

Reported-by: Tomas Vondra
Discussion: https://postgr.es/m/05ebcb44-f383-86e3-4f31-0a97a55634cf@enterprisedb.com
2022-06-16 15:02:49 +02:00
David Rowley c90c16591c Fix some incorrect preprocessor tests in tuplesort specializations
697492434 added 3 new quicksort specialization functions for common
datatypes.

That commit was not very consistent in how it would determine if we're
compiling for 32-bit or 64-bit machines.  It would sometimes use
USE_FLOAT8_BYVAL and at other times check if SIZEOF_DATUM == 8.  This
could cause theoretical problems due to the way USE_FLOAT8_BYVAL is now
defined based on SIZEOF_VOID_P >= 8.  If pointers for some reason were
ever larger than 8-bytes then we'd end up doing 32-bit comparisons
mistakenly.  Let's just always check SIZEOF_DATUM >= 8.

It also seems that ssup_datum_signed_cmp is just never used on 32-bit
builds, so let's just ifdef that out to make sure we never accidentally
use that comparison function on such machines.  This also allows us to
ifdef out 1 of the 3 new specialization quicksort functions in 32-bit
builds which seems to shrink down the binary by over 4KB on my machine.

In passing, also add the missing DatumGetInt32() / DatumGetInt64() macros
in the comparison functions.

Discussion: https://postgr.es/m/CAApHDvqcQExRhtRa9hJrJB_5egs3SUfOcutP3m+3HO8A+fZTPA@mail.gmail.com
Reviewed-by: John Naylor
2022-05-11 11:38:13 +12:00
Alvaro Herrera 24d2b2680a
Remove extraneous blank lines before block-closing braces
These are useless and distracting.  We wouldn't have written the code
with them to begin with, so there's no reason to keep them.

Author: Justin Pryzby <pryzby@telsasoft.com>
Discussion: https://postgr.es/m/20220411020336.GB26620@telsasoft.com
Discussion: https://postgr.es/m/attachment/133167/0016-Extraneous-blank-lines.patch
2022-04-13 19:16:02 +02:00
Robert Haas 7fc0e7de9f Revert the addition of GetMaxBackends() and related stuff.
This reverts commits 0147fc7, 4567596, aa64f23, and 5ecd018.
There is no longer agreement that introducing this function
was the right way to address the problem. The consensus now
seems to favor trying to make a correct value for MaxBackends
available to mdules executing their _PG_init() functions.

Nathan Bossart

Discussion: http://postgr.es/m/20220323045229.i23skfscdbvrsuxa@jrouhaud
2022-04-12 14:45:23 -04:00
Tom Lane bd037dc928 Make XLogRecGetBlockTag() throw error if there's no such block.
All but a few existing callers assume without checking that this
function succeeds.  While it probably will, that's a poor excuse for
not checking.  Let's make it return void and instead throw an error
if it doesn't find the block reference.  Callers that actually need
to handle the no-such-block case must now use the underlying function
XLogRecGetBlockTagExtended.

In addition to being a bit less error-prone, this should also serve
to suppress some Coverity complaints about XLogRecGetBlockRefInfo.

While at it, clean up some inconsistency about use of the
XLogRecHasBlockRef macro: make XLogRecGetBlockTagExtended use
that instead of open-coding the same condition, and avoid calling
XLogRecHasBlockRef twice in relevant code paths.  (That is,
calling XLogRecHasBlockRef followed by XLogRecGetBlockTag is now
deprecated: use XLogRecGetBlockTagExtended instead.)

Patch HEAD only; this doesn't seem to have enough value to consider
a back-branch API break.

Discussion: https://postgr.es/m/425039.1649701221@sss.pgh.pa.us
2022-04-11 17:43:53 -04:00
Tom Lane dbafe127bb Suppress "variable 'pagesaving' set but not used" warning.
With asserts disabled, late-model clang notices that this variable
is incremented but never otherwise read.

Discussion: https://postgr.es/m/3171401.1649275153@sss.pgh.pa.us
2022-04-06 17:03:50 -04:00
David Rowley 77bae396df Adjust tuplesort API to have bitwise option flags
This replaces the bool flag for randomAccess.  An upcoming patch requires
adding another option, so instead of breaking the API for that, then
breaking it again one day if we add more options, let's just break it
once.  Any boolean options we add in the future will just make use of an
unused bit in the flags.

Any extensions making use of tuplesorts will need to update their code
to pass TUPLESORT_RANDOMACCESS instead of true for randomAccess.
TUPLESORT_NONE can be used for a set of empty options.

Author: David Rowley
Reviewed-by: Justin Pryzby
Discussion: https://postgr.es/m/CAApHDvoH4ASzsAOyHcxkuY01Qf%2B%2B8JJ0paw%2B03dk%2BW25tQEcNQ%40mail.gmail.com
2022-04-04 22:24:59 +12:00
John Naylor 6974924347 Specialize tuplesort routines for different kinds of abbreviated keys
Previously, the specialized tuplesort routine inlined handling for
reverse-sort and NULLs-ordering but called the datum comparator via a
pointer in the SortSupport struct parameter. Testing has showed that we
can get a useful performance gain by specializing datum comparison for
the different representations of abbreviated keys -- signed and unsigned
64-bit integers and signed 32-bit integers. Almost all abbreviatable data
types will benefit -- the only exception for now is numeric, since the
datum comparison is more complex. The performance gain depends on data
type and input distribution, but often falls in the range of 10-20% faster.

Thomas Munro

Reviewed by Peter Geoghegan, review and performance testing by me

Discussion:
https://www.postgresql.org/message-id/CA%2BhUKGKKYttZZk-JMRQSVak%3DCXSJ5fiwtirFf%3Dn%3DPAbumvn1Ww%40mail.gmail.com
2022-04-02 15:22:25 +07:00
Michael Paquier d16773cdc8 Add macros in hash and btree AMs to get the special area of their pages
This makes the code more consistent with SpGiST, GiST and GIN, that
already use this style, and the idea is to make easier the introduction
of more sanity checks for each of these AM-specific macros.  BRIN uses a
different set of macros to get a page's type and flags, so it has no
need for something similar.

Author: Matthias van de Meent
Discussion: https://postgr.es/m/CAEze2WjE3+tGO9Fs9+iZMU+z6mMZKo54W1Zt98WKqbEUHbHOBg@mail.gmail.com
2022-04-01 13:24:50 +09:00
Robert Haas 9c08aea6a3 Add new block-by-block strategy for CREATE DATABASE.
Because this strategy logs changes on a block-by-block basis, it
avoids the need to checkpoint before and after the operation.
However, because it logs each changed block individually, it might
generate a lot of extra write-ahead logging if the template database
is large. Therefore, the older strategy remains available via a new
STRATEGY parameter to CREATE DATABASE, and a corresponding --strategy
option to createdb.

Somewhat controversially, this patch assembles the list of relations
to be copied to the new database by reading the pg_class relation of
the template database. Cross-database access like this isn't normally
possible, but it can be made to work here because there can't be any
connections to the database being copied, nor can it contain any
in-doubt transactions. Even so, we have to use lower-level interfaces
than normal, since the table scan and relcache interfaces will not
work for a database to which we're not connected. The advantage of
this approach is that we do not need to rely on the filesystem to
determine what ought to be copied, but instead on PostgreSQL's own
knowledge of the database structure. This avoids, for example,
copying stray files that happen to be located in the source database
directory.

Dilip Kumar, with a fairly large number of cosmetic changes by me.
Reviewed and tested by Ashutosh Sharma, Andres Freund, John Naylor,
Greg Nancarrow, Neha Sharma. Additional feedback from Bruce Momjian,
Heikki Linnakangas, Julien Rouhaud, Adam Brusselback, Kyotaro
Horiguchi, Tomas Vondra, Andrew Dunstan, Álvaro Herrera, and others.

Discussion: http://postgr.es/m/CA+TgmoYtcdxBjLh31DLxUXHxFVMPGzrU5_T=CYCvRyFHywSBUQ@mail.gmail.com
2022-03-29 11:48:36 -04:00
Robert Haas aa64f23b02 Remove MaxBackends variable in favor of GetMaxBackends() function.
Previously, it was really easy to write code that accessed MaxBackends
before we'd actually initialized it, especially when coding up an
extension. To make this less error-prune, introduce a new function
GetMaxBackends() which should be used to obtain the correct value.
This will ERROR if called too early. Demote the global variable to
a file-level static, so that nobody can peak at it directly.

Nathan Bossart. Idea by Andres Freund. Review by Greg Sabino Mullane,
by Michael Paquier (who had doubts about the approach), and by me.

Discussion: http://postgr.es/m/20210802224204.bckcikl45uezv5e4@alap3.anarazel.de
2022-02-08 15:53:19 -05:00
Peter Eisentraut 94aa7cc5f7 Add UNIQUE null treatment option
The SQL standard has been ambiguous about whether null values in
unique constraints should be considered equal or not.  Different
implementations have different behaviors.  In the SQL:202x draft, this
has been formalized by making this implementation-defined and adding
an option on unique constraint definitions UNIQUE [ NULLS [NOT]
DISTINCT ] to choose a behavior explicitly.

This patch adds this option to PostgreSQL.  The default behavior
remains UNIQUE NULLS DISTINCT.  Making this happen in the btree code
is pretty easy; most of the patch is just to carry the flag around to
all the places that need it.

The CREATE UNIQUE INDEX syntax extension is not from the standard,
it's my own invention.

I named all the internal flags, catalog columns, etc. in the negative
("nulls not distinct") so that the default PostgreSQL behavior is the
default if the flag is false.

Reviewed-by: Maxim Orlov <orlovmg@gmail.com>
Reviewed-by: Pavel Borisov <pashkin.elfe@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/84e5ee1b-387e-9a54-c326-9082674bde78@enterprisedb.com
2022-02-03 11:48:21 +01:00