Commit Graph

171 Commits

Author SHA1 Message Date
Robert Haas b0313f9cc8 pageinspect: Fix use of wrong memory context by hash_page_items.
This can cause it to produce incorrect output.

Report and patch by Masahiko Sawada.

Discussion: http://postgr.es/m/CAD21AoBc5Asx7pXdUWu6NqU_g=Ysn95EGL9SMeYhLLduYoO_OA@mail.gmail.com
2018-01-26 09:56:33 -05:00
Tom Lane 18869e202b Fix new test case to not be endian-dependent.
Per buildfarm.

Discussion: https://postgr.es/m/ec295792-a69f-350f-6287-25a20e8f31d5@gmail.com
2018-01-04 16:00:21 -05:00
Tom Lane 39cfe86195 Fix incorrect computations of length of null bitmap in pageinspect.
Instead of using our standard macro for this calculation, this code
did it itself ... and got it wrong, leading to incorrect display of
the null bitmap in some cases.  Noted and fixed by Maksim Milyutin.

In passing, remove a uselessly duplicative error check.

Errors were introduced in commit d6061f83a; back-patch to 9.6
where that came in.

Maksim Milyutin, reviewed by Andrey Borodin

Discussion: https://postgr.es/m/ec295792-a69f-350f-6287-25a20e8f31d5@gmail.com
2018-01-04 14:59:00 -05:00
Bruce Momjian 9d4649ca49 Update copyright for 2018
Backpatch-through: certain files through 9.3
2018-01-02 23:30:12 -05:00
Andres Freund 2cd7084524 Change tupledesc->attrs[n] to TupleDescAttr(tupledesc, n).
This is a mechanical change in preparation for a later commit that
will change the layout of TupleDesc.  Introducing a macro to abstract
the details of where attributes are stored will allow us to change
that in separate step and revise it in future.

Author: Thomas Munro, editorialized by Andres Freund
Reviewed-By: Andres Freund
Discussion: https://postgr.es/m/CAEepm=0ZtQ-SpsgCyzzYpsXS6e=kZWqk3g5Ygn3MDV7A8dabUA@mail.gmail.com
2017-08-20 11:19:07 -07:00
Robert Haas 620b49a16d hash: Increase the number of possible overflow bitmaps by 8x.
Per a report from AP, it's not that hard to exhaust the supply of
bitmap pages if you create a table with a hash index and then insert a
few billion rows - and then you start getting errors when you try to
insert additional rows.  In the particular case reported by AP,
there's another fix that we can make to improve recycling of overflow
pages, which is another way to avoid the error, but there may be other
cases where this problem happens and that fix won't help.  So let's
buy ourselves as much headroom as we can without rearchitecting
anything.

The comments claim that the old limit was 64GB, but it was really
only 32GB, because we didn't use all the bits in the page for bitmap
bits - only the largest power of 2 that could fit after deducting
space for the page header and so forth.  Thus, we have 4kB per page
for bitmap bits, not 8kB.  The new limit is thus actually 8 times the
old *real* limit but only 4 times the old *purported* limit.

Since this breaks on-disk compatibility, bump HASH_VERSION.  We've
already done this earlier in this release cycle, so this doesn't cause
any incremental inconvenience for people using pg_upgrade from
releases prior to v10.  However, users who use pg_upgrade to reach
10beta3 or later from 10beta2 or earlier will need to REINDEX any hash
indexes again.

Amit Kapila and Robert Haas

Discussion: http://postgr.es/m/20170704105728.mwb72jebfmok2nm2@zip.com.au
2017-08-04 16:30:32 -04:00
Tom Lane 382ceffdf7 Phase 3 of pgindent updates.
Don't move parenthesized lines to the left, even if that means they
flow past the right margin.

By default, BSD indent lines up statement continuation lines that are
within parentheses so that they start just to the right of the preceding
left parenthesis.  However, traditionally, if that resulted in the
continuation line extending to the right of the desired right margin,
then indent would push it left just far enough to not overrun the margin,
if it could do so without making the continuation line start to the left of
the current statement indent.  That makes for a weird mix of indentations
unless one has been completely rigid about never violating the 80-column
limit.

This behavior has been pretty universally panned by Postgres developers.
Hence, disable it with indent's new -lpl switch, so that parenthesized
lines are always lined up with the preceding left paren.

This patch is much less interesting than the first round of indent
changes, but also bulkier, so I thought it best to separate the effects.

Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org
Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
2017-06-21 15:35:54 -04:00
Tom Lane c7b8998ebb Phase 2 of pgindent updates.
Change pg_bsd_indent to follow upstream rules for placement of comments
to the right of code, and remove pgindent hack that caused comments
following #endif to not obey the general rule.

Commit e3860ffa4d wasn't actually using
the published version of pg_bsd_indent, but a hacked-up version that
tried to minimize the amount of movement of comments to the right of
code.  The situation of interest is where such a comment has to be
moved to the right of its default placement at column 33 because there's
code there.  BSD indent has always moved right in units of tab stops
in such cases --- but in the previous incarnation, indent was working
in 8-space tab stops, while now it knows we use 4-space tabs.  So the
net result is that in about half the cases, such comments are placed
one tab stop left of before.  This is better all around: it leaves
more room on the line for comment text, and it means that in such
cases the comment uniformly starts at the next 4-space tab stop after
the code, rather than sometimes one and sometimes two tabs after.

Also, ensure that comments following #endif are indented the same
as comments following other preprocessor commands such as #else.
That inconsistency turns out to have been self-inflicted damage
from a poorly-thought-through post-indent "fixup" in pgindent.

This patch is much less interesting than the first round of indent
changes, but also bulkier, so I thought it best to separate the effects.

Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org
Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
2017-06-21 15:19:25 -04:00
Bruce Momjian a6fd7b7a5f Post-PG 10 beta1 pgindent run
perltidy run not included.
2017-05-17 16:31:56 -04:00
Tom Lane 2040bb4a0b Clean up manipulations of hash indexes' hasho_flag field.
Standardize on testing a hash index page's type by doing
	(opaque->hasho_flag & LH_PAGE_TYPE) == LH_xxx_PAGE
Various places were taking shortcuts like
	opaque->hasho_flag & LH_BUCKET_PAGE
which while not actually wrong, is still bad practice because
it encourages use of
	opaque->hasho_flag & LH_UNUSED_PAGE
which *is* wrong (LH_UNUSED_PAGE == 0, so the above is constant false).
hash_xlog.c's hash_mask() contained such an incorrect test.

This also ensures that we mask out the additional flag bits that
hasho_flag has accreted since 9.6.  pgstattuple's pgstat_hash_page(),
for one, was failing to do that and was thus actively broken.

Also fix assorted comments that hadn't been updated to reflect the
extended usage of hasho_flag, and fix some macros that were testing
just "(hasho_flag & bit)" to use the less dangerous, project-approved
form "((hasho_flag & bit) != 0)".

Coverity found the bug in hash_mask(); I noted the one in
pgstat_hash_page() through code reading.
2017-04-14 17:04:25 -04:00
Alvaro Herrera 8bf74967da Reduce the number of pallocs() in BRIN
Instead of allocating memory in brin_deform_tuple and brin_copy_tuple
over and over during a scan, allow reuse of previously allocated memory.
This is said to make for a measurable performance improvement.

Author: Jinyu Zhang, Álvaro Herrera
Reviewed by: Tomas Vondra
Discussion: https://postgr.es/m/495deb78.4186.1500dacaa63.Coremail.beijing_pg@163.com
2017-04-07 19:08:43 -03:00
Robert Haas 633e15ea0f Fix pageinspect failures on hash indexes.
Make every page in a hash index which isn't all-zeroes have a valid
special space, so that tools like pageinspect don't error out.

Also, make pageinspect cope with all-zeroes pages, because
_hash_alloc_buckets can leave behind large numbers of those until
they're consumed by splits.

Ashutosh Sharma and Robert Haas, reviewed by Amit Kapila.
Original trouble report from Jeff Janes.

Discussion: http://postgr.es/m/CAMkU=1y6NjKmqbJ8wLMhr=F74WzcMALYWcVFhEpm7i=mV=XsOg@mail.gmail.com
2017-04-05 14:18:15 -04:00
Peter Eisentraut 193f5f9e91 pageinspect: Add bt_page_items function with bytea argument
Author: Tomas Vondra <tomas.vondra@2ndquadrant.com>
Reviewed-by: Ashutosh Sharma <ashu.coek88@gmail.com>
2017-04-04 23:52:55 -04:00
Robert Haas ea69a0dead Expand hash indexes more gradually.
Since hash indexes typically have very few overflow pages, adding a
new splitpoint essentially doubles the on-disk size of the index,
which can lead to large and abrupt increases in disk usage (and
perhaps long delays on occasion).  To mitigate this problem to some
degree, divide larger splitpoints into four equal phases.  This means
that, for example, instead of growing from 4GB to 8GB all at once, a
hash index will now grow from 4GB to 5GB to 6GB to 7GB to 8GB, which
is perhaps still not as smooth as we'd like but certainly an
improvement.

This changes the on-disk format of the metapage, so bump HASH_VERSION
from 2 to 3.  This will force a REINDEX of all existing hash indexes,
but that's probably a good idea anyway.  First, hash indexes from
pre-10 versions of PostgreSQL could easily be corrupted, and we don't
want to confuse corruption carried over from an older release with any
corruption caused despite the new write-ahead logging in v10.  Second,
it will let us remove some backward-compatibility code added by commit
293e24e507.

Mithun Cy, reviewed by Amit Kapila, Jesper Pedersen and me.  Regression
test outputs updated by me.

Discussion: http://postgr.es/m/CAD__OuhG6F1gQLCgMQNnMNgoCvOLQZz9zKYJQNYvYmmJoM42gA@mail.gmail.com
Discussion: http://postgr.es/m/CA+TgmoYty0jCf-pa+m+vYUJ716+AxM7nv_syvyanyf5O-L_i2A@mail.gmail.com
2017-04-03 23:46:33 -04:00
Alvaro Herrera ce96ce60ca Remove direct uses of ItemPointer.{ip_blkid,ip_posid}
There are no functional changes here; this simply encapsulates knowledge
of the ItemPointerData struct so that a future patch can change things
without more breakage.

All direct users of ip_blkid and ip_posid are changed to use existing
macros ItemPointerGetBlockNumber and ItemPointerGetOffsetNumber
respectively.  For callers where that's inappropriate (because they
Assert that the itempointer is is valid-looking), add
ItemPointerGetBlockNumberNoCheck and ItemPointerGetOffsetNumberNoCheck,
which lack the assertion but are otherwise identical.

Author: Pavan Deolasee
Discussion: https://postgr.es/m/CABOikdNnFon4cJiL=h1mZH3bgUeU+sWHuU4Yr8AB=j3A2p1GiA@mail.gmail.com
2017-03-28 19:02:23 -03:00
Peter Eisentraut fef2bcdcba pageinspect: Add page_checksum function
Author: Tomas Vondra <tomas.vondra@2ndquadrant.com>
Reviewed-by: Ashutosh Sharma <ashu.coek88@gmail.com>
2017-03-17 10:55:17 -04:00
Peter Eisentraut a02731cb10 pageinspect: Add test for page_header function 2017-03-17 09:23:39 -04:00
Robert Haas c11453ce0a hash: Add write-ahead logging support.
The warning about hash indexes not being write-ahead logged and their
use being discouraged has been removed.  "snapshot too old" is now
supported for tables with hash indexes.  Most importantly, barring
bugs, hash indexes will now be crash-safe and usable on standbys.

This commit doesn't yet add WAL consistency checking for hash
indexes, as we now have for other index types; a separate patch has
been submitted to cure that lack.

Amit Kapila, reviewed and slightly modified by me.  The larger patch
series of which this is a part has been reviewed and tested by Álvaro
Herrera, Ashutosh Sharma, Mark Kirkwood, Jeff Janes, and Jesper
Pedersen.

Discussion: http://postgr.es/m/CAA4eK1JOBX=YU33631Qh-XivYXtPSALh514+jR8XeD7v+K3r_Q@mail.gmail.com
2017-03-14 13:27:02 -04:00
Noah Misch 3a0d473192 Use wrappers of PG_DETOAST_DATUM_PACKED() more.
This makes almost all core code follow the policy introduced in the
previous commit.  Specific decisions:

- Text search support functions with char* and length arguments, such as
  prsstart and lexize, may receive unaligned strings.  I doubt
  maintainers of non-core text search code will notice.

- Use plain VARDATA() on values detoasted or synthesized earlier in the
  same function.  Use VARDATA_ANY() on varlenas sourced outside the
  function, even if they happen to always have four-byte headers.  As an
  exception, retain the universal practice of using VARDATA() on return
  values of SendFunctionCall().

- Retain PG_GETARG_BYTEA_P() in pageinspect.  (Page images are too large
  for a one-byte header, so this misses no optimization.)  Sites that do
  not call get_page_from_raw() typically need the four-byte alignment.

- For now, do not change btree_gist.  Its use of four-byte headers in
  memory is partly entangled with storage of 4-byte headers inside
  GBT_VARKEY, on disk.

- For now, do not change gtrgm_consistent() or gtrgm_distance().  They
  incorporate the varlena header into a cache, and there are multiple
  credible implementation strategies to consider.
2017-03-12 19:35:34 -04:00
Stephen Frost c08d82f38e Add relkind checks to certain contrib modules
The contrib extensions pageinspect, pg_visibility and pgstattuple only
work against regular relations which have storage.  They don't work
against foreign tables, partitioned (parent) tables, views, et al.

Add checks to the user-callable functions to return a useful error
message to the user if they mistakenly pass an invalid relation to a
function which doesn't accept that kind of relation.

In passing, improve some of the existing checks to use ereport() instead
of elog(), add a function to consolidate common checks where
appropriate, and add some regression tests.

Author: Amit Langote, with various changes by me
Reviewed by: Michael Paquier and Corey Huinker
Discussion: https://postgr.es/m/ab91fd9d-4751-ee77-c87b-4dd704c1e59c@lab.ntt.co.jp
2017-03-09 16:34:25 -05:00
Robert Haas b4316928d5 Fix incorrect typecast.
Ashutosh Sharma, per a report from Mithun Cy.

Discussion: http://postgr.es/m/CAD__OujgqNNnCujeFTmKpjNu+W4smS8Hbi=RcWAhf1ZUs3H4WA@mail.gmail.com
2017-02-22 12:05:42 +05:30
Robert Haas fc8219dc54 pageinspect: Fix hash_bitmap_info not to read the underlying page.
It did that to verify that the page was an overflow page rather than
anything else, but that means that checking the status of all the
overflow bits requires reading the entire index.  So don't do that.
The new code validates that the page is not a primary bucket page
or bitmap page by looking at the metapage, so that using this on
large numbers of pages can be reasonably efficient.

Ashutosh Sharma, per a complaint from me, and with further
modifications by me.
2017-02-09 14:34:34 -05:00
Robert Haas 293e24e507 Cache hash index's metapage in rel->rd_amcache.
This avoids a very significant amount of buffer manager traffic and
contention when scanning hash indexes, because it's no longer
necessary to lock and pin the metapage for every scan.  We do need
some way of figuring out when the cache is too stale to use any more,
so that when we lock the primary bucket page to which the cached
metapage points us, we can tell whether a split has occurred since we
cached the metapage data.  To do that, we use the hash_prevblkno field
in the primary bucket page, which would otherwise always be set to
InvalidBuffer.

This patch contains code so that it will continue working (although
less efficiently) with hash indexes built before this change, but
perhaps we should consider bumping the hash version and ripping out
the compatibility code.  That decision can be made later, though.

Mithun Cy, reviewed by Jesper Pedersen, Amit Kapila, and by me.
Before committing, I made a number of cosmetic changes to the last
posted version of the patch, adjusted _hash_getcachedmetap to be more
careful about order of operation, and made some necessary updates to
the pageinspect documentation and regression tests.
2017-02-07 12:35:45 -05:00
Robert Haas 871ec0e336 pageinspect: More type-sanity surgery on the new hash index code.
Uniformly expose unsigned quantities using the next-wider signed
integer type (since we have no unsigned types at the SQL level).
At the SQL level, this results a change to report itemoffset as
int4 rather than int2.  Also at the SQL level, report one value
that is an OID as type oid.  Under the hood, uniformly use macros
that match the SQL output type as to both width and signedness.
2017-02-03 16:28:13 -05:00
Tom Lane 14e9b18fed In pageinspect/hashfuncs.c, avoid crashes on alignment-picky machines.
On machines with MAXALIGN = 8, the payload of a bytea is not maxaligned,
since it will start 4 bytes into a palloc'd value.  On alignment-picky
hardware, this will cause failures in accesses to 8-byte-wide values
within the page.  We already encountered this problem when we introduced
GIN index inspection functions, and fixed it in commit 84ad68d64.  Make
use of the same function for hash indexes.

A small difficulty is that up to now contrib/pageinspect has not shared
any functions at all across files.  To support that, introduce a common
header file "pageinspect.h" for the module.

Also, move get_page_from_raw() out of ginfuncs.c, where it didn't
especially belong, and put it in rawpage.c which seems a more natural home.

Per buildfarm.

Discussion: https://postgr.es/m/17311.1486134714@sss.pgh.pa.us
2017-02-03 11:34:47 -05:00
Robert Haas 29e312bc13 pageinspect: Remove platform-dependent values from hash tests.
Per a report from Tom Lane, the ffactor reported by hash_metapage_info
and the free_size reported by hash_page_stats vary by platform.

Ashutosh Sharma and Robert Haas
2017-02-03 11:06:41 -05:00
Tom Lane c6eeb67dcc Fix a bunch more portability bugs in commit 08bf6e529.
It seems like somebody used a dartboard while choosing integer widths
for the various values taken and returned by these functions ... and
then threw a fresh set of darts while writing the SQL declarations.

This patch brings the C code into line with what the SQL declarations
say, which is enough to make it not dump core on the particular 32-bit
machine I'm testing on.  But I think we could do with another round
of looking at what the datum widths *should* be.  For instance, it's
not all that sensible that hash_bitmap_info decided to use int64 to
represent a BlockNumber input when get_raw_page doesn't do it that way.

There's also a remaining problem that the expected outputs from the
test script are platform-dependent, but I'll leave that issue for
somebody else.

Per buildfarm.
2017-02-02 23:11:08 -05:00
Robert Haas ed807fda6d pageinspect: Try to fix some bugs in previous commit.
Commit 08bf6e5295 seems not to have
used the correct *GetDatum and PG_GETARG_* macros for the SQL types
in some cases, and some of the SQL types seem to have been poorly
chosen, too.  Try to fix it.  I'm not sure if this is the reason
why the buildfarm is currently unhappy with this code, but it
seems like a good place to start.

Buildfarm unhappiness reported by Tom Lane.
2017-02-02 22:32:06 -05:00
Robert Haas 08bf6e5295 pageinspect: Support hash indexes.
Patch by Jesper Pedersen and Ashutosh Sharma, with some error handling
improvements by me.  Tests from Peter Eisentraut.  Reviewed by Álvaro
Herrera, Michael Paquier, Jesper Pedersen, Jeff Janes, Peter
Eisentraut, Amit Kapila, Mithun Cy, and me.

Discussion: http://postgr.es/m/e2ac6c58-b93f-9dd9-f4e6-d6d30add7fdf@redhat.com
2017-02-02 14:19:32 -05:00
Peter Eisentraut f21a563d25 Move some things from builtins.h to new header files
This avoids that builtins.h has to include additional header files.
2017-01-20 20:29:53 -05:00
Bruce Momjian 1d25779284 Update copyright via script for 2017 2017-01-03 13:48:53 -05:00
Tom Lane 367b99bbb1 Fix gin_leafpage_items().
On closer inspection, commit 84ad68d64 broke gin_leafpage_items(),
because the aligned copy of the page got palloc'd in a short-lived
context whereas it needs to be in the SRF's multi_call_memory_ctx.
This was not exposed by the regression test, because the regression
test doesn't actually exercise the function in a meaningful way.
Fix the code bug, and extend the test in what I hope is a portable
fashion.
2016-11-04 12:11:54 -04:00
Peter Eisentraut 84ad68d645 pageinspect: Fix unaligned struct access in GIN functions
The raw page data that is passed into the functions will not be aligned
at 8-byte boundaries.  Casting that to a struct and accessing int64
fields will result in unaligned access.  On most platforms, you get away
with it, but it will result on a crash on pickier platforms such as ia64
and sparc64.
2016-11-04 10:05:37 -04:00
Peter Eisentraut 00a86856c1 pageinspect: Make page test more portable
Choose test data that makes the output independent of endianness.
2016-11-02 08:45:17 -04:00
Tom Lane 14ee35799f Fix portability bug in gin_page_opaque_info().
Somebody apparently thought that "if Int32GetDatum is good,
Int64GetDatum must be better".  Per buildfarm failures now
that Peter has added some regression tests here.
2016-11-02 00:09:27 -04:00
Peter Eisentraut f7c9a6e083 pageinspect: Make btree test more portable
Choose test data that makes the output independent of endianness and
alignment.
2016-11-01 22:02:39 -04:00
Peter Eisentraut adfb81d9e1 pageinspect: Add tests 2016-11-01 14:02:16 -04:00
Heikki Linnakangas 8a2f08fbea Fix typo in comment.
Daniel Gustafsson
2016-10-26 11:10:13 +03:00
Robert Haas 4bc424b968 pgindent run for 9.6 2016-06-09 18:02:36 -04:00
Robert Haas e3b607cd7a Update pageinspect extension for parallel query.
All functions provided by this extension are PARALLEL SAFE.

Andreas Karlsson
2016-06-09 17:18:09 -04:00
Robert Haas 8826d85078 Tweak a few more things in preparation for upcoming pgindent run.
These adjustments adjust code and comments in minor ways to prevent
pgindent from mangling them.  Among other things, I tried to avoid
situations where pgindent would emit "a +b" instead of "a + b", and I
tried to avoid having it break up inline comments across multiple
lines.
2016-05-03 10:52:25 -04:00
Heikki Linnakangas d22b85fbd4 Remove unused macros.
CHECK_PAGE_OFFSET_RANGE() has been unused forever.
CHECK_RELATION_BLOCK_RANGE() has been unused in pgstatindex.c ever since
bt_page_stats() and bt_page_items() functions were moved from pgstattuple
to pageinspect module. It still exists in pageinspect/btreefuncs.c.

Daniel Gustafsson
2016-05-02 10:07:49 +03:00
Kevin Grittner a343e223a5 Revert no-op changes to BufferGetPage()
The reverted changes were intended to force a choice of whether any
newly-added BufferGetPage() calls needed to be accompanied by a
test of the snapshot age, to support the "snapshot too old"
feature.  Such an accompanying test is needed in about 7% of the
cases, where the page is being used as part of a scan rather than
positioning for other purposes (such as DML or vacuuming).  The
additional effort required for back-patching, and the doubt whether
the intended benefit would really be there, have indicated it is
best just to rely on developers to do the right thing based on
comments and existing usage, as we do with many other conventions.

This change should have little or no effect on generated executable
code.

Motivated by the back-patching pain of Tom Lane and Robert Haas
2016-04-20 08:31:19 -05:00
Kevin Grittner 8b65cf4c5e Modify BufferGetPage() to prepare for "snapshot too old" feature
This patch is a no-op patch which is intended to reduce the chances
of failures of omission once the functional part of the "snapshot
too old" patch goes in.  It adds parameters for snapshot, relation,
and an enum to specify whether the snapshot age check needs to be
done for the page at this point.  This initial patch passes NULL
for the first two new parameters and BGP_NO_SNAPSHOT_TEST for the
third.  The follow-on patch will change the places where the test
needs to be made.
2016-04-08 14:30:10 -05:00
Peter Eisentraut 339025c68f Replace printf format %i by %d
see also ce8d7bb644
2016-04-08 12:42:58 -04:00
Peter Eisentraut 8b737f9084 Fix printf format 2016-04-08 12:34:33 -04:00
Alvaro Herrera 3e1338475f Add missing checks to some of pageinspect's BRIN functions
brin_page_type() and brin_metapage_info() did not enforce being called
by superuser, like other pageinspect functions that take bytea do.
Since they don't verify the passed page thoroughly, it is possible to
use them to read the server memory with a carefully crafted bytea value,
up to a file kilobytes from where the input bytea is located.

Have them throw errors if called by a non-superuser.

Report and initial patch: Andreas Seltenreich

Security: CVE-2016-3065
2016-03-28 10:57:42 -03:00
Tom Lane 65c5fcd353 Restructure index access method API to hide most of it at the C level.
This patch reduces pg_am to just two columns, a name and a handler
function.  All the data formerly obtained from pg_am is now provided
in a C struct returned by the handler function.  This is similar to
the designs we've adopted for FDWs and tablesample methods.  There
are multiple advantages.  For one, the index AM's support functions
are now simple C functions, making them faster to call and much less
error-prone, since the C compiler can now check function signatures.
For another, this will make it far more practical to define index access
methods in installable extensions.

A disadvantage is that SQL-level code can no longer see attributes
of index AMs; in particular, some of the crosschecks in the opr_sanity
regression test are no longer possible from SQL.  We've addressed that
by adding a facility for the index AM to perform such checks instead.
(Much more could be done in that line, but for now we're content if the
amvalidate functions more or less replace what opr_sanity used to do.)
We might also want to expose some sort of reporting functionality, but
this patch doesn't do that.

Alexander Korotkov, reviewed by Petr Jelínek, and rather heavily
editorialized on by me.
2016-01-17 19:36:59 -05:00
Bruce Momjian ee94300446 Update copyright for 2016
Backpatch certain files through 9.1
2016-01-02 13:33:40 -05:00
Teodor Sigaev 0271e27c10 Add forgotten file in commit d6061f83a1 2015-11-25 16:59:07 +03:00
Teodor Sigaev d6061f83a1 Improve pageinspect module
Now pageinspect can show data stored in the heap tuple.

Nikolay Shaplov
2015-11-25 16:31:55 +03:00
Alvaro Herrera 94d626ff5a Use materialize SRF mode in brin_page_items
This function was using the single-value-per-call mechanism, but the
code relied on a relcache entry that wasn't kept open across calls.
This manifested as weird errors in buildfarm during the short time that
the "brin-1" isolation test lived.

Backpatch to 9.5, where it was introduced.
2015-08-13 13:02:10 -03:00
Bruce Momjian 807b9e0dff pgindent run for 9.5 2015-05-23 21:35:49 -04:00
Alvaro Herrera db5f98ab4f Improve BRIN infra, minmax opclass and regression test
The minmax opclass was using the wrong support functions when
cross-datatypes queries were run.  Instead of trying to fix the
pg_amproc definitions (which apparently is not possible), use the
already correct pg_amop entries instead.  This requires jumping through
more hoops (read: extra syscache lookups) to obtain the underlying
functions to execute, but it is necessary for correctness.

Author: Emre Hasegeli, tweaked by Álvaro
Review: Andreas Karlsson

Also change BrinOpcInfo to record each stored type's typecache entry
instead of just the OID.  Turns out that the full type cache is
necessary in brin_deform_tuple: the original code used the indexed
type's byval and typlen properties to extract the stored tuple, which is
correct in Minmax; but in other implementations that want to store
something different, that's wrong.  The realization that this is a bug
comes from Emre also, but I did not use his patch.

I also adopted Emre's regression test code (with smallish changes),
which is more complete.
2015-05-07 13:02:22 -03:00
Alvaro Herrera e491bd2ee3 Move BRIN page type to page's last two bytes
... which is the usual convention among AMs, so that pg_filedump and
similar utilities can tell apart pages of different AMs.  It was also
the intent of the original code, but I failed to realize that alignment
considerations would move the whole thing to the previous-to-last word
in the page.

The new definition of the associated macro makes surrounding code a bit
leaner, too.

Per note from Heikki at
http://www.postgresql.org/message-id/546A16EF.9070005@vmware.com
2015-03-10 12:27:15 -03:00
Tom Lane e1a11d9311 Use FLEXIBLE_ARRAY_MEMBER for HeapTupleHeaderData.t_bits[].
This requires changing quite a few places that were depending on
sizeof(HeapTupleHeaderData), but it seems for the best.

Michael Paquier, some adjustments by me
2015-02-21 15:13:06 -05:00
Tom Lane 09d8d110a6 Use FLEXIBLE_ARRAY_MEMBER in a bunch more places.
Replace some bogus "x[1]" declarations with "x[FLEXIBLE_ARRAY_MEMBER]".
Aside from being more self-documenting, this should help prevent bogus
warnings from static code analyzers and perhaps compiler misoptimizations.

This patch is just a down payment on eliminating the whole problem, but
it gets rid of a lot of easy-to-fix cases.

Note that the main problem with doing this is that one must no longer rely
on computing sizeof(the containing struct), since the result would be
compiler-dependent.  Instead use offsetof(struct, lastfield).  Autoconf
also warns against spelling that offsetof(struct, lastfield[0]).

Michael Paquier, review and additional fixes by me.
2015-02-20 00:11:42 -05:00
Bruce Momjian 4baaf863ec Update copyright for 2015
Backpatch certain files through 9.0
2015-01-06 11:43:47 -05:00
Alvaro Herrera b52cb4690e pageinspect/BRIN: minor tweaks
Michael Paquier

Double-dash additions suggested by Peter Geoghegan
2014-12-02 12:20:50 -03:00
Heikki Linnakangas 3a82bc6f8a Add pageinspect functions for inspecting GIN indexes.
Patch by me, Peter Geoghegan and Michael Paquier, reviewed by Amit Kapila.
2014-11-21 11:58:07 +02:00
Alvaro Herrera 7516f52594 BRIN: Block Range Indexes
BRIN is a new index access method intended to accelerate scans of very
large tables, without the maintenance overhead of btrees or other
traditional indexes.  They work by maintaining "summary" data about
block ranges.  Bitmap index scans work by reading each summary tuple and
comparing them with the query quals; all pages in the range are returned
in a lossy TID bitmap if the quals are consistent with the values in the
summary tuple, otherwise not.  Normal index scans are not supported
because these indexes do not store TIDs.

As new tuples are added into the index, the summary information is
updated (if the block range in which the tuple is added is already
summarized) or not; in the latter case, a subsequent pass of VACUUM or
the brin_summarize_new_values() function will create the summary
information.

For data types with natural 1-D sort orders, the summary info consists
of the maximum and the minimum values of each indexed column within each
page range.  This type of operator class we call "Minmax", and we
supply a bunch of them for most data types with B-tree opclasses.
Since the BRIN code is generalized, other approaches are possible for
things such as arrays, geometric types, ranges, etc; even for things
such as enum types we could do something different than minmax with
better results.  In this commit I only include minmax.

Catalog version bumped due to new builtin catalog entries.

There's more that could be done here, but this is a good step forwards.

Loosely based on ideas from Simon Riggs; code mostly by Álvaro Herrera,
with contribution by Heikki Linnakangas.

Patch reviewed by: Amit Kapila, Heikki Linnakangas, Robert Haas.
Testing help from Jeff Janes, Erik Rijkers, Emanuel Calvo.

PS:
  The research leading to these results has received funding from the
  European Union's Seventh Framework Programme (FP7/2007-2013) under
  grant agreement n° 318633.
2014-11-07 16:38:14 -03:00
Andres Freund d153b80161 Fix typos in some error messages thrown by extension scripts when fed to psql.
Some of the many error messages introduced in 458857cc missed 'FROM
unpackaged'. Also e016b724 and 45ffeb7e forgot to quote extension
version numbers.

Backpatch to 9.1, just like 458857cc which introduced the messages. Do
so because the error messages thrown when the wrong command is copy &
pasted aren't easy to understand.
2014-08-25 18:30:37 +02:00
Tom Lane 27cef0a561 Check block number against the correct fork in get_raw_page().
get_raw_page tried to validate the supplied block number against
RelationGetNumberOfBlocks(), which of course is only right when
accessing the main fork.  In most cases, the main fork is longer
than the others, so that the check was too weak (allowing a
lower-level error to be reported, but no real harm to be done).
However, very small tables could have an FSM larger than their heap,
in which case the mistake prevented access to some FSM pages.
Per report from Torsten Foertsch.

In passing, make the bad-block-number error into an ereport not elog
(since it's certainly not an internal error); and fix sloppily
maintained comment for RelationGetNumberOfBlocksInFork.

This has been wrong since we invented relation forks, so back-patch
to all supported branches.
2014-07-22 11:46:29 -04:00
Noah Misch 0ffc201a51 Add file version information to most installed Windows binaries.
Prominent binaries already had this metadata.  A handful of minor
binaries, such as pg_regress.exe, still lack it; efforts to eliminate
such exceptions are welcome.

Michael Paquier, reviewed by MauMau.
2014-07-14 14:07:52 -04:00
Bruce Momjian 0a78320057 pgindent run for 9.4
This includes removing tabs after periods in C comments, which was
applied to back branches, so this change should not effect backpatching.
2014-05-06 12:12:18 -04:00
Peter Eisentraut e7128e8dbb Create function prototype as part of PG_FUNCTION_INFO_V1 macro
Because of gcc -Wmissing-prototypes, all functions in dynamically
loadable modules must have a separate prototype declaration.  This is
meant to detect global functions that are not declared in header files,
but in cases where the function is called via dfmgr, this is redundant.
Besides filling up space with boilerplate, this is a frequent source of
compiler warnings in extension modules.

We can fix that by creating the function prototype as part of the
PG_FUNCTION_INFO_V1 macro, which such modules have to use anyway.  That
makes the code of modules cleaner, because there is one less place where
the entry points have to be listed, and creates an additional check that
functions have the right prototype.

Remove now redundant prototypes from contrib and other modules.
2014-04-18 00:03:19 -04:00
Robert Haas 45ffeb7e00 pageinspect: Use new pg_lsn datatype.
Michael Paquier, with slight comment changes by me
2014-03-03 07:15:04 -05:00
Bruce Momjian 7e04792a1c Update copyright for 2014
Update all files in head, and files COPYRIGHT and legal.sgml in all back
branches.
2014-01-07 16:05:30 -05:00
Peter Eisentraut edc43458d7 Add more use of psprintf() 2014-01-06 21:30:26 -05:00
Robert Haas 37484ad2aa Change the way we mark tuples as frozen.
Instead of changing the tuple xmin to FrozenTransactionId, the combination
of HEAP_XMIN_COMMITTED and HEAP_XMIN_INVALID, which were previously never
set together, is now defined as HEAP_XMIN_FROZEN.  A variety of previous
proposals to freeze tuples opportunistically before vacuum_freeze_min_age
is reached have foundered on the objection that replacing xmin by
FrozenTransactionId might hinder debugging efforts when things in this
area go awry; this patch is intended to solve that problem by keeping
the XID around (but largely ignoring the value to which it is set).

Third-party code that checks for HEAP_XMIN_INVALID on tuples where
HEAP_XMIN_COMMITTED might be set will be broken by this change.  To fix,
use the new accessor macros in htup_details.h rather than consulting the
bits directly.  HeapTupleHeaderGetXmin has been modified to return
FrozenTransactionId when the infomask bits indicate that the tuple is
frozen; use HeapTupleHeaderGetRawXmin when you already know that the
tuple isn't marked commited or frozen, or want the raw value anyway.
We currently do this in routines that display the xmin for user consumption,
in tqual.c where it's known to be safe and important for the avoidance of
extra cycles, and in the function-caching code for various procedural
languages, which shouldn't invalidate the cache just because the tuple
gets frozen.

Robert Haas and Andres Freund
2013-12-22 15:49:09 -05:00
Robert Haas f1df4731ee Use cstring_to_text_with_len when length is known.
This avoids a potentially-expensive extra call to strlen().

David Rowley
2013-11-18 10:19:00 -05:00
Fujii Masao 6f9e39bc99 Fix typo in update scripts for some contrib modules. 2013-07-19 04:13:01 +09:00
Heikki Linnakangas 230e92c82b Remove pageinspect--1.0.sql
We're not installing it anymore.

Michael Paquier
2013-05-24 08:11:44 -04:00
Simon Riggs e016b72411 Add pageinspect--1.0--1.sql for checksum changes 2013-03-18 14:39:17 +00:00
Simon Riggs ef04cb745f Add pageinspect--1.1.sql for checksum changes 2013-03-18 14:19:06 +00:00
Simon Riggs bb7cc2623f Remove PageSetTLI and rename pd_tli to pd_checksum
Remove use of PageSetTLI() from all page manipulation functions
and adjust README to indicate change in the way we make changes
to pages. Repurpose those bytes into the pd_checksum field and
explain how that works in comments about page header.

Refactoring ahead of actual feature patch which would make use
of the checksum field, arriving later.

Jeff Davis, with comments and doc changes by Simon Riggs
Direction suggested by Robert Haas; many others providing
review comments.
2013-03-18 13:46:42 +00:00
Alvaro Herrera 0ac5ad5134 Improve concurrency of foreign key locking
This patch introduces two additional lock modes for tuples: "SELECT FOR
KEY SHARE" and "SELECT FOR NO KEY UPDATE".  These don't block each
other, in contrast with already existing "SELECT FOR SHARE" and "SELECT
FOR UPDATE".  UPDATE commands that do not modify the values stored in
the columns that are part of the key of the tuple now grab a SELECT FOR
NO KEY UPDATE lock on the tuple, allowing them to proceed concurrently
with tuple locks of the FOR KEY SHARE variety.

Foreign key triggers now use FOR KEY SHARE instead of FOR SHARE; this
means the concurrency improvement applies to them, which is the whole
point of this patch.

The added tuple lock semantics require some rejiggering of the multixact
module, so that the locking level that each transaction is holding can
be stored alongside its Xid.  Also, multixacts now need to persist
across server restarts and crashes, because they can now represent not
only tuple locks, but also tuple updates.  This means we need more
careful tracking of lifetime of pg_multixact SLRU files; since they now
persist longer, we require more infrastructure to figure out when they
can be removed.  pg_upgrade also needs to be careful to copy
pg_multixact files over from the old server to the new, or at least part
of multixact.c state, depending on the versions of the old and new
servers.

Tuple time qualification rules (HeapTupleSatisfies routines) need to be
careful not to consider tuples with the "is multi" infomask bit set as
being only locked; they might need to look up MultiXact values (i.e.
possibly do pg_multixact I/O) to find out the Xid that updated a tuple,
whereas they previously were assured to only use information readily
available from the tuple header.  This is considered acceptable, because
the extra I/O would involve cases that would previously cause some
commands to block waiting for concurrent transactions to finish.

Another important change is the fact that locking tuples that have
previously been updated causes the future versions to be marked as
locked, too; this is essential for correctness of foreign key checks.
This causes additional WAL-logging, also (there was previously a single
WAL record for a locked tuple; now there are as many as updated copies
of the tuple there exist.)

With all this in place, contention related to tuples being checked by
foreign key rules should be much reduced.

As a bonus, the old behavior that a subtransaction grabbing a stronger
tuple lock than the parent (sub)transaction held on a given tuple and
later aborting caused the weaker lock to be lost, has been fixed.

Many new spec files were added for isolation tester framework, to ensure
overall behavior is sane.  There's probably room for several more tests.

There were several reviewers of this patch; in particular, Noah Misch
and Andres Freund spent considerable time in it.  Original idea for the
patch came from Simon Riggs, after a problem report by Joel Jacobson.
Most code is from me, with contributions from Marti Raudsepp, Alexander
Shulgin, Noah Misch and Andres Freund.

This patch was discussed in several pgsql-hackers threads; the most
important start at the following message-ids:
	AANLkTimo9XVcEzfiBR-ut3KVNDkjm2Vxh+t8kAmWjPuv@mail.gmail.com
	1290721684-sup-3951@alvh.no-ip.org
	1294953201-sup-2099@alvh.no-ip.org
	1320343602-sup-2290@alvh.no-ip.org
	1339690386-sup-8927@alvh.no-ip.org
	4FE5FF020200002500048A3D@gw.wicourts.gov
	4FEAB90A0200002500048B7D@gw.wicourts.gov
2013-01-23 12:04:59 -03:00
Bruce Momjian bd61a623ac Update copyrights for 2013
Fully update git head, and update back branches in ./COPYRIGHT and
legal.sgml files.
2013-01-01 17:15:01 -05:00
Tom Lane d54a94b806 Take buffer lock while inspecting btree index pages in contrib/pageinspect.
It's not safe to examine a shared buffer without any lock.
2012-11-30 17:03:31 -05:00
Alvaro Herrera c219d9b0a5 Split tuple struct defs from htup.h to htup_details.h
This reduces unnecessary exposure of other headers through htup.h, which
is very widely included by many files.

I have chosen to move the function prototypes to the new file as well,
because that means htup.h no longer needs to include tupdesc.h.  In
itself this doesn't have much effect in indirect inclusion of tupdesc.h
throughout the tree, because it's also required by execnodes.h; but it's
something to explore in the future, and it seemed best to do the htup.h
change now while I'm busy with it.
2012-08-30 16:52:35 -04:00
Alvaro Herrera 9df55c8c3f Fix assorted compilation failures in contrib
Evidently I failed to test a compile after my earlier header shuffling.
2012-08-28 23:50:49 -04:00
Heikki Linnakangas 0ab9d1c4b3 Replace XLogRecPtr struct with a 64-bit integer.
This simplifies code that needs to do arithmetic on XLogRecPtrs.

To avoid changing on-disk format of data pages, the LSN on data pages is
still stored in the old format. That should keep pg_upgrade happy. However,
we have XLogRecPtrs embedded in the control file, and in the structs that
are sent over the replication protocol, so this changes breaks compatibility
of pg_basebackup and server. I didn't do anything about this in this patch,
per discussion on -hackers, the right thing to do would to be to change the
replication protocol to be architecture-independent, so that you could use
a newer version of pg_receivexlog, for example, against an older server
version.
2012-06-24 19:19:45 +03:00
Bruce Momjian e126958c2e Update copyright notices for year 2012. 2012-01-01 18:01:58 -05:00
Tom Lane 458857cc9d Throw a useful error message if an extension script file is fed to psql.
We have seen one too many reports of people trying to use 9.1 extension
files in the old-fashioned way of sourcing them in psql.  Not only does
that usually not work (due to failure to substitute for MODULE_PATHNAME
and/or @extschema@), but if it did work they'd get a collection of loose
objects not an extension.  To prevent this, insert an \echo ... \quit
line that prints a suitable error message into each extension script file,
and teach commands/extension.c to ignore lines starting with \echo.
That should not only prevent any adverse consequences of loading a script
file the wrong way, but make it crystal clear to users that they need to
do it differently now.

Tom Lane, following an idea of Andrew Dunstan's.  Back-patch into 9.1
... there is not going to be much value in this if we wait till 9.2.
2011-10-12 15:45:03 -04:00
Bruce Momjian 6416a82a62 Remove unnecessary #include references, per pgrminclude script. 2011-09-01 10:04:27 -04:00
Alvaro Herrera b93f5a5673 Move Trigger and TriggerDesc structs out of rel.h into a new reltrigger.h
This lets us stop including rel.h into execnodes.h, which is a widely
used header.
2011-07-04 14:35:58 -04:00
Peter Eisentraut 5caa3479c2 Clean up most -Wunused-but-set-variable warnings from gcc 4.6
This warning is new in gcc 4.6 and part of -Wall.  This patch cleans
up most of the noise, but there are some still warnings that are
trickier to remove.
2011-04-11 22:28:45 +03:00
Alvaro Herrera 0056066d06 Update pageinspect--1.0.sql to match the upgrade script
Per comment from Tom
2011-02-25 19:39:02 -03:00
Alvaro Herrera a338d65461 Fix pageinspect's heap_page_item to return infomasks as 32 bit values
HeapTupleHeader's t_infomask and t_infomask2 are defined as 16-bit
unsigned integers, so when the 16th bit was set, heap_page_item was
returning them as negative values, which was ugly.

The change to pageinspect--unpackaged--1.0.sql allows a module upgraded
from 9.0 to be cleanly updated from the previous definition.
2011-02-25 19:04:25 -03:00
Tom Lane 029fac2264 Avoid use of CREATE OR REPLACE FUNCTION in extension installation files.
It was never terribly consistent to use OR REPLACE (because of the lack of
comparable functionality for data types, operators, etc), and
experimentation shows that it's now positively pernicious in the extension
world.  We really want a failure to occur if there are any conflicts, else
it's unclear what the extension-ownership state of the conflicted object
ought to be.  Most of the time, CREATE EXTENSION will fail anyway because
of conflicts on other object types, but an extension defining only
functions can succeed, with bad results.
2011-02-13 22:54:52 -05:00
Tom Lane 629b3af27d Convert contrib modules to use the extension facility.
This isn't fully tested as yet, in particular I'm not sure that the
"foo--unpackaged--1.0.sql" scripts are OK.  But it's time to get some
buildfarm cycles on it.

sepgsql is not converted to an extension, mainly because it seems to
require a very nonstandard installation process.

Dimitri Fontaine and Tom Lane
2011-02-13 22:54:49 -05:00
Robert Haas 0d692a0dc9 Basic foreign table support.
Foreign tables are a core component of SQL/MED.  This commit does
not provide a working SQL/MED infrastructure, because foreign tables
cannot yet be queried.  Support for foreign table scans will need to
be added in a future patch.  However, this patch creates the necessary
system catalog structure, syntax support, and support for ancillary
operations such as COMMENT and SECURITY LABEL.

Shigeru Hanada, heavily revised by Robert Haas
2011-01-01 23:48:11 -05:00
Bruce Momjian 5d950e3b0c Stamp copyrights for year 2011. 2011-01-01 13:18:15 -05:00
Magnus Hagander fe9b36fd59 Convert cvsignore to gitignore, and add .gitignore for build targets. 2010-09-22 12:57:04 +02:00
Magnus Hagander 9f2e211386 Remove cvs keywords from all files. 2010-09-20 22:08:53 +02:00
Peter Eisentraut 3f11971916 Remove extra newlines at end and beginning of files, add missing newlines
at end of files.
2010-08-19 05:57:36 +00:00
Magnus Hagander 337b217572 Fix minor typos in comments.
Josh Kupershmidt
2010-04-02 15:19:22 +00:00
Bruce Momjian 0239800893 Update copyright for the year 2010. 2010-01-02 16:58:17 +00:00
Bruce Momjian d747140279 8.4 pgindent run, with new combined Linux/FreeBSD/MinGW typedef list
provided by Andrew.
2009-06-11 14:49:15 +00:00
Tom Lane a1fd650d2b Fix contrib/pageinspect to not create an ABI breakage between 8.3 and 8.4.
The original implementation of the 3-argument form of get_raw_page() risked
core dumps if the 8.3 SQL function definition was mistakenly used with the
8.4 module, which is entirely likely after a dump-and-reload upgrade.  To
protect 8.4 beta testers against upgrade problems, add a check on PG_NARGS.

In passing, fix missed additions to the uninstall script, and polish the
docs a trifle.
2009-06-08 16:22:44 +00:00