This makes almost all core code follow the policy introduced in the
previous commit. Specific decisions:
- Text search support functions with char* and length arguments, such as
prsstart and lexize, may receive unaligned strings. I doubt
maintainers of non-core text search code will notice.
- Use plain VARDATA() on values detoasted or synthesized earlier in the
same function. Use VARDATA_ANY() on varlenas sourced outside the
function, even if they happen to always have four-byte headers. As an
exception, retain the universal practice of using VARDATA() on return
values of SendFunctionCall().
- Retain PG_GETARG_BYTEA_P() in pageinspect. (Page images are too large
for a one-byte header, so this misses no optimization.) Sites that do
not call get_page_from_raw() typically need the four-byte alignment.
- For now, do not change btree_gist. Its use of four-byte headers in
memory is partly entangled with storage of 4-byte headers inside
GBT_VARKEY, on disk.
- For now, do not change gtrgm_consistent() or gtrgm_distance(). They
incorporate the varlena header into a cache, and there are multiple
credible implementation strategies to consider.
The contrib extensions pageinspect, pg_visibility and pgstattuple only
work against regular relations which have storage. They don't work
against foreign tables, partitioned (parent) tables, views, et al.
Add checks to the user-callable functions to return a useful error
message to the user if they mistakenly pass an invalid relation to a
function which doesn't accept that kind of relation.
In passing, improve some of the existing checks to use ereport() instead
of elog(), add a function to consolidate common checks where
appropriate, and add some regression tests.
Author: Amit Langote, with various changes by me
Reviewed by: Michael Paquier and Corey Huinker
Discussion: https://postgr.es/m/ab91fd9d-4751-ee77-c87b-4dd704c1e59c@lab.ntt.co.jp
Since pgstattuple v1.5 hasn't been released yet, no need for a new
extension version. The new function exposes statistics about hash
indexes similar to what other pgstatindex functions return for other
index types.
Ashutosh Sharma, reviewed by Kuntal Ghosh. Substantial further
revisions by me.
It's currently necessary to take a heavyweight lock when scanning a
hash bucket, but pgstattuple only examines individual pages, so it
doesn't need to do this. If, for some hypothetical reason, it did
need to do any heavyweight locking here, this logic would probably
still be incorrect, because most of the locks that it is taking are
meaningless. Only a heavyweight lock on a primary bucket page has any
meaning, but this takes heavyweight locks on all pages regardless of
function - and in particular overflow pages, where you might imagine
that we'd want to lock the primary bucket page if we needed to lock
anything at all.
This is arguably a bug that has existed since this code was added in
commit dab42382f4, but I'm not going to
bother back-patching it because in most cases the only consequence is
that running pgstattuple() on a hash index is a little slower than it
otherwise might be, which is no big deal.
Extracted from a vastly larger patch by Amit Kapila which heavyweight
locking for hash indexes entirely; analysis of why this can be done
independently of the rest by me.
Now that we track initial privileges on extension objects and changes to
those permissions, we can drop the superuser() checks from the various
functions which are part of the pgstattuple extension and rely on the
GRANT system to control access to those functions.
Since a pg_upgrade will preserve the version of the extension which
existed prior to the upgrade, we can't simply modify the existing
functions but instead need to create new functions which remove the
checks and update the SQL-level functions to use the new functions
(and to REVOKE EXECUTE rights on those functions from PUBLIC).
Thanks to Tom and Andres for adding support for extensions to follow
update paths (see: 40b449a), allowing this patch to be much smaller
since no new base version script needed to be included.
Approach suggested by Noah.
Reviewed by Michael Paquier.
Extension scripts should never use CREATE OR REPLACE for initial object
creation. If there is a collision with a pre-existing (probably
user-created) object, we want extension installation to fail, not silently
overwrite the user's object. Bloom and sslinfo both violated this precept.
Also fix a number of scripts that had no standard header (the file name
comment and the \echo...\quit guard). Probably the \echo...\quit hack
is less important now than it was in 9.1 days, but that doesn't mean
that individual extensions get to choose whether to use it or not.
And fix a couple of evident copy-and-pasteos in file name comments.
No need for back-patch: the REPLACE bugs are both new in 9.6, and the
rest of this is pretty much cosmetic.
Andreas Karlsson and Tom Lane
CHECK_PAGE_OFFSET_RANGE() has been unused forever.
CHECK_RELATION_BLOCK_RANGE() has been unused in pgstatindex.c ever since
bt_page_stats() and bt_page_items() functions were moved from pgstattuple
to pageinspect module. It still exists in pageinspect/btreefuncs.c.
Daniel Gustafsson
The reverted changes were intended to force a choice of whether any
newly-added BufferGetPage() calls needed to be accompanied by a
test of the snapshot age, to support the "snapshot too old"
feature. Such an accompanying test is needed in about 7% of the
cases, where the page is being used as part of a scan rather than
positioning for other purposes (such as DML or vacuuming). The
additional effort required for back-patching, and the doubt whether
the intended benefit would really be there, have indicated it is
best just to rely on developers to do the right thing based on
comments and existing usage, as we do with many other conventions.
This change should have little or no effect on generated executable
code.
Motivated by the back-patching pain of Tom Lane and Robert Haas
This patch is a no-op patch which is intended to reduce the chances
of failures of omission once the functional part of the "snapshot
too old" patch goes in. It adds parameters for snapshot, relation,
and an enum to specify whether the snapshot age check needs to be
done for the page at this point. This initial patch passes NULL
for the first two new parameters and BGP_NO_SNAPSHOT_TEST for the
third. The follow-on patch will change the places where the test
needs to be made.
The new bit indicates whether every tuple on the page is already frozen.
It is cleared only when the all-visible bit is cleared, and it can be
set only when we vacuum a page and find that every tuple on that page is
both visible to every transaction and in no need of any future
vacuuming.
A future commit will use this new bit to optimize away full-table scans
that would otherwise be triggered by XID wraparound considerations. A
page which is merely all-visible must still be scanned in that case, but
a page which is all-frozen need not be. This commit does not attempt
that optimization, although that optimization is the goal here. It
seems better to get the basic infrastructure in place first.
Per discussion, it's very desirable for pg_upgrade to automatically
migrate existing VM forks from the old format to the new format. That,
too, will be handled in a follow-on patch.
Masahiko Sawada, reviewed by Kyotaro Horiguchi, Fujii Masao, Amit
Kapila, Simon Riggs, Andres Freund, and others, and substantially
revised by me.
Dead or half-dead index leaf pages were incorrectly reported as live, as a
consequence of a code rearrangement I made (during a moment of severe brain
fade, evidently) in commit d287818eb5.
The index metapage was not counted in index_size, causing that result to
not agree with the actual index size on-disk.
Index root pages were not counted in internal_pages, which is inconsistent
compared to the case of a root that's also a leaf (one-page index), where
the root would be counted in leaf_pages. Aside from that inconsistency,
this could lead to additional transient discrepancies between the reported
page counts and index_size, since it's possible for pgstatindex's scan to
see zero or multiple pages marked as BTP_ROOT, if the root moves due to
a split during the scan. With these fixes, index_size will always be
exactly one page more than the sum of the displayed page counts.
Also, the index_size result was incorrectly documented as being measured in
pages; it's always been measured in bytes. (While fixing that, I couldn't
resist doing some small additional wordsmithing on the pgstattuple docs.)
Including the metapage causes the reported index_size to not be zero for
an empty index. To preserve the desired property that the pgstattuple
regression test results are platform-independent (ie, BLCKSZ configuration
independent), scale the index_size result in the regression tests.
The documentation issue was reported by Otsuka Kenji, and the inconsistent
root page counting by Peter Geoghegan; the other problems noted by me.
Back-patch to all supported branches, because this has been broken for
a long time.
This patch reduces pg_am to just two columns, a name and a handler
function. All the data formerly obtained from pg_am is now provided
in a C struct returned by the handler function. This is similar to
the designs we've adopted for FDWs and tablesample methods. There
are multiple advantages. For one, the index AM's support functions
are now simple C functions, making them faster to call and much less
error-prone, since the C compiler can now check function signatures.
For another, this will make it far more practical to define index access
methods in installable extensions.
A disadvantage is that SQL-level code can no longer see attributes
of index AMs; in particular, some of the crosschecks in the opr_sanity
regression test are no longer possible from SQL. We've addressed that
by adding a facility for the index AM to perform such checks instead.
(Much more could be done in that line, but for now we're content if the
amvalidate functions more or less replace what opr_sanity used to do.)
We might also want to expose some sort of reporting functionality, but
this patch doesn't do that.
Alexander Korotkov, reviewed by Petr Jelínek, and rather heavily
editorialized on by me.
The new function allows to estimate bloat and other table level statics
in a faster, but approximate, way. It does so by using information from
the free space map for pages marked as all visible in the visibility
map. The rest of the table is actually read and free space/bloat is
measured accurately. In many cases that allows to get bloat information
much quicker, causing less IO.
Author: Abhijit Menon-Sen
Reviewed-By: Andres Freund, Amit Kapila and Tomas Vondra
Discussion: 20140402214144.GA28681@kea.toroid.org
Some of the many error messages introduced in 458857cc missed 'FROM
unpackaged'. Also e016b724 and 45ffeb7e forgot to quote extension
version numbers.
Backpatch to 9.1, just like 458857cc which introduced the messages. Do
so because the error messages thrown when the wrong command is copy &
pasted aren't easy to understand.
Prominent binaries already had this metadata. A handful of minor
binaries, such as pg_regress.exe, still lack it; efforts to eliminate
such exceptions are welcome.
Michael Paquier, reviewed by MauMau.
This function continued to use it after heap_endscan() freed it. In
passing, don't explicit create a strategy here. Instead, use the one
created by heap_beginscan_strat(), if any. Back-patch to 9.2, where use
of a BufferAccessStrategy here was introduced.
Because of gcc -Wmissing-prototypes, all functions in dynamically
loadable modules must have a separate prototype declaration. This is
meant to detect global functions that are not declared in header files,
but in cases where the function is called via dfmgr, this is redundant.
Besides filling up space with boilerplate, this is a frequent source of
compiler warnings in extension modules.
We can fix that by creating the function prototype as part of the
PG_FUNCTION_INFO_V1 macro, which such modules have to use anyway. That
makes the code of modules cleaner, because there is one less place where
the entry points have to be listed, and creates an additional check that
functions have the right prototype.
Remove now redundant prototypes from contrib and other modules.
GIN posting lists are now encoded using varbyte-encoding, which allows them
to fit in much smaller space than the straight ItemPointer array format used
before. The new encoding is used for both the lists stored in-line in entry
tree items, and in posting tree leaf pages.
To maintain backwards-compatibility and keep pg_upgrade working, the code
can still read old-style pages and tuples. Posting tree leaf pages in the
new format are flagged with GIN_COMPRESSED flag, to distinguish old and new
format pages. Likewise, entry tree tuples in the new format have a
GIN_ITUP_COMPRESSED flag set in a bit that was previously unused.
This patch bumps GIN_CURRENT_VERSION from 1 to 2. New indexes created with
version 9.4 will therefore have version number 2 in the metapage, while old
pg_upgraded indexes will have version 1. The code treats them the same, but
it might be come handy in the future, if we want to drop support for the
uncompressed format.
Alexander Korotkov and me. Reviewed by Tomas Vondra and Amit Langote.
Tuples belonging to uncommitted transactions should not be
counted as dead.
This is arguably a bug fix that should be back-patched, but
as no one ever noticed until it came time to try to get rid
of SnapshotNow, I'm only doing this in master for now.
This allows us to specify the target relation with several expressions,
'relname', 'schemaname.relname' and OID in all pgstattuple functions.
pgstatindex() and pg_relpages() could not accept OID as the argument
so far.
Per discussion on -hackers, we decided to keep two types of interfaces,
with regclass-type and TEXT-type argument, for each pgstattuple
function because of the backward-compatibility issue. The functions
which have TEXT-type argument will be deprecated in the future release.
Patch by Satoshi Nagayasu, reviewed by Rushabh Lathia and Fujii Masao.
A materialized view has a rule just like a view and a heap and
other physical properties like a table. The rule is only used to
populate the table, references in queries refer to the
materialized data.
This is a minimal implementation, but should still be useful in
many cases. Currently data is only populated "on demand" by the
CREATE MATERIALIZED VIEW and REFRESH MATERIALIZED VIEW statements.
It is expected that future releases will add incremental updates
with various timings, and that a more refined concept of defining
what is "fresh" data will be developed. At some point it may even
be possible to have queries use a materialized in place of
references to underlying tables, but that requires the other
above-mentioned features to be working first.
Much of the documentation work by Robert Haas.
Review by Noah Misch, Thom Brown, Robert Haas, Marko Tiikkaja
Security review by KaiGai Kohei, with a decision on how best to
implement sepgsql still pending.
Extracted from a larger patch by Jaime Casanova, reviewed by Noah Misch.
I think this error message could use some more extensive revision, but
this at least makes the handling of spgist consistent with what we do for
other types of indexes that this code doesn't know how to handle.
We have seen one too many reports of people trying to use 9.1 extension
files in the old-fashioned way of sourcing them in psql. Not only does
that usually not work (due to failure to substitute for MODULE_PATHNAME
and/or @extschema@), but if it did work they'd get a collection of loose
objects not an extension. To prevent this, insert an \echo ... \quit
line that prints a suitable error message into each extension script file,
and teach commands/extension.c to ignore lines starting with \echo.
That should not only prevent any adverse consequences of loading a script
file the wrong way, but make it crystal clear to users that they need to
do it differently now.
Tom Lane, following an idea of Andrew Dunstan's. Back-patch into 9.1
... there is not going to be much value in this if we wait till 9.2.
A similar problem for pgstattuple() was fixed in April of 2010 by commit
33065ef8bc, but pgstatindex() seems to have
been overlooked.
Back-patch all the way, as with that commit, though not to 7.4 through
8.1, since those are now EOL.
For an empty index, the pgstatindex() function would compute 0.0/0.0 for
its avg_leaf_density and leaf_fragmentation outputs. On machines that
follow the IEEE float arithmetic standard with any care, that results in
a NaN. However, per report from Rushabh Lathia, Microsoft couldn't
manage to get this right, so you'd get a bizarre error on Windows.
Fix by forcing the results to be NaN explicitly, rather than relying on
the division operator to give that or the snprintf function to print it
correctly. I have some doubts that this is really the most useful
definition, but it seems better to remain backward-compatible with
those platforms for which the behavior wasn't completely broken.
Back-patch to 8.2, since the code is like that in all current releases.
It was never terribly consistent to use OR REPLACE (because of the lack of
comparable functionality for data types, operators, etc), and
experimentation shows that it's now positively pernicious in the extension
world. We really want a failure to occur if there are any conflicts, else
it's unclear what the extension-ownership state of the conflicted object
ought to be. Most of the time, CREATE EXTENSION will fail anyway because
of conflicts on other object types, but an extension defining only
functions can succeed, with bad results.
This isn't fully tested as yet, in particular I'm not sure that the
"foo--unpackaged--1.0.sql" scripts are OK. But it's time to get some
buildfarm cycles on it.
sepgsql is not converted to an extension, mainly because it seems to
require a very nonstandard installation process.
Dimitri Fontaine and Tom Lane
Foreign tables are a core component of SQL/MED. This commit does
not provide a working SQL/MED infrastructure, because foreign tables
cannot yet be queried. Support for foreign table scans will need to
be added in a future patch. However, this patch creates the necessary
system catalog structure, syntax support, and support for ancillary
operations such as COMMENT and SECURITY LABEL.
Shigeru Hanada, heavily revised by Robert Haas
temporary tables of other sessions; that is unsafe because of the way our
buffer management works. Per report from Stuart Bishop.
This is redundant with the bufmgr.c checks in HEAD, but not at all redundant
in the back branches.
corresponding struct definitions. This allows other headers to avoid including
certain highly-loaded headers such as rel.h and relscan.h, instead using just
relcache.h, heapam.h or genam.h, which are more lightweight and thus cause less
unnecessary dependencies.
unnecessary #include lines in it. Also, move some tuple routine prototypes and
macros to htup.h, which allows removal of heapam.h inclusion from some .c
files.
For this to work, a new header file access/sysattr.h needed to be created,
initially containing attribute numbers of system columns, for pg_dump usage.
While at it, make contrib ltree, intarray and hstore header files more
consistent with our header style.
remove transactions
use create or replace function
make formatting consistent
set search patch on first line
Add documentation on modifying *.sql to set the search patch, and
mention that major upgrades should still run the installation scripts.
Some of these issues were spotted by Tom today.
columns, and the new version can be stored on the same heap page, we no longer
generate extra index entries for the new version. Instead, index searches
follow the HOT-chain links to ensure they find the correct tuple version.
In addition, this patch introduces the ability to "prune" dead tuples on a
per-page basis, without having to do a complete VACUUM pass to recover space.
VACUUM is still needed to clean up dead index entries, however.
Pavan Deolasee, with help from a bunch of other people.
than two independent bits (one of which was never used in heap pages anyway,
or at least hadn't been in a very long time). This gives us flexibility to
add the HOT notions of redirected and dead item pointers without requiring
anything so klugy as magic values of lp_off and lp_len. The state values
are chosen so that for the states currently in use (pre-HOT) there is no
change in the physical representation.
installations whose pg_config program does not appear first in the PATH.
Per gripe from Eddie Stanley and subsequent discussions with Fabien Coelho
and others.
pages it intends to zero immediately. Just to show there is some use for that
function besides WAL recovery :-).
Along the way, fold _hash_checkpage and _hash_pageinit calls into _hash_getbuf
and friends, instead of expecting callers to do that separately.
pointer" in every Snapshot struct. This allows removal of the case-by-case
tests in HeapTupleSatisfiesVisibility, which should make it a bit faster
(I didn't try any performance tests though). More importantly, we are no
longer violating portable C practices by assuming that small integers are
distinct from all pointer values, and HeapTupleSatisfiesDirty no longer
has a non-reentrant API involving side-effects on a global variable.
There were a couple of places calling HeapTupleSatisfiesXXX routines
directly rather than through the HeapTupleSatisfiesVisibility macro.
Since these places had to be changed anyway, I chose to make them go
through the macro for uniformity.
Along the way I renamed HeapTupleSatisfiesSnapshot to HeapTupleSatisfiesMVCC
to emphasize that it's only used with MVCC-type snapshots. I was sorely
tempted to rename HeapTupleSatisfiesVisibility to HeapTupleSatisfiesSnapshot,
but forebore for the moment to avoid confusion and reduce the likelihood that
this patch breaks some of the pending patches. Might want to reconsider
doing that later.
This is an extension of pgstattuple to query information from indexes.
It supports btree, hash and gist. Gin is not supported. It scans only
index pages and does not read corresponding heap tuples. Therefore,
'dead_tuple' means the number of tuples with LP_DELETE flag.
Also, I added an experimental feature for btree indexes. It checks
fragmentation factor of indexes. If an leaf has the right link on the
next adjacent page in the file, it is assumed to be continuous (not
fragmented). It will help us to decide when to REINDEX.
ITAGAKI Takahiro
and RelationNameGetTupleDesc() as deprecated; remove uses of the
latter in the contrib library. Along the way, clean up crosstab()
code and documentation a little.
http://archives.postgresql.org/pgsql-hackers/2004-10/msg00464.php.
This fix is intended to be permanent: it moves the responsibility for
calling SetBufferCommitInfoNeedsSave() into the tqual.c routines,
eliminating the requirement for callers to test whether t_infomask changed.
Also, tighten validity checking on buffer IDs in bufmgr.c --- several
routines were paranoid about out-of-range shared buffer numbers but not
about out-of-range local ones, which seems a tad pointless.
>
> The patch adds missing the "libpgport.a" file to the installation under
> "install-all-headers". It is needed by some contribs. I install the
> library in "pkglibdir", but I was wondering whether it should be "libdir"?
> I was wondering also whether it would make sense to have a "libpgport.so"?
>
> It fixes various macros which are used by contrib makefiles, especially
> libpq_*dir and LDFLAGS when used under PGXS. It seems to me that they are
> needed to
>
> It adds the ability to test and use PGXS with contribs, with "make
> USE_PGXS=1". Without the macro, this is exactly as before, there should be
> no difference, esp. wrt the vpath feature that seemed broken by previous
> submission. So it should not harm anybody, and it is useful at least to me.
>
> It fixes some inconsistencies in various contrib makefiles
> (useless override, ":=" instead of "=").
Fabien COELHO
costing us lots more to maintain than it was worth. On shared tables
it was of exactly zero benefit because we couldn't trust it to be
up to date. On temp tables it sometimes saved an lseek, but not often
enough to be worth getting excited about. And the real problem was that
we forced an lseek on every relcache flush in order to update the field.
So all in all it seems best to lose the complexity.
results with tuples as ordinary varlena Datums. This commit does not
in itself do much for us, except eliminate the horrid memory leak
associated with evaluation of whole-row variables. However, it lays the
groundwork for allowing composite types as table columns, and perhaps
some other useful features as well. Per my proposal of a few days ago.