During decoding, logical messages (added in 3fe3511d05) failed to filter
out messages emitted in other databases and messages emitted "under" a
replication origin the output plugin isn't interested in.
Add tests to verify that both types of filtering actually work. While
touching message.sql, remove a hunk obsoleted by d25379e.
Bump XLOG_PAGE_MAGIC because xl_logical_message changed and because
3fe3511d05 had omitted doing so. 3fe3511d05 additionally didn't bump
catversion, but 7a542700d has done so since.
Author: Petr Jelinek
Reported-By: Andres Freund
Discussion: 20160406142513.wotqy3ba3kanr423@alap3.anarazel.de
Rename this function to GenericXLogRegisterBuffer() to make it clearer
what it does, and leave room for other sorts of "register" actions in
future. Also, replace its "bool isNew" argument with an integer flags
argument, so as to allow adding more flags in future without an API
break.
Alexander Korotkov, adjusted slightly by me
Added to ensure that bloom index pages can be distinguished from other
pages by pg_filedump. Because there weren't any public/production
versions before, this doesn't pay attention to any compatibility issues.
Per notice from Tom Lane
Pinning/unpinning a buffer is a very frequent operation, especially in
read-mostly, cache-resident workloads. Benchmarking shows that in various
scenarios the spinlock protecting a buffer header's state becomes a
significant bottleneck. The problem can be reproduced with pgbench -S on
larger machines, but can be considerably worse for queries which touch
the same buffers over and over at a high frequency (e.g. nested loops
over a small inner table).
To allow atomic operations to be used, cram BufferDesc's flags,
usage_count, buf_hdr_lock, refcount into a single 32bit atomic variable;
this allows them to be manipulated together using 32bit compare-and-swap
operations. This requires reducing MAX_BACKENDS to 2^18-1 (which could
be lifted by using a 64bit field, but it's not a realistic configuration
atm).
As not all operations can easily be implemented in a lockfree manner,
implement the previous buf_hdr_lock via a flag bit in the atomic
variable. That way we can continue to lock the header in places where
it's needed, but can get away without acquiring it in the more frequent
hot paths. There are some additional operations which could be done
without the lock, but aren't in this patch; the most important places
are covered.
As bufmgr.c now essentially re-implements spinlocks, abstract the delay
logic from s_lock.c into something more generic. It already has two
users, and more are coming up; there's a follow-up patch for lwlock.c at
least.
This patch is based on a proof-of-concept written by me, which Alexander
Korotkov made into a fully working patch; the committed version is again
revised by me. Benchmarking and testing has, amongst others, been
provided by Dilip Kumar, Alexander Korotkov, Robert Haas.
On a large x86 system, improvements of up to a factor of 8 have been
observed for read-only pgbench with a high client count.
Author: Alexander Korotkov and Andres Freund
Discussion: 2400449.GjM57CE0Yg@dinodell
Originally, this test created a 100000-row test table, which made it
run rather slowly compared to other contrib tests. Investigation with
gcov showed that we got no further improvement in code coverage after
the first 700 or so rows, making the large table 99% a waste of time.
Cut it back to 2000 rows to fix the runtime problem and still leave
some headroom for testing behaviors that may appear later.
A closer look at the gcov results showed that the main coverage
omissions in contrib/bloom occurred because the test never filled more
than one entry in the notFullPage array; which is unsurprising because
it exercised index cleanup only in the scenario of complete table
deletion, allowing every page in the index to become deleted rather
than not-full. Add testing that allows the not-full path to be
exercised as well.
Also, test the amvalidate function, because blvalidate.c had zero
coverage without that, and besides it's a good idea to check for
mistakes in the bloom opclass definitions.
That routine is dangerous, and unnecessary once we get rid of this
one caller.
In passing, fix failure to clean up the temp memory context, or switch
back to the caller's context, during the slowest exit path.
This feature is controlled by a new old_snapshot_threshold GUC. A
value of -1 disables the feature, and that is the default. The
value of 0 is just intended for testing. Above that, it is the
number of minutes a snapshot can age before pruning and vacuum
are allowed to remove dead tuples which the snapshot would
otherwise protect. The xmin associated with a transaction ID does
still protect dead tuples. A connection which is using an "old"
snapshot does not get an error unless it accesses a page modified
recently enough that it might not be able to produce accurate
results.
This is similar to the Oracle feature, and we use the same SQLSTATE
and error message for compatibility.
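For illustration (not part of the original commit message), a minimal
configuration sketch; the threshold value is arbitrary, and the
parameter can only take effect after a server restart:

    -- enable "snapshot too old" with a 60-minute threshold
    ALTER SYSTEM SET old_snapshot_threshold = 60;
    -- after a restart, a sufficiently old snapshot touching a
    -- recently-modified page fails with SQLSTATE 72000, "snapshot too old"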
This patch is a no-op patch which is intended to reduce the chances
of failures of omission once the functional part of the "snapshot
too old" patch goes in. It adds parameters for snapshot, relation,
and an enum to specify whether the snapshot age check needs to be
done for the page at this point. This initial patch passes NULL
for the first two new parameters and BGP_NO_SNAPSHOT_TEST for the
third. The follow-on patch will change the places where the test
needs to be made.
Indexes (but only B-tree for now) can now contain "extra" column(s)
which don't participate in the index structure; they are just stored in
leaf tuples. This allows index-only scans to be satisfied by a single
index instead of two or more indexes.
Author: Anastasia Lubennikova with minor editorializing by me
Reviewers: David Rowley, Peter Geoghegan, Jeff Janes
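As an illustration only (the keyword spelling for this feature has
varied across patch versions, so treat the exact syntax as an
assumption), a covering index on a hypothetical table might look like:

    CREATE TABLE orders (id int, note text);
    -- "note" is stored in leaf tuples but is not a key column, letting
    -- queries on (id, note) use an index-only scan over one index
    CREATE UNIQUE INDEX orders_idx ON orders (id) INCLUDE (note);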
This patch introduces a new text search operator (<-> or <DISTANCE>)
into tsquery. The on-disk and binary in/out formats of tsquery remain
backward compatible.
It has two side effects:
- it changes the sort order for tsquery, so users who have a btree index
over tsquery should reindex it
- tsquery output contains fewer parentheses, and tsquery becomes more
readable
Authors: Teodor Sigaev, Oleg Bartunov, Dmitry Ivanov
Reviewers: Alexander Korotkov, Artur Zakirov
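A usage sketch (not part of the original message); the sample phrases
are made up:

    -- match "fat" immediately followed by "rat"
    SELECT to_tsvector('english', 'a fat rat') @@ to_tsquery('fat <-> rat');
    -- match with an exact distance of two positions
    SELECT to_tsvector('english', 'a fat brown rat')
           @@ to_tsquery('fat <2> rat');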
API and mechanism to allow generic messages to be inserted into WAL that are
intended to be read by logical decoding plugins. This commit adds an optional
new callback to the logical decoding API.
Messages are either text or bytea. Messages can be transactional, or not, and
are identified by a prefix to allow multiple concurrent decoding plugins.
(Not to be confused with Generic WAL records, which are intended to allow crash
recovery of extensible objects.)
Author: Petr Jelinek and Andres Freund
Reviewers: Artur Zakirov, Tomas Vondra, Simon Riggs
Discussion: 5685F999.6010202@2ndquadrant.com
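A sketch of the committed SQL-level interface; the prefix and the slot
name here are arbitrary, and the slot is assumed to already exist:

    -- emit a transactional message under the prefix 'myapp'
    SELECT pg_logical_emit_message(true, 'myapp', 'hello, decoder');
    -- a decoding client (e.g. test_decoding) can then consume it
    SELECT data FROM pg_logical_slot_peek_changes('my_slot', NULL, NULL);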
The restore() function assumed that the result of sprintf() with %e format
would necessarily contain an 'e', which is false: what if the supplied
number is an infinity or NaN? If that did happen, we'd get a
null-pointer-dereference core dump. The case appears impossible currently,
because seg_in() does not accept such values, and there are no seg-creating
functions that would create one. But it seems unwise to rely on it never
happening in future.
Quite aside from that, the code was pretty ugly: it relied on modifying a
static format string when it could use a "*" precision argument, and it
used strtok() entirely gratuitously, and it stripped off trailing spaces
by hand instead of just not asking for them to begin with.
Coverity noticed the potential null pointer dereference (though I wonder
why it didn't complain years ago, since this code is ancient).
Since this is just code cleanup and forestalling a hypothetical future
bug, there seems no need for back-patching.
The code was supposing that rd_amcache wouldn't disappear from under it
during a scan; which is wrong. Copy the data out of the relcache rather
than trying to reference it there.
Coverity complained about implicit sign-extension in the
BloomPageGetFreeSpace macro, probably because sizeOfBloomTuple isn't wide
enough for size calculations. No overflow is really possible as long as
maxoff and sizeOfBloomTuple are small enough to represent a realistic
situation, but it seems like a good idea to declare sizeOfBloomTuple as
Size not int32.
Add missing check on BloomPageAddItem() result, again from Coverity.
Avoid core dump due to not allocating so->sign array when
scan->numberOfKeys is zero. Also thanks to Coverity.
Use FLEXIBLE_ARRAY_MEMBER rather than declaring an array as size 1
when it isn't necessarily.
Very minor beautification of related code.
Unfortunately, none of the Coverity-detected mistakes look like they
could account for the remaining buildfarm unhappiness with this
module. It's barely possible that the FLEXIBLE_ARRAY_MEMBER mistake
does account for that, if it's enabling bogus compiler optimizations;
but I'm not terribly optimistic. We probably still have bugs to
find here.
Looking at results from buildfarm member jaguarundi, it seems that
BloomOptions sometimes isn't initialized, but I don't yet see how that's
possible. Nevertheless, a check on the signature length was missing, so
add a limit on it. Also add a missing GenericXLogAbort() for the case of
an already-deleted page in vacuum, plus minor code refactoring.
This module provides a new access method: a simple Bloom filter
implemented as a PostgreSQL index. It can give some benefit for searches
over a large number of columns. The module is also, so far, the only
way to test the generic WAL interface committed earlier.
Author: Teodor Sigaev, Alexander Korotkov
Reviewers: Aleksander Alekseev, Michael Paquier, Jim Nasby
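A usage sketch; the table and the option values are illustrative:

    CREATE EXTENSION bloom;
    CREATE TABLE t (i1 int, i2 int, i3 int);
    -- length is the signature size in bits, colN the bits per column
    CREATE INDEX t_bloom ON t USING bloom (i1, i2, i3)
        WITH (length = 80, col1 = 2, col2 = 2, col3 = 4);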
Commit fbe5a3fb73 accidentally changed
this behavior; put things back the way they were, and add some
regression tests.
Report by Andres Freund; patch by Ashutosh Bapat, with a bit of
kibitzing by me.
brin_page_type() and brin_metapage_info() did not enforce being called
by superuser, like other pageinspect functions that take bytea do.
Since they don't verify the passed page thoroughly, it is possible to
use them to read the server memory with a carefully crafted bytea value,
up to a few kilobytes from where the input bytea is located.
Have them throw errors if called by a non-superuser.
Report and initial patch: Andreas Seltenreich
Security: CVE-2016-3065
A join clause might mention multiple relations on either side, so it
need not be the case that a given joinrel's constituent relations are
all on one side of the join clause or all on the other.
Report by Rajkumar Raghuwanshi. Analysis and fix by Michael Paquier
and Ashutosh Bapat.
The two get_tle_by_resno() calls introduced by this commit lacked any
check for a NULL return, unlike any other calls of that function anywhere
in our tree. Coverity quite properly complained about it. Also fix a
misindented line in process_query_params(), which Coverity also complained
about on the grounds that the bad indentation suggested possible programmer
misinterpretation.
postgres_fdw can now send an UPDATE or DELETE statement directly to
the foreign server in simple cases, rather than sending a SELECT FOR
UPDATE statement and then updating or deleting rows one-by-one.
Etsuro Fujita, reviewed by Rushabh Lathia, Shigeru Hanada, Kyotaro
Horiguchi, Albe Laurenz, Thom Brown, and me.
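A hedged illustration (the foreign table name is hypothetical); EXPLAIN
shows whether the statement is executed directly on the remote server:

    EXPLAIN (VERBOSE, COSTS OFF)
    UPDATE remote_tab SET c1 = c1 + 1 WHERE c2 = 42;
    -- in simple cases the plan contains a Foreign Update node whose
    -- Remote SQL is the UPDATE itself, not a SELECT ... FOR UPDATE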
The deprecated set_limit() is modified to use SetConfigOption() to set
similarity_threshold, which is actually an instance of the
pg_trgm.similarity_threshold GUC variable. The previous coding set
similarity_threshold directly, which could cause an inconsistency between
the state of the actual variable and its GUC representation.
Per gripe from Tom Lane
This patch introduces a concept of similarity between a whole string
and just a word from another string. The extension version is not
changed because 1.2 was already introduced in the 9.6 release cycle, so
there hasn't been a public release of it.
Author: Alexander Korotkov, Artur Zakirov
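For illustration, assuming the word_similarity() function this work
introduced:

    -- similarity between 'word' and its best match within 'two words'
    SELECT word_similarity('word', 'two words');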
Use the GUC variable pg_trgm.similarity_threshold instead of
set_limit()/show_limit(), which were introduced back when modules could
not yet define GUC variables.
Author: Artur Zakirov
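The GUC is used in place of the old functions, e.g.:

    -- instead of SELECT set_limit(0.4) and SELECT show_limit():
    SET pg_trgm.similarity_threshold = 0.4;
    SHOW pg_trgm.similarity_threshold;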
There's no reason for this function to do this for every other
attribute number and omit it for CTID, especially since
conversion_error_callback has code to handle that case. This seems
to be an oversight in commit e690b95150.
Etsuro Fujita
Although the default choice of rel->reltarget should typically be
sufficient for scan or join paths, it's not at all sufficient for the
purposes PathTargets were invented for; in particular not for
upper-relation Paths. So break API compatibility by adding a PathTarget
argument to create_foreignscan_path(). To ease updating of existing
code, accept a NULL value of the argument as selecting rel->reltarget.
In commit 19a541143a I did not make PathTarget a subtype of Node,
and embedded a RelOptInfo's reltarget directly into it rather than having
a separately-allocated Node. In hindsight that was misguided
micro-optimization, enabled by the fact that at that point we didn't have
any Paths with custom PathTargets. Now that PathTarget processing has
been fleshed out some more, it's easier to see that it's better to have
PathTarget as an independent Node type, even if it does cost us one more
palloc to create a RelOptInfo. So change it while we still can.
This commit just changes the representation, without doing anything more
interesting than that.
This patch widens SPI_processed, EState's es_processed field, PortalData's
portalPos field, FuncCallContext's call_cntr and max_calls fields,
ExecutorRun's count argument, PortalRunFetch's result, and the max number
of rows in a SPITupleTable to uint64, and deals with (I hope) all the
ensuing fallout. Some of these values were declared uint32 before, and
others "long".
I also removed PortalData's posOverflow field, since that logic seems
pretty useless given that portalPos is now always 64 bits.
The user-visible results are that command tags for SELECT etc will
correctly report tuple counts larger than 4G, as will plpgsql's
GET DIAGNOSTICS ... ROW_COUNT command. Queries processing more tuples
than that are still not exactly the norm, but they're becoming more
common.
Most values associated with FETCH/MOVE distances, such as PortalRun's count
argument and the count argument of most SPI functions that have one, remain
declared as "long". It's not clear whether it would be worth promoting
those to int64; but it would definitely be a large dollop of additional
API churn on top of this, and it would only help 32-bit platforms which
seem relatively less likely to see any benefit.
Andreas Scherbaum, reviewed by Christian Ullrich, additional hacking by me
New configuration parameter auto_explain.sample_ratio makes it
possible to log just a fraction of the queries meeting the configured
threshold, to reduce the amount of logging.
Author: Craig Ringer and Julien Rouhaud
Review: Petr Jelinek
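A sketch using the parameter name as committed here; the values are
illustrative:

    LOAD 'auto_explain';
    SET auto_explain.log_min_duration = 0;
    -- log plans for roughly 10% of the qualifying queries
    SET auto_explain.sample_ratio = 0.1;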
In commit 1d97c19a0f and later c1d9579dd8, we extended
pull_var_clause's API by adding enum-type arguments. That's sort of a pain
to maintain, though, because it means every time we add a new behavior we
must touch every last one of the call sites, even if there's a reasonable
default behavior that most of them could use. Let's switch over to using a
bitmask of flags, instead; that seems more maintainable and might save a
nanosecond or two as well. This commit changes no behavior in itself,
though I'm going to follow it up with one that does add a new behavior.
In passing, remove flatten_tlist(), which has not been used since 9.1
and would otherwise need the same API changes.
Removing these enums means that optimizer/tlist.h no longer needs to
depend on optimizer/var.h. Changing that caused a number of C files to
need addition of #include "optimizer/var.h" (probably we can thank old
runs of pgrminclude for that); but on balance it seems like a good change
anyway.
Renaming a file using rename(2) is not guaranteed to be durable in the face
of crashes. Use the previously added durable_rename()/durable_link_or_rename()
in various places where we previously just renamed files.
Most of the changed call sites are arguably not critical, but it seems
better to err on the side of too much durability. The most prominent
known case where the previously missing fsyncs could cause data loss is
crashes at the end of a checkpoint. After the actual checkpoint has been
performed, old WAL files are recycled. When they're filled, their
contents are fdatasynced, but we did not fsync the containing
directory. An OS/hardware crash in an unfortunate moment could then end
up leaving that file with its old name, but new content; WAL replay
would thus not replay it.
Reported-By: Tomas Vondra
Author: Michael Paquier, Tomas Vondra, Andres Freund
Discussion: 56583BDD.9060302@2ndquadrant.com
Backpatch: All supported branches
pgcrypto already supports key-stretching during symmetric encryption,
including the salted-and-iterated method; but the number of iterations
was not configurable. This commit implements a new s2k-count parameter
to pgp_sym_encrypt() which permits selecting a larger number of
iterations.
Author: Jeff Janes
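A usage sketch; the iteration count is arbitrary within the accepted
range:

    -- stretch the passphrase harder than the default iteration count
    SELECT pgp_sym_encrypt('secret data', 'passphrase', 's2k-count=1000000');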
Commit ccd8f97922 gave us the ability to
request that the remote side sort the data, and, later, commit
e4106b2528 gave us the ability to
request that the remote side perform the join for us rather than doing
it locally. But we could not do both things at the same time: a
remote SQL query that had an ORDER BY clause would never be a join.
This commit adds that capability.
Ashutosh Bapat, reviewed by me.
ltree/ltree_gist/ltxtquery's headers store data at MAXALIGN alignment,
requiring some padding bytes. So far we left these uninitialized. Zero
them by using palloc0.
Author: Andres Freund
Reported-By: Andres Freund / valgrind / buildfarm animal skink
Backpatch: 9.1-
This lets you examine the visibility map as well as page-level
visibility information. I initially wrote it as a debugging aid,
but was encouraged to polish it for commit.
Patch by me, reviewed by Masahiko Sawada.
Discussion: 56D77803.6080503@BlueTreble.com
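A usage sketch (the relation name is arbitrary):

    CREATE EXTENSION pg_visibility;
    -- visibility-map bits for block 0 of a relation
    SELECT all_visible, all_frozen FROM pg_visibility_map('pg_class', 0);
    -- the same plus the page-level PD_ALL_VISIBLE bit
    SELECT * FROM pg_visibility('pg_class', 0);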
When decoding the old version of an UPDATE or DELETE change, and if that
tuple was bigger than MaxHeapTupleSize, we either Assert'ed out, or
failed in more subtle ways in non-assert builds. Normally individual
tuples aren't bigger than MaxHeapTupleSize, with big datums toasted.
But that's not the case for the old version of a tuple for logical
decoding; the replica identity is logged as one piece. With the default
replica identity, btree's tuple size limit keeps that to small tuples,
but that's not the case for FULL.
Change the tuple buffer infrastructure to separate allocate over-large
tuples, instead of always going through the slab cache.
This unfortunately requires changing the ReorderBufferTupleBuf
definition, since we need to store the allocated size someplace. To avoid
requiring output plugins to recompile, don't store HeapTupleHeaderData
directly after HeapTupleData, but point to it via t_data; that leaves
room for the allocated size. As there's no reason for an output plugin
to look at ReorderBufferTupleBuf->t_data.header, remove the field. It
was just a minor convenience to have it directly accessible.
Reported-By: Adam Dratwiński
Discussion: CAKg6ypLd7773AOX4DiOGRwQk1TVOQKhNwjYiVjJnpq8Wo+i62Q@mail.gmail.com
Somehow I managed to flip the order of restoring old & new tuples when
de-spooling a change in a large transaction from disk. This only takes
effect when a change that has old/new versions of the tuple is spooled
to disk. That is only the case for UPDATEs where the primary key changed
or where the replica identity is changed to FULL.
The tests didn't catch this because they covered either spooled updates
or updates that changed primary keys, but not both at the same time.
Found while adding tests for the following commit.
Backpatch: 9.4, where logical decoding was added
Logical decoding's reorderbuffer keeps transactions in an LSN-ordered
list for efficiency. To make that efficiently possible, upper-level xids
are forced to be logged before nested subtransaction xids. That only
works though if these records are all looked at: unfortunately we didn't
do so for e.g. row-level locks, which are otherwise uninteresting for
logical decoding.
This could lead to errors like:
"ERROR: subxact logged without previous toplevel record".
It's not sufficient to just look at row locking records; the xid could
appear first due to a lot of other types of records (which will trigger
the transaction to be marked logged with MarkCurrentTransactionIdLoggedIfAny).
So invent infrastructure to tell reorderbuffer about xids seen, when
they'd otherwise not pass through reorderbuffer.c.
Reported-By: Jarred Ward
Bug: #13844
Discussion: 20160105033249.1087.66040@wrigleys.postgresql.org
Backpatch: 9.4, where logical decoding was added
Previously, we included NULLS FIRST when appropriate but relied on the
default behavior to be NULLS LAST. This is, however, not true for a
sort in descending order and seems like a fragile assumption anyway.
Report by Rajkumar Raghuwanshi. Patch by Ashutosh Bapat. Review
comments from Michael Paquier and Tom Lane.
Otherwise running installcheck-force on a server with
synchronous_commit=off will result in the tests failing. All the other
tests already do so...
Backpatch: 9.4, where logical decoding was added
When adding replication origins in 5aa235042, I somehow managed to set
the timestamp of decoded transactions to InvalidXLogRecptr when decoding
one made without a replication origin. Fix that, and the wrong type of
the new commit_time variable.
This didn't trigger a regression test failure because we explicitly
don't show commit timestamps in the regression tests, as they obviously
are variable. Add a test that checks that a decoded commit's timestamp
is within minutes of NOW() from before the commit.
Reported-By: Weiping Qu
Diagnosed-By: Artur Zakirov
Discussion: 56D4197E.9050706@informatik.uni-kl.de,
56D42918.1010108@postgrespro.ru
Backpatch: 9.5, where 5aa235042 originates.
The new bit indicates whether every tuple on the page is already frozen.
It is cleared only when the all-visible bit is cleared, and it can be
set only when we vacuum a page and find that every tuple on that page is
both visible to every transaction and in no need of any future
vacuuming.
A future commit will use this new bit to optimize away full-table scans
that would otherwise be triggered by XID wraparound considerations. A
page which is merely all-visible must still be scanned in that case, but
a page which is all-frozen need not be. This commit does not attempt
that optimization, although that optimization is the goal here. It
seems better to get the basic infrastructure in place first.
Per discussion, it's very desirable for pg_upgrade to automatically
migrate existing VM forks from the old format to the new format. That,
too, will be handled in a follow-on patch.
Masahiko Sawada, reviewed by Kyotaro Horiguchi, Fujii Masao, Amit
Kapila, Simon Riggs, Andres Freund, and others, and substantially
revised by me.
This is basically a bug fix; the old code assumes that a ForeignScan
is always parallel-safe, but for postgres_fdw, for example, this is
definitely false. It should be true for file_fdw, though, since a
worker can read a file from the filesystem just as well as any other
backend process.
Original patch by Thomas Munro. Documentation, and changes to the
comments, by me.
list_concat(list_concat(a, b), c) destructively changes both a and b;
to avoid such perils, copy lists of remote_conds before incorporating
them into larger lists via list_concat().
Ashutosh Bapat, per a report from Etsuro Fujita
Up to now, there's been an assumption that all Paths for a given relation
compute the same output column set (targetlist). However, there are good
reasons to remove that assumption. For example, an indexscan on an
expression index might be able to return the value of an expensive function
"for free". While we have the ability to generate such a plan today in
simple cases, we don't have a way to model that it's cheaper than a plan
that computes the function from scratch, nor a way to create such a plan
in join cases (where the function computation would normally happen at
the topmost join node). Also, we need this so that we can have Paths
representing post-scan/join steps, where the targetlist may well change
from one step to the next. Therefore, invent a "struct PathTarget"
representing the columns we expect a plan step to emit. It's convenient
to include the output tuple width and tlist evaluation cost in this struct,
and there will likely be additional fields in future.
While Path nodes that actually do have custom outputs will need their own
PathTargets, it will still be true that most Paths for a given relation
will compute the same tlist. To reduce the overhead added by this patch,
keep a "default PathTarget" in RelOptInfo, and allow Paths that compute
that column set to just point to their parent RelOptInfo's reltarget.
(In the patch as committed, actually every Path is like that, since we
do not yet have any cases of custom PathTargets.)
I took this opportunity to provide some more-honest costing of
PlaceHolderVar evaluation. Up to now, the assumption that "scan/join
reltargetlists have cost zero" was applied not only to Vars, where it's
reasonable, but also PlaceHolderVars where it isn't. Now, we add the eval
cost of a PlaceHolderVar's expression to the first plan level where it can
be computed, by including it in the PathTarget cost field and adding that
to the cost estimates for Paths. This isn't perfect yet but it's much
better than before, and there is a way forward to improve it more. This
costing change affects the join order chosen for a couple of the regression
tests, changing expected row ordering.
Dead or half-dead index leaf pages were incorrectly reported as live, as a
consequence of a code rearrangement I made (during a moment of severe brain
fade, evidently) in commit d287818eb5.
The index metapage was not counted in index_size, causing that result to
not agree with the actual index size on-disk.
Index root pages were not counted in internal_pages, which is inconsistent
compared to the case of a root that's also a leaf (one-page index), where
the root would be counted in leaf_pages. Aside from that inconsistency,
this could lead to additional transient discrepancies between the reported
page counts and index_size, since it's possible for pgstatindex's scan to
see zero or multiple pages marked as BTP_ROOT, if the root moves due to
a split during the scan. With these fixes, index_size will always be
exactly one page more than the sum of the displayed page counts.
Also, the index_size result was incorrectly documented as being measured in
pages; it's always been measured in bytes. (While fixing that, I couldn't
resist doing some small additional wordsmithing on the pgstattuple docs.)
Including the metapage causes the reported index_size to not be zero for
an empty index. To preserve the desired property that the pgstattuple
regression test results are platform-independent (ie, BLCKSZ configuration
independent), scale the index_size result in the regression tests.
The documentation issue was reported by Otsuka Kenji, and the inconsistent
root page counting by Peter Geoghegan; the other problems noted by me.
Back-patch to all supported branches, because this has been broken for
a long time.
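With these fixes, the invariant can be checked directly; a sketch (the
index name is arbitrary):

    -- index_size should equal all counted pages plus the one metapage
    SELECT index_size,
           (internal_pages + leaf_pages + empty_pages + deleted_pages + 1)
             * current_setting('block_size')::bigint AS accounted_bytes
    FROM pgstatindex('pg_class_oid_index');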
If we've got a relatively straightforward join between two tables,
this pushes that join down to the remote server instead of fetching
the rows for each table and performing the join locally. Some cases
are not handled yet, such as SEMI and ANTI joins. Also, we don't
yet attempt to create presorted join paths or parameterized join
paths even though these options do get tried for a base relation
scan. Nevertheless, this seems likely to be a very significant win
in many practical cases.
Shigeru Hanada and Ashutosh Bapat, reviewed by Robert Haas, with
additional review at various points by Tom Lane, Etsuro Fujita,
KaiGai Kohei, and Jeevan Chalke.
deparseReturningList ended up adding RETURNING NULL to the query, but
code elsewhere saw an empty list of attributes and concluded that it
should not expect tuples from the remote side.
Etsuro Fujita and Robert Haas, reviewed by Thom Brown
The previous RequestAddinLWLocks() method had several disadvantages.
First, the locks would be in the main tranche; we've recently decided
that it's useful for LWLocks used for separate purposes to have
separate tranche IDs. Second, there wasn't any correlation between
what code called RequestAddinLWLocks() and what code called
LWLockAssign(); when multiple modules are in use, it could become
quite difficult to troubleshoot problems where LWLockAssign() ran out
of locks. To fix, create a concept of named LWLock tranches which
can be used either by extension or by core code.
Amit Kapila and Robert Haas
Commit e09996ff8d removed some ad-hoc code in hstore_to_json_loose
that determined whether an hstore value string looked like a number,
in favor of calling the JSON parser's is-it-a-number code. However,
it neglected the fact that the exact same code appeared in
hstore_to_jsonb_loose.
This is not a bug, exactly, because the requirements on the two functions
are not the same: hstore_to_json_loose must accept only syntactically legal
JSON numbers as numbers, or it will produce invalid JSON output, as per bug
#12070 which spawned the prior commit. But hstore_to_jsonb_loose could
accept anything that numeric_in will eat, other than Inf and NaN.
Nonetheless it seems surprising and arbitrary that the two functions don't
use the same rules for what is a number versus what is a string; especially
since they did use the same rules before the aforesaid commit. For one
thing, that means that doing hstore_to_json_loose and then casting to jsonb
can produce results different from doing just hstore_to_jsonb_loose.
Hence, change hstore_to_jsonb_loose's logic to match hstore_to_json_loose,
ie, hstore values are treated as numbers when they match the JSON syntax
for numbers.
No back-patch, since this is more in the nature of a definitional change
than a bug fix.
Remove duplicate assignment. This part by Ashutosh Bapat.
Remove now-obsolete comment. This part by me, although the pending
join pushdown patch does something similar, and for the same reason:
there's no reason to keep two lists of the things in the fdw_private
structure that have to be kept in sync with each other.
The default fetch size of 100 rows might not be right in every
environment, so allow users to configure it.
Corey Huinker, reviewed by Kyotaro Horiguchi, Andres Freund, and me.
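A configuration sketch (server and table names are hypothetical); the
option can be set per server or per table:

    ALTER SERVER remote_srv OPTIONS (ADD fetch_size '500');
    ALTER FOREIGN TABLE remote_tab OPTIONS (ADD fetch_size '100');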
Commit e09996ff8d was one brick shy of a load: it didn't insist
that the detected JSON number be the whole of the supplied string.
This allowed inputs such as "2016-01-01" to be misdetected as valid JSON
numbers. Per bug #13906 from Dmitry Ryabov.
In passing, be more wary of zero-length input (I'm not sure this can
happen given current callers, but better safe than sorry), and do some
minor cosmetic cleanup.
The code that generates a complete SQL query for a given foreign relation
was repeated in two places, and they didn't quite agree: the EXPLAIN case
left out the locking clause. Centralize the code so we get the same
behavior everywhere, and adjust calling conventions and which functions
are static vs. extern accordingly.
Ashutosh Bapat, reviewed and slightly adjusted by me.
listForeignTables' invocation of processSQLNamePattern did not match up
with the other ones that handle potentially-schema-qualified names; it
failed to make use of pg_table_is_visible() and also passed the name
arguments in the wrong order. Bug seems to have been aboriginal in commit
0d692a0dc9. It accidentally sort of worked as long as you didn't
inquire too closely into the behavior, although the silliness was later
exposed by inconsistencies in the test queries added by 59efda3e50
(which I probably should have questioned at the time, but didn't).
Per bug #13899 from Reece Hart. Patch by Reece Hart and Tom Lane.
Back-patch to all affected branches.
The upcoming patch to allow join pushdown in postgres_fdw needs to use
this code multiple times, which requires moving it to deparse.c. That
seems like a good idea anyway, so do that now both on general principle
and to simplify the future patch.
Inspired by a patch by Shigeru Hanada and Ashutosh Bapat, but I did
it a little differently than what that patch did.
Previously, postgres_fdw's connection cache was keyed by user OID and
server OID, but this can lead to multiple connections when it's not
really necessary. In particular, if all relevant users are mapped to
the public user mapping, then their connection options are certainly
the same, so one connection can be used for all of them.
While we're cleaning things up here, drop the "server" argument to
GetConnection(), which isn't really needed. This saves a few cycles
because callers no longer have to look this up; the function itself
does, but only when establishing a new connection, not when reusing
an existing one.
Ashutosh Bapat, with a few small changes by me.
Commit e529cd4ffa introduced an Assert requiring NAMEDATALEN to be
less than MAX_LEVENSHTEIN_STRLEN, which has been 255 for a long time.
Since up to that instant we had always allowed NAMEDATALEN to be
substantially more than that, this was ill-advised.
It's debatable whether we need MAX_LEVENSHTEIN_STRLEN at all (versus
putting a CHECK_FOR_INTERRUPTS into the loop), or whether it has to be
so tight; but this patch takes the narrower approach of just not applying
the MAX_LEVENSHTEIN_STRLEN limit to calls from the parser.
Trusting the parser for this seems reasonable, first because the strings
are limited to NAMEDATALEN which is unlikely to be hugely more than 256,
and second because the maximum distance is tightly constrained by
MAX_FUZZY_DISTANCE (though we'd forgotten to make use of that limit in one
place). That means the cost is not really O(mn) but more like O(max(m,n)).
Relaxing the limit for user-supplied calls is left for future research;
given the lack of complaints to date, it doesn't seem very high priority.
In passing, fix confusion between lengths-in-bytes and lengths-in-chars
in comments and error messages.
Per gripe from Kevin Day; solution suggested by Robert Haas. Back-patch
to 9.5 where the unwanted restriction was introduced.
GIN had some minor issues too, mostly using "internal" where something
else would be more appropriate. I went with the same approach as in
9ff60273e3, namely preferring the opclass' indexed datatype for
arguments that receive an operator RHS value, even if that's not
necessarily what they really are.
Again, this is with an eye to having a uniform rule for ginvalidate()
to check support function signatures.
The conventions specified by the GiST SGML documentation were widely
ignored. For example, the strategy-number argument for "consistent" and
"distance" functions is specified to be a smallint, but most of the
built-in support functions declared it as an integer, and for that matter
the core code passed it using Int32GetDatum not Int16GetDatum. None of
that makes any real difference at runtime, but it's quite confusing for
newcomers to the code, and it makes it very hard to write an amvalidate()
function that checks support function signatures. So let's try to instill
some consistency here.
Another similar issue is that the "query" argument is not of a single
well-defined type, but could have different types depending on the strategy
(corresponding to search operators with different righthand-side argument
types). Some of the functions threw up their hands and declared the query
argument as being of "internal" type, which surely isn't right ("any" would
have been more appropriate); but the majority position seemed to be to
declare it as being of the indexed data type, corresponding to a search
operator with both input types the same. So I've specified a convention
that that's what to do always.
Also, the result of the "union" support function actually must be of the
index's storage type, but the documentation suggested declaring it to
return "internal", and some of the functions followed that. Standardize
on telling the truth, instead.
Similarly, standardize on declaring the "same" function's inputs as
being of the storage type, not "internal".
Also, somebody had forgotten to add the "recheck" argument to both
the documentation of the "distance" support function and all of their
SQL declarations, even though the C code was happily using that argument.
Clean that up too.
Fix up some other omissions in the docs too, such as documenting that
union's second input argument is vestigial.
So far as the errors in core function declarations go, we can just fix
pg_proc.h and bump catversion. Adjusting the erroneous declarations in
contrib modules is more debatable: in principle any change in those
scripts should involve an extension version bump, which is a pain.
However, since these changes are purely cosmetic and make no functional
difference, I think we can get away without doing that.
This patch reduces pg_am to just two columns, a name and a handler
function. All the data formerly obtained from pg_am is now provided
in a C struct returned by the handler function. This is similar to
the designs we've adopted for FDWs and tablesample methods. There
are multiple advantages. For one, the index AM's support functions
are now simple C functions, making them faster to call and much less
error-prone, since the C compiler can now check function signatures.
For another, this will make it far more practical to define index access
methods in installable extensions.
A disadvantage is that SQL-level code can no longer see attributes
of index AMs; in particular, some of the crosschecks in the opr_sanity
regression test are no longer possible from SQL. We've addressed that
by adding a facility for the index AM to perform such checks instead.
(Much more could be done in that line, but for now we're content if the
amvalidate functions more or less replace what opr_sanity used to do.)
We might also want to expose some sort of reporting functionality, but
this patch doesn't do that.
Alexander Korotkov, reviewed by Petr Jelínek, and rather heavily
editorialized on by me.
Commit 866566a690 is insufficient to prevent dump/reload failures
when using transform modules in a database with both plpython2 and
plpython3 installed. The reason is that the transform extension scripts
use DO blocks as a mechanism to pull in the libpython library before
creating the transform function. It's necessary to preload the library
because the dynamic loader won't do it for us on every platform, leading
to "unresolved symbol" failures when the transform library is loaded.
But it's *not* necessary to execute Python code, and doing so will
provoke a multiple-Pythons-are-loaded error even after the preceding
commit.
To fix, use LOAD instead of a DO block. That requires superuser privilege,
but creation of a C function does anyway. It also embeds knowledge of
the underlying library name for each PL language; but that's wired into
the initdb-time contents of pg_pltemplate too, so that doesn't seem like
a large problem either. Note that CREATE TRANSFORM as such doesn't call
the language module at all.
Per a report from Paul Jones. Back-patch to 9.5 where transform modules
were introduced.
The order of inclusion of .o files makes a difference in linker output;
not a functional difference, but still a bitwise difference, which annoys
some packagers who would like reproducible builds.
Report and patch by Christoph Berg
On closer inspection, the reason copyright.pl was missing files is
that it is looking for 'Copyright (c)' and they had 'Copyright (C)'.
Fix that, and update a couple more that grepping for that revealed.
postgresImportForeignSchema pays attention to IMPORT's EXCEPT and LIMIT TO
options, but only as an efficiency hack, not for correctness' sake. The
FDW documentation does explain that, but someone using postgres_fdw.c
as a coding guide might not remember it, so let's add a comment here.
Per question from Regina Obe.
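For reference, the options in question (the schema, table, and server
names are hypothetical):

    IMPORT FOREIGN SCHEMA remote_schema
        LIMIT TO (t1, t2)
        FROM SERVER remote_srv INTO local_schema;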
Commit 33bd250f6c could have done with
some more review:
Adjust coding so that compilers unfamiliar with elog/ereport don't complain
about uninitialized values.
Fix misuse of PG_GETARG_INT16 to retrieve arguments declared as "integer"
at the SQL level. (This was evidently copied from cube_ll_coord and
cube_ur_coord, but those were wrong too.)
Fix non-style-guide-conforming error messages.
Fix underparenthesized if statements, which pgindent would have made a
hash of, and remove some unnecessary parens elsewhere.
Run pgindent over new code.
Revise documentation: repeated accretion of more operators without any
rethinking of the text already there had left things in a bit of a mess.
Merge all the cube operators into one table and adjust surrounding text
appropriately.
David Rowley and Tom Lane
As of commit d5563d7df, psql -c no longer implies -X, but not all of
our regression testing scripts had gotten that memo.
To ensure consistency of results across different developers, make
sure that *all* invocations of psql in all scripts in our tree
use -X, even where this is not what previously happened.
Michael Paquier and Tom Lane
Both the Blowfish and DES implementations of crypt() can take an
arbitrarily long time, depending on the number of rounds specified by
the caller; make sure they can be interrupted.
Author: Andreas Karlsson
Reviewer: Jeff Janes
Backpatch to 9.1.
When use_remote_estimate is enabled, consider adding ORDER BY to the
query we send to the remote server so that we can use that ordered
data for a merge join. Commit f18c944b61
arranges to push down the query pathkeys, which seems like the case
most likely to be a win, but testing shows this can sometimes win,
too.
For a regular table, we know which indexes are present and therefore
test whether the ordering provided by each such index is useful. Here,
we take the opposite approach: guess what orderings would be useful if
they could be generated cheaply, and then ask the remote side what those
will cost.
Ashutosh Bapat, with very substantial cosmetic revisions by me. Also
reviewed by Rushabh Lathia.
Introduce distance operators over cubes:
<#> taxicab distance
<-> Euclidean distance
<=> Chebyshev distance
Also add kNN support for those distances in the GiST opclass.
Author: Stas Kelvich
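A usage sketch; the kNN table and column are hypothetical:

    CREATE EXTENSION cube;
    SELECT cube(array[1,2]) <#> cube(array[4,6]);  -- taxicab: 7
    SELECT cube(array[1,2]) <-> cube(array[4,6]);  -- Euclidean: 5
    SELECT cube(array[1,2]) <=> cube(array[4,6]);  -- Chebyshev: 4
    -- kNN search, accelerated by a GiST index on c
    SELECT * FROM points ORDER BY c <-> cube(array[0,0]) LIMIT 5;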
Commit e7cb7ee145 provided basic
infrastructure for allowing a foreign data wrapper or custom scan
provider to replace a join of one or more tables with a scan.
However, this infrastructure failed to take into account the need
for possible EvalPlanQual rechecks, and ExecScanFetch would fail
an assertion (or just overwrite memory) if such a check was attempted
for a plan containing a pushed-down join. To fix, adjust the EPQ
machinery to skip some processing steps when scanrelid == 0, making
those the responsibility of scan's recheck method, which also has
the responsibility in this case of correctly populating the relevant
slot.
To allow foreign scans to gain control in the right place to make
use of this new facility, add a new, optional RecheckForeignScan
method. Also, allow a foreign scan to have a child plan, which can
be used to correctly populate the slot (or perhaps for something
else, but this is the only use currently envisioned).
KaiGai Kohei, reviewed by Robert Haas, Etsuro Fujita, and Kyotaro
Horiguchi.
Some versions of Perl export a macro named HS_KEY. This creates a
conflict in contrib/hstore_plperl against hstore's macro of the same
name. The most future-proof solution seems to be to rename our macro;
I chose HSTORE_KEY. For consistency, rename HS_VAL and related macros
similarly.
Back-patch to 9.5. contrib/hstore_plperl doesn't exist before that
so there is no need to worry about the conflict in older releases.
Per reports from Marco Atzeri and Mike Blackwell.
Rather than relying on other extensions to be available for installation,
let's just add some test objects to the postgres_fdw extension itself
within the regression script.
The user can whitelist specified extension(s) in the foreign server's
options, whereupon we will treat immutable functions and operators of those
extensions as candidates to be sent for remote execution.
Whitelisting an extension in this way basically promises that the extension
exists on the remote server and behaves compatibly with the local instance.
We have no way to prove that formally, so we have to rely on the user to
get it right. But this seems like something that people can usually get
right in practice.
We might in future allow functions and operators to be whitelisted
individually, but extension granularity is a very convenient special case,
so it got done first.
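A configuration sketch (the server name is hypothetical); see also the
caveat below about testing:

    -- promise that cube and seg exist and behave compatibly remotely
    ALTER SERVER remote_srv OPTIONS (ADD extensions 'cube, seg');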
The patch as-committed lacks any regression tests, which is unfortunate,
but introducing dependencies on other extensions for testing purposes
would break "make installcheck" scenarios, which is worse. I have some
ideas about klugy ways around that, but it seems like material for a
separate patch. For the moment, leave the problem open.
Paul Ramsey, hacked up a bit more by me
If the join problem's entire ORDER BY clause can be pushed to the
remote server, consider a path that adds this ORDER BY clause. If
use_remote_estimate is on, we cost this path using an additional
remote EXPLAIN. If not, we just estimate that the path costs 20%
more, which is intended to be large enough that we won't request a
remote sort when it's not helpful, but small enough that we'll have
the remote side do the sort when in doubt. In some cases, the remote
sort might actually be free, because the remote query plan might
happen to produce output that is ordered the way we need, but without
remote estimates we have no way of knowing that.
It might also be useful to request sorted output from the remote side
if it enables an efficient merge join, but this patch doesn't attempt
to handle that case.
Ashutosh Bapat with revisions by me. Also reviewed by Fabrízio de Royes
Mello and Jeevan Chalke.
This fixes a long-standing bug which was discovered while investigating
the interaction between the new join pushdown code and the EvalPlanQual
machinery: if a ForeignScan appears on the inner side of a parameterized
nestloop, an EPQ recheck would re-return the original tuple even if
it no longer satisfied the pushed-down quals due to changed parameter
values.
This fix adds a new member to ForeignScan and ForeignScanState and a
new argument to make_foreignscan, and requires changes to FDWs which
push down quals to populate that new argument with a list of quals they
have chosen to push down. Therefore, I'm only back-patching to 9.5,
even though the bug is not new in 9.5.
Etsuro Fujita, reviewed by me and by Kyotaro Horiguchi.
The tsquery, ltxtquery and query_int data types have a common ancestor.
Having acquired check_stack_depth() calls independently, each was
missing at least one call. Back-patch to 9.0 (all supported versions).
Certain short salts crashed the backend or disclosed a few bytes of
backend memory. For existing salt-induced error conditions, emit a
message saying as much. Back-patch to 9.0 (all supported versions).
Josh Kupershmidt
Security: CVE-2015-5288
If we can't read the query texts file (whether because of out-of-memory, or
for some other reason), give up and reset the file to empty, discarding all
stored query texts, though not the statistics per se. We used to leave
things alone and hope for better luck next time, but the problem is that
the file is only going to get bigger and even harder to slurp into memory.
Better to do something that will get us out of trouble.
Likewise reset the file to empty for any other failure within gc_qtexts().
The previous behavior after a write error was to discard query texts but
not do anything to truncate the file, which is just weird.
Also, increase the maximum supported file size from MaxAllocSize to
MaxAllocHugeSize; this makes it more likely we'll be able to do a garbage
collection successfully.
Also, fix recalculation of mean_query_len within entry_dealloc() to match
the calculation in gc_qtexts(). The previous coding overlooked the
possibility of dropped texts (query_len == -1) and would underestimate the
mean of the remaining entries in such cases, thus possibly causing excess
garbage collection cycles.
In passing, add some errdetail to the log entry that complains about
insufficient memory to read the query texts file, which after all was
Jim Nasby's original complaint.
Back-patch to 9.4 where the current handling of query texts was
introduced.
Peter Geoghegan, rather editorialized upon by me
Without CASCADE, if an extension has an unfulfilled dependency on
another extension, CREATE EXTENSION errors out with "required extension
... is not installed". That is annoying, especially when that dependency
is an implementation detail of the extension, rather than something the
extension's user can make sense of.
In addition to CASCADE this also includes a small set of regression
tests around CREATE EXTENSION.
Author: Petr Jelinek, editorialized by Michael Paquier, Andres Freund
Reviewed-By: Michael Paquier, Andres Freund, Jeff Janes
Discussion: 557E0520.3040800@2ndquadrant.com
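Example (earthdistance's dependency on cube makes it a convenient
demonstration):

    -- installs the required cube extension automatically
    CREATE EXTENSION earthdistance CASCADE;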
The existing hint talked about "may only contain letters", but the
actual requirement is more strict: only lower case letters are allowed.
Reported-By: Rushabh Lathia
Author: Rushabh Lathia
Discussion: AGPqQf2x50qcwbYOBKzb4x75sO_V3g81ZsA8+Ji9iN5t_khFhQ@mail.gmail.com
Backpatch: 9.4-, where replication slots were added
If we have a local Var of say varchar type with default collation, and
we apply a RelabelType to convert that to text with default collation, we
don't want to consider that as creating an FDW_COLLATE_UNSAFE situation.
It should be okay to compare that to a remote Var, so long as the remote
Var determines the comparison collation. (When we actually ship such an
expression to the remote side, the local Var would become a Param with
default collation, meaning the remote Var would in fact control the
comparison collation, because non-default implicit collation overrides
default implicit collation in parse_collate.c.) To fix, be more precise
about what FDW_COLLATE_NONE means: it applies either to a noncollatable
data type or to a collatable type with default collation, if that collation
can't be traced to a remote Var. (When it can, FDW_COLLATE_SAFE is
appropriate.) We were essentially using that interpretation already at
the Var/Const/Param level, but we weren't bubbling it up properly.
An alternative fix would be to introduce a separate FDW_COLLATE_DEFAULT
value to describe the second situation, but that would add more code
without changing the actual behavior, so it didn't seem worthwhile.
Also, since we're clarifying the rule to be that we care about whether
operator/function input collations match, there seems no need to fail
immediately upon seeing a Const/Param/non-foreign-Var with nondefault
collation. We only have to reject if it appears in a collation-sensitive
context (for example, "var IS NOT NULL" is perfectly safe from a collation
standpoint, whatever collation the var has). So just set the state to
UNSAFE rather than failing immediately.
Per report from Jeevan Chalke. This essentially corrects some sloppy
thinking in commit ed3ddf918b, so back-patch
to 9.3 where that logic appeared.
A bunch of tests missed specifying that empty transactions shouldn't be
displayed. That causes problems when e.g. autovacuum runs in an
unfortunate moment. The tests in question only run for a very short
time, making this quite unlikely.
Reported-By: Buildfarm member axolotl
Backpatch: 9.4, where logical decoding was introduced
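The fix amounts to passing test_decoding's option in the tests, along
these lines (slot name as used by the tests):

    SELECT data FROM pg_logical_slot_get_changes('regression_slot',
        NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');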
This new function provides information about SSL extensions present in
the X509 certificate used for the current connection.
Extension version updated to version 1.1.
Author: Дмитрий Воронин (Dmitry Voronin)
Reviewed by: Michael Paquier, Heikki Linnakangas, Álvaro Herrera
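A usage sketch; as committed, the function returns one row per
certificate extension with its name, value, and criticality:

    CREATE EXTENSION sslinfo;
    SELECT * FROM ssl_extension_info();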
We were missing a few return checks on OpenSSL calls. Should be pretty
harmless, since we haven't seen any user reports about problems, and
this is not a high-traffic module anyway; still, a bug is a bug, so
backpatch this all the way back to 9.0.
Author: Michael Paquier, while reviewing another sslinfo patch
Recent commit 0426f349e changed the handling of error context reports
in such a way as to have a minor effect on the sepgsql regression
output. Adapt the expected output file to suit. Since that commit
was HEAD only, so is this one.
Remove the code in plpgsql that suppressed the innermost line of CONTEXT
for messages emitted by RAISE commands. That was never more than a quick
backwards-compatibility hack, and it's pretty silly in cases where the
RAISE is nested in several levels of function. What's more, it violated
our design theory that verbosity of error reports should be controlled
on the client side not the server side.
To alleviate the resulting noise increase, introduce a feature in libpq
and psql whereby the CONTEXT field of messages can be suppressed, either
always or only for non-error messages. Printing CONTEXT for errors only
is now their default behavior.
The actual code changes here are pretty small, but the effects on the
regression test outputs are widespread. I had to edit some of the
alternative expected outputs by hand; hopefully the buildfarm will soon
find anything I fat-fingered.
In passing, fix up (again) the output line counts in psql's various
help displays. Add some commentary about how to verify them.
Pavel Stehule, reviewed by Petr Jelínek, Jeevan Chalke, and others
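In psql, the new knob looks like this (a sketch of the committed
behavior; the default is to show CONTEXT only for errors):

    \set SHOW_CONTEXT always
    -- accepted values: never, errors, always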
Add a Python script for building unaccent.rules from Unicode data.
Don't backpatch because unaccent changes may require a tsvector/index
rebuild.
Thomas Munro <thomas.munro@enterprisedb.com>
The regression tests for sepgsql were broken by changes in the
base distro as-shipped policies. Specifically, definition of
unconfined_t in the system default policy was changed to bypass
multi-category rules, which the regression test depended on.
Fix that by defining a custom privileged domain
(sepgsql_regtest_superuser_t) and using it instead of system's
unconfined_t domain. The new sepgsql_regtest_superuser_t domain
performs almost like the current unconfined_t, but restricted by
multi-category policy as the traditional unconfined_t was.
The custom policy module is a self-defined domain, and so should not
be affected by related future system policy changes. However, it still
uses the unconfined_u:unconfined_r pair for selinux-user and role.
Those definitions have not been changed for several years and seem
less risky to rely on than the unconfined_t domain. Additionally, if
we defined a custom user/role, they would need to be manually defined
at the operating system level, adding more complexity to an already
non-standard and complex regression test.
Back-patch to 9.3. The regression tests will need more work before
working correctly on 9.2. Starting with 9.2, sepgsql has had dependencies
on libselinux versions that are only available on newer distros with
the changed set of policies (e.g. RHEL 7.x). On 9.1 sepgsql works
fine with the older distros with original policy set (e.g. RHEL 6.x),
and on which the existing regression tests work fine. We might
eventually want to change the 9.1 sepgsql regression tests to be more
independent
from the underlying OS policies, however more work will be needed to
make that happen and it is not clear that it is worth the effort.
Kohei KaiGai with review by Adam Brightwell and me, commentary by
Stephen, Alvaro, Tom, Robert, and others.
This fixes a bunch of somewhat pedantic warnings with new
compilers. Since by far the majority of other function definitions use
the (void) style, it seems consistent to do so in the remaining few
places as well.
Commit 43b4a16817 made the isolation
tester reject duplicate step names, and it turns out that the
test_decoding module's concurrent_ddl_dml isolation test has a
duplicate name. I think the second definition isn't actually getting
used, so just remove it.
Per buildfarm.
This function was using the single-value-per-call mechanism, but the
code relied on a relcache entry that wasn't kept open across calls.
This manifested as weird errors in buildfarm during the short time that
the "brin-1" isolation test lived.
Backpatch to 9.5, where it was introduced.
There's no reason not to expose both restart_lsn and confirmed_flush
since they have rather distinct meanings. The former is the oldest WAL
still required and valid for both physical and logical slots, whereas
the latter is the location up to which a logical slot's consumer has
confirmed receiving data. Most of the time a slot will require older
WAL (i.e. restart_lsn) than the confirmed
position (i.e. confirmed_flush_lsn).
Author: Marko Tiikkaja, editorialized by me
Discussion: 559D110B.1020109@joh.to
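Both values are now visible in the catalog view:

    SELECT slot_name, restart_lsn, confirmed_flush_lsn
    FROM pg_replication_slots;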
It's against project policy to use elog() for user-facing errors, or to
omit an errcode() selection for errors that aren't supposed to be "can't
happen" cases. Fix all the violations of this policy that result in
ERRCODE_INTERNAL_ERROR log entries during the standard regression tests,
as errors that can reliably be triggered from SQL surely should be
considered user-facing.
I also looked through all the files touched by this commit and fixed
other nearby problems of the same ilk. I do not claim to have fixed
all violations of the policy, just the ones in these files.
In a few places I also changed existing ERRCODE choices that didn't
seem particularly appropriate; mainly replacing ERRCODE_SYNTAX_ERROR
by something more specific.
Back-patch to 9.5, but no further; changing ERRCODE assignments in
stable branches doesn't seem like a good idea.
EANs beginning with 979 (but not 9790 - those are ISMNs) are accepted
as ISBN numbers, but they cannot be represented in the old, 10-digit ISBN
format. They must be output in the new 13-digit ISBN-13 format. We printed
out an incorrect value for those.
Also add a regression test, to test this and some other basic functionality
of the module.
Patch by Fabien Coelho. This fixes bug #13442, reported by B.Z. Backpatch
to 9.1, where we started to recognize ISBN-13 numbers.
The original implementation of TABLESAMPLE modeled the tablesample method
API on index access methods, which wasn't a good choice because, without
specialized DDL commands, there's no way to build an extension that can
implement a TSM. (Raw inserts into system catalogs are not an acceptable
thing to do, because we can't undo them during DROP EXTENSION, nor will
pg_upgrade behave sanely.) Instead adopt an API more like procedural
language handlers or foreign data wrappers, wherein the only SQL-level
support object needed is a single handler function identified by having
a special return type. This lets us get rid of the supporting catalog
altogether, so that no custom DDL support is needed for the feature.
Adjust the API so that it can support non-constant tablesample arguments
(the original coding assumed we could evaluate the argument expressions at
ExecInitSampleScan time, which is undesirable even if it weren't outright
unsafe), and discourage sampling methods from looking at invisible tuples.
Make sure that the BERNOULLI and SYSTEM methods are genuinely repeatable
within and across queries, as required by the SQL standard, and deal more
honestly with methods that can't support that requirement.
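For reference, the repeatability requirement concerns queries such as
(the table name is hypothetical):

    -- must yield the same sample within and across queries for a given seed
    SELECT count(*) FROM big_table TABLESAMPLE BERNOULLI (10) REPEATABLE (42);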
Make a full code-review pass over the tablesample additions, and fix
assorted bugs, omissions, infelicities, and cosmetic issues (such as
failure to put the added code stanzas in a consistent ordering).
Improve EXPLAIN's output of tablesample plans, too.
Back-patch to 9.5 so that we don't have to support the original API
in production.
The result closely resembles linking of these modules for the "win32"
port. Augment the $(exports_file) header so the file is also usable as
an import file. Unfortunately, relocating an AIX installation will now
require adding $(pkglibdir) to LD_LIBRARY_PATH. Back-patch to 9.5,
where the modules were introduced.
The AIX 7.1 libm is static, and AIX postgres executables do not export
symbols acquired from libraries. Back-patch to 9.5, where commit
cfe12763c3 added a sqrt() call.
This allows convenient checking for existence of a GUC from SQL, which is
particularly useful when dealing with custom variables.
David Christensen, reviewed by Jeevan Chalke
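A usage sketch:

    -- returns NULL instead of raising an error when the GUC doesn't exist
    SELECT current_setting('myapp.some_knob', true);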