This allows to deduplicate some existing code, but mainly avoids some
duplication in upcoming commits.
In passing, fix variable names indicating wrong unit (seconds instead
of ms).
Author: Andres Freund
Discussion: https://postgr.es/m/20180314002740.cah3mdsonz5mxney@alap3.anarazel.de
'long' is not useful type across platforms, as it's 32bit on 32 bit
platforms, and even on some 64bit platforms (e.g. windows) it's still
only 32bits wide.
As ExplainPropertyInteger should never be performance critical, change
it to accept a 64bit argument and remove ExplainPropertyLong.
Author: Andres Freund
Discussion: https://postgr.es/m/20180314164832.n56wt7zcbpzi6zxe@alap3.anarazel.de
Fix the warnings created by the compiler warning options
-Wformat-overflow=2 -Wformat-truncation=2, supported since GCC 7. This
is a more aggressive variant of the fixes in
6275f5d28a, which GCC 7 warned about by
default.
The issues are all harmless, but some dubious coding patterns are
cleaned up.
One issue that is of external interest is that BGW_MAXLEN is increased
from 64 to 96. Apparently, the old value would cause the bgw_name of
logical replication workers to be truncated in some circumstances.
But this doesn't actually add those warning options. It appears that
the warnings depend a bit on compilation and optimization options, so it
would be annoying to have to keep up with that. This is more of a
once-in-a-while cleanup.
Reviewed-by: Michael Paquier <michael@paquier.xyz>
The existing logic for updating pg_class.reltuples trusted the sampling
results only for the pages ANALYZE actually visited, preferring to
believe the previous tuple density estimate for all the unvisited pages.
While there's some rationale for doing that for VACUUM (first that
VACUUM is likely to visit a very nonrandom subset of pages, and second
that we know for sure that the unvisited pages did not change), there's
no such rationale for ANALYZE: by assumption, it's looked at an unbiased
random sample of the table's pages. Furthermore, in a very large table
ANALYZE will have examined only a tiny fraction of the table's pages,
meaning it cannot slew the overall density estimate very far at all.
In a table that is physically growing, this causes reltuples to increase
nearly proportionally to the change in relpages, regardless of what is
actually happening in the table. This has been observed to cause reltuples
to become so much larger than reality that it effectively shuts off
autovacuum, whose threshold for doing anything is a fraction of reltuples.
(Getting to the point where that would happen seems to require some
additional, not well understood, conditions. But it's undeniable that if
reltuples is seriously off in a large table, ANALYZE alone will not fix it
in any reasonable number of iterations, especially not if the table is
continuing to grow.)
Hence, restrict the use of vac_estimate_reltuples() to VACUUM alone,
and in ANALYZE, just extrapolate from the sample pages on the assumption
that they provide an accurate model of the whole table. If, by very bad
luck, they don't, at least another ANALYZE will fix it; in the old logic
a single bad estimate could cause problems indefinitely.
In HEAD, let's remove vac_estimate_reltuples' is_analyze argument
altogether; it was never used for anything and now it's totally pointless.
But keep it in the back branches, in case any third-party code is calling
this function.
Per bug #15005. Back-patch to all supported branches.
David Gould, reviewed by Alexander Kuzmenkov, cosmetic changes by me
Discussion: https://postgr.es/m/20180117164916.3fdcf2e9@engels
This reverts commit c2c537c56d.
It's now clear that whatever is going on there, it can't be blamed
on unexpected ANALYZE runs, because the statistics are the same
just before the failing query as they were at the start of the test.
It seems fairly hard to explain recent buildfarm failures without the
theory that something is doing an ANALYZE behind our backs. Probe
for this directly to see if it's true.
In principle the outputs of these queries should be stable, since the table
in question is small enough that ANALYZE's sample will include all rows.
But even if that turns out to be wrong, we can put up with some failures
for a bit. I don't intend to leave this here indefinitely.
Discussion: https://postgr.es/m/25502.1520277552@sss.pgh.pa.us
If convert_to_scalar is passed a pair of datatypes it can't cope with,
its former behavior was just to elog(ERROR). While this is OK so far as
the core code is concerned, there's extension code that would like to use
scalarltsel/scalargtsel/etc as selectivity estimators for operators that
work on non-core datatypes, and this behavior is a show-stopper for that
use-case. If we simply allow convert_to_scalar to return FALSE instead of
outright failing, then the main logic of scalarltsel/scalargtsel will work
fine for any operator that behaves like a scalar inequality comparison.
The lack of conversion capability will mean that we can't estimate to
better than histogram-bin-width precision, since the code will effectively
assume that the comparison constant falls at the middle of its bin. But
that's still a lot better than nothing. (Someday we should provide a way
for extension code to supply a custom version of convert_to_scalar, but
today is not that day.)
While poking at this issue, we noted that the existing code for handling
type bytea in convert_to_scalar is several bricks shy of a load.
It assumes without checking that if the comparison value is type bytea,
the bounds values are too; in the worst case this could lead to a crash.
It also fails to detoast the input values, so that the comparison result is
complete garbage if any input is toasted out-of-line, compressed, or even
just short-header. I'm not sure how often such cases actually occur ---
the bounds values, at least, are probably safe since they are elements of
an array and hence can't be toasted. But that doesn't make this code OK.
Back-patch to all supported branches, partly because author requested that,
but mostly because of the bytea bugs. The change in API for the exposed
routine convert_network_to_scalar() is theoretically a back-patch hazard,
but it seems pretty unlikely that any third-party code is calling that
function directly.
Tomas Vondra, with some adjustments by me
Discussion: https://postgr.es/m/b68441b6-d18f-13ab-b43b-9a72188a4e02@2ndquadrant.com
Commit 1bc0100d27 added this test, and
commits 882ea509fe,
958e20e42d,
4fa396464e tried to stabilize it. It's
still not stable, so keep trying.
The latest comment from Tom Lane is that disabling autovacuum seems
like a good strategy, but we might need to do it on more tables, hence
this patch.
Etsuro Fujita
Discussion: http://postgr.es/m/5A9928F1.2010206@lab.ntt.co.jp
Since 9.5, it's possible that some but not all columns of an index
support returning the indexed value for index-only scans. If the
same indexed column appears in index columns that behave both ways,
check_index_only() supposed that it'd be OK to do an index-only scan
testing that column; but that fails if we have to recheck the indexed
condition on one of the columns that doesn't support this.
In principle we could make this work by remapping the recheck expressions
to pull the value from a column that does support returning the indexed
value. But such cases are so weird and rare that, at least for now,
it doesn't seem worth the trouble. Instead, just teach check_index_only
that a value is returnable only if all the index columns containing it
are returnable, rather than any of them.
Per report from David Pereiro Lagares. Back-patch to 9.5 where the
possibility of this situation appeared.
Kyotaro Horiguchi
Discussion: https://postgr.es/m/1516210494.1798.16.camel@nlpgo.com
Commit a26116c6c accidentally changed the behavior of the SQL format_type()
function while refactoring. For the reasons explained in that function's
comment, a NULL typemod argument should behave differently from a -1
argument. Since we've managed to break this, add a regression test
memorializing the intended behavior.
In passing, be consistent about the type of the "flags" parameter.
Noted by Rushabh Lathia, though I revised the patch some more.
Discussion: https://postgr.es/m/CAGPqQf3RB2q-d2Awp_-x-Ur6aOxTUwnApt-vm-iTtceZxYnePg@mail.gmail.com
Solaris 11.4 has built-in functions named b64_encode and b64_decode.
Rename ours to something else to avoid the conflict (fortunately,
ours are static so the impact is limited).
One could wish for less duplication of code in this area, but that
would be a larger patch and not very suitable for back-patching.
Since this is a portability fix, we want to put it into all supported
branches.
Report and initial patch by Rainer Orth, reviewed and adjusted a bit
by Michael Paquier
Discussion: https://postgr.es/m/ydd372wk28h.fsf@CeBiTec.Uni-Bielefeld.DE
Commit 1bc0100d27 added this test,
and commit 882ea509fe tried to
stabilize it. There were still failures, so commit
958e20e42d tried again to stabilize
it. That approach is still failing on jaguarundi, though, so
back it out and try something else. Specifically, instead of
disabling remote estimates for the table in question, let's tell
autovacuum to leave it alone.
Etsuro Fujita
Discussion: http://postgr.es/m/5A82DCCE.3060107@lab.ntt.co.jp
This makes the client programs behave as documented regardless of the
connect-time search_path and regardless of user-created objects. Today,
a malicious user with CREATE permission on a search_path schema can take
control of certain of these clients' queries and invoke arbitrary SQL
functions under the client identity, often a superuser. This is
exploitable in the default configuration, where all users have CREATE
privilege on schema "public".
This changes behavior of user-defined code stored in the database, like
pg_index.indexprs and pg_extension_config_dump(). If they reach code
bearing unqualified names, "does not exist" or "no schema has been
selected to create in" errors might appear. Users may fix such errors
by schema-qualifying affected names. After upgrading, consider watching
server logs for these errors.
The --table arguments of src/bin/scripts clients have been lax; for
example, "vacuumdb -Zt pg_am\;CHECKPOINT" performed a checkpoint. That
now fails, but for now, "vacuumdb -Zt 'pg_am(amname);CHECKPOINT'" still
performs a checkpoint.
Back-patch to 9.3 (all supported versions).
Reviewed by Tom Lane, though this fix strategy was not his first choice.
Reported by Arseniy Sharoglazov.
Security: CVE-2018-1058
The previous limit of INT_MAX / 1000 seems to have been cargo-culted in
from somewhere else. Or possibly the value was converted to microseconds
at some point; but in all supported releases, it's just compared to other
values, so there's no need for the restriction. This change raises the
effective limit from ~35 minutes to ~24 days, which conceivably is useful
to somebody, and anyway it's more consistent with the range of the core
log_min_duration_statement GUC.
Per complaint from Kevin Bloch. Back-patch to all supported releases.
Discussion: https://postgr.es/m/8ea82d7e-cb78-8e05-0629-73aa14d2a0ca@codingthat.com
Commit bf6c614a2f broke the sepgsql test
due to a new invocation of the function access hook during grouping
equal initialization.
The new behaviour seems at least as correct as the old one, so try
adapt the tests. As I've no working sepgsql setup here, this is just
going from buildfarm results.
Author: Andres Freund
Discussion: https://postgr.es/m/20180217000337.lfsdvro3l6ccsksp@alap3.anarazel.de
Introduce a new format_type_extended, with a flags bitmask argument that
can modify the default behavior. A few compatibility and readability
wrappers remain:
format_type_be
format_type_be_qualified
format_type_with_typemod
while format_type_with_typemod_qualified, which had a single caller, is
removed.
Author: Michael Paquier, some revisions by me
Discussion: 20180213035107.GA2915@paquier.xyz
The modern way is to use a missing_ok argument instead of two separate
almost-identical routines, so do that.
Author: Michaël Paquier
Reviewed-by: Álvaro Herrera
Discussion: https://postgr.es/m/20180201063212.GE6398@paquier.xyz
Even after commit 882ea509fe, some
buildfarm members are still failing in the postgres_fdw tests.
Try to fix that by disabling use of remote statistics for some
test cases.
Etsuro Fujita
Discussion: http://postgr.es/m/5A7D76CF.8080601@lab.ntt.co.jp
Commit 1bc0100d27 added these tests,
but they're not stable enough to survive in the buildfarm. Remove
CTIDs from the output in the hopes of fixing that.
Commit 0bf3ae88af allowed direct
foreign table modification; instead of fetching each row, updating
it locally, and then pushing the modification back to the remote
side, we would instead do all the work on the remote server via a
single remote UPDATE or DELETE command. However, that commit only
enabled this optimization when join tree consisted only of the
target table.
This change allows the same optimization when an UPDATE statement
has a FROM clause or a DELETE statement has a USING clause. This
works much like ordinary foreign join pushdown, in that the tables
must be on the same remote server, relevant parts of the query
must be pushdown-safe, and so forth.
Etsuro Fujita, reviewed by Ashutosh Bapat, Rushabh Lathia, and me.
Some formatting corrections by me.
Discussion: http://postgr.es/m/5A57193A.2080003@lab.ntt.co.jp
Discussion: http://postgr.es/m/b9cee735-62f8-6c07-7528-6364ce9347d0@lab.ntt.co.jp
To make this work, tuplesort.c and logtape.c must also support
parallelism, so this patch adds that infrastructure and then applies
it to the particular case of parallel btree index builds. Testing
to date shows that this can often be 2-3x faster than a serial
index build.
The model for deciding how many workers to use is fairly primitive
at present, but it's better than not having the feature. We can
refine it as we get more experience.
Peter Geoghegan with some help from Rushabh Lathia. While Heikki
Linnakangas is not an author of this patch, he wrote other patches
without which this feature would not have been possible, and
therefore the release notes should possibly credit him as an author
of this feature. Reviewed by Claudio Freire, Heikki Linnakangas,
Thomas Munro, Tels, Amit Kapila, me.
Discussion: http://postgr.es/m/CAM3SWZQKM=Pzc=CAHzRixKjp2eO5Q0Jg1SoFQqeXFQ647JiwqQ@mail.gmail.com
Discussion: http://postgr.es/m/CAH2-Wz=AxWqDoVvGU7dq856S4r6sJAj6DBn7VMtigkB33N5eyg@mail.gmail.com
Commit 4bbf6edfbd added a test case,
but it turns out that the test case doesn't reliably test for the
bug, and in the context of the regression test suite did not because
ANALYZE had not been run.
Report and patch by Etsuro Fujita. I added a comment along lines
previously suggested by Tom Lane.
Discussion: http://postgr.es/m/5A6195D8.8060206@lab.ntt.co.jp
We have a lot of code in which option names, which from the user's
viewpoint are logically keywords, are passed through the grammar as plain
identifiers, and then matched to string literals during command execution.
This approach avoids making words into lexer keywords unnecessarily. Some
places matched these strings using plain strcmp, some using pg_strcasecmp.
But the latter should be unnecessary since identifiers would have been
downcased on their way through the parser. Aside from any efficiency
concerns (probably not a big factor), the lack of consistency in this area
creates a hazard of subtle bugs due to different places coming to different
conclusions about whether two option names are the same or different.
Hence, standardize on using strcmp() to match any option names that are
expected to have been fed through the parser.
This does create a user-visible behavioral change, which is that while
formerly all of these would work:
alter table foo set (fillfactor = 50);
alter table foo set (FillFactor = 50);
alter table foo set ("fillfactor" = 50);
alter table foo set ("FillFactor" = 50);
now the last case will fail because that double-quoted identifier is
different from the others. However, none of our documentation says that
you can use a quoted identifier in such contexts at all, and we should
discourage doing so since it would break if we ever decide to parse such
constructs as true lexer keywords rather than poor man's substitutes.
So this shouldn't create a significant compatibility issue for users.
Daniel Gustafsson, reviewed by Michael Paquier, small changes by me
Discussion: https://postgr.es/m/29405B24-564E-476B-98C0-677A29805B84@yesql.se
When an UPDATE causes a row to no longer match the partition
constraint, try to move it to a different partition where it does
match the partition constraint. In essence, the UPDATE is split into
a DELETE from the old partition and an INSERT into the new one. This
can lead to surprising behavior in concurrency scenarios because
EvalPlanQual rechecks won't work as they normally did; the known
problems are documented. (There is a pending patch to improve the
situation further, but it needs more review.)
Amit Khandekar, reviewed and tested by Amit Langote, David Rowley,
Rajkumar Raghuwanshi, Dilip Kumar, Amul Sul, Thomas Munro, Álvaro
Herrera, Amit Kapila, and me. A few final revisions by me.
Discussion: http://postgr.es/m/CAJ3gD9do9o2ccQ7j7+tSgiE1REY65XRiMb=yJO3u3QhyP8EEPQ@mail.gmail.com
AclObjectKind was basically just another enumeration for object types,
and we already have a preferred one for that. It's only used in
aclcheck_error. By using ObjectType instead, we can also give some more
precise error messages, for example "index" instead of "relation".
Reviewed-by: Michael Paquier <michael.paquier@gmail.com>
When pushing down a join to a foreign server, postgres_fdw constructs
an alternative plan to be used for any EvalPlanQual rechecks that
prove to be necessary. This plan is stored as the outer subplan of
the Foreign Scan implementing the pushed-down join. Previously, this
alternative plan could have a different nominal sort ordering than its
parent, which seemed OK since there will only be one tuple per base
table anyway in the case of an EvalPlanQual recheck. Actually,
though, it caused a problem if that path was used as a building block
for the EvalPlanQual recheck plan of a higher-level foreign join,
because we could end up with a merge join one of whose inputs was not
labelled with the correct sort order. Repair by injecting an extra
Sort node into the EvalPlanQual recheck plan whenever it would
otherwise fail to be sorted at least as well as its parent Foreign
Scan.
Report by Jeff Janes. Patch by me, reviewed by Tom Lane, who also
provided the test case and comment text.
Discussion: http://postgr.es/m/CAMkU=1y2G8VOVBHv3iXU2TMAj7-RyBFFW1uhkr5sm9LQ2=X35g@mail.gmail.com
Ability to advance both physical and logical replication slots using a
new user function pg_replication_slot_advance().
For logical advance that means records are consumed as fast as possible
and changes are not given to output plugin for sending. Makes 2nd phase
(after we reached SNAPBUILD_FULL_SNAPSHOT) of replication slot creation
faster, especially when there are big transactions as the reorder buffer
does not have to deal with data changes and does not have to spill to
disk.
Author: Petr Jelinek
Reviewed-by: Simon Riggs
Commit 7012b132d, which added the ability to push down aggregates and
grouping to the remote server, wasn't careful to ensure that the remote
server would have the same idea we do about which columns are the grouping
columns, in cases where there are textually identical GROUP BY expressions.
Such cases typically led to "targetlist item has multiple sortgroupref
labels" errors.
To fix this reliably, switch over to using "GROUP BY column-number" syntax
rather than "GROUP BY expression" in transmitted queries, and adjust
foreign_grouping_ok() to be more careful about duplicating the sortgroupref
labeling of the local pathtarget.
Per bug #14890 from Sean Johnston. Back-patch to v10 where the buggy code
was introduced.
Jeevan Chalke, reviewed by Ashutosh Bapat
Discussion: https://postgr.es/m/20171107134948.1508.94783@wrigleys.postgresql.org
~> (cube, int) operator was especially designed for knn-gist search.
However, knn-gist supports only ascending ordering of results. Nevertheless
it would be useful to support descending ordering by ~> (cube, int) operator.
We provide workaround for that: negative coordinate give us inversed value
of corresponding cube bound. Therefore, knn search using negative coordinate
gives us an effect of descending ordering by cube bound.
Author: Alexander Korotkov
Reviewed by: Tomas Vondra, Andrey Borodin
Discussion: https://www.postgresql.org/message-id/flat/a9657f6a-b497-36ff-e56-482a2c7e3292@2ndquadrant.com
~> (cube, int) operator was especially designed for knn-gist search.
However, it appears that knn-gist search can't work correctly with current
behavior of this operator when dataset contains cubes of variable
dimensionality. In this case, the same value of second operator argument
can point to different dimension depending on dimensionality of particular cube.
Such behavior is incompatible with gist indexing of cubes, and knn-gist doesn't
work correctly for it.
This patch changes behavior of ~> (cube, int) operator by introducing dimension
numbering where value of second argument unambiguously identifies number of
dimension. With new behavior, this operator can be correctly supported by
knn-gist. Relevant changes to cube operator class are also included.
Backpatch to v9.6 where operator was introduced.
Since behavior of ~> (cube, int) operator is changed, depending entities
must be refreshed after upgrade. Such as, expression indexes using this
operator must be reindexed, materialized views must be rebuilt, stored
procedures and client code must be revised to correctly use new behavior.
That should be mentioned in release notes.
Noticed by: Tomas Vondra
Author: Alexander Korotkov
Reviewed by: Tomas Vondra, Andrey Borodin
Discussion: https://www.postgresql.org/message-id/flat/a9657f6a-b497-36ff-e56-482a2c7e3292@2ndquadrant.com
Instead of using our standard macro for this calculation, this code
did it itself ... and got it wrong, leading to incorrect display of
the null bitmap in some cases. Noted and fixed by Maksim Milyutin.
In passing, remove a uselessly duplicative error check.
Errors were introduced in commit d6061f83a; back-patch to 9.6
where that came in.
Maksim Milyutin, reviewed by Andrey Borodin
Discussion: https://postgr.es/m/ec295792-a69f-350f-6287-25a20e8f31d5@gmail.com
This patch makes a number of interrelated changes to reduce the overhead
involved in creating/deleting memory contexts. The key ideas are:
* Include the AllocSetContext header of an aset.c context in its first
malloc request, rather than allocating it separately in TopMemoryContext.
This means that we now always create an initial or "keeper" block in an
aset, even if it never receives any allocation requests.
* Create freelists in which we can save and recycle recently-destroyed
asets (this idea is due to Robert Haas).
* In the common case where the name of a context is a constant string,
just store a pointer to it in the context header, rather than copying
the string.
The first change eliminates a palloc/pfree cycle per context, and
also avoids bloat in TopMemoryContext, at the price that creating
a context now involves a malloc/free cycle even if the context never
receives any allocations. That would be a loser for some common
usage patterns, but recycling short-lived contexts via the freelist
eliminates that pain.
Avoiding copying constant strings not only saves strlen() and strcpy()
overhead, but is an essential part of the freelist optimization because
it makes the context header size constant. Currently we make no
attempt to use the freelist for contexts with non-constant names.
(Perhaps someday we'll need to think harder about that, but in current
usage, most contexts with custom names are long-lived anyway.)
The freelist management in this initial commit is pretty simplistic,
and we might want to refine it later --- but in common workloads that
will never matter because the freelists will never get full anyway.
To create a context with a non-constant name, one is now required to
call AllocSetContextCreateExtended and specify the MEMCONTEXT_COPY_NAME
option. AllocSetContextCreate becomes a wrapper macro, and it includes
a test that will complain about non-string-literal context name
parameters on gcc and similar compilers.
An unfortunate side effect of making AllocSetContextCreate a macro is
that one is now *required* to use the size parameter abstraction macros
(ALLOCSET_DEFAULT_SIZES and friends) with it; the pre-9.6 habit of
writing out individual size parameters no longer works unless you
switch to AllocSetContextCreateExtended.
Internally to the memory-context-related modules, the context creation
APIs are simplified, removing the rather baroque original design whereby
a context-type module called mcxt.c which then called back into the
context-type module. That saved a bit of code duplication, but not much,
and it prevented context-type modules from exercising control over the
allocation of context headers.
In passing, I converted the test-and-elog validation of aset size
parameters into Asserts to save a few more cycles. The original thought
was that callers might compute size parameters on the fly, but in practice
nobody does that, so it's useless to expend cycles on checking those
numbers in production builds.
Also, mark the memory context method-pointer structs "const",
just for cleanliness.
Discussion: https://postgr.es/m/2264.1512870796@sss.pgh.pa.us
A previous commit added inline functions that provide fast(er) and
correct overflow checks for signed integer math. Use them in a
significant portion of backend code. There's more to touch in both
backend and frontend code, but these were the easily identifiable
cases.
The old overflow checks are noticeable in integer heavy workloads.
A secondary benefit is that getting rid of overflow checks that rely
on signed integer overflow wrapping around, will allow us to get rid
of -fwrapv in the future. Which in turn slows down other code.
Author: Andres Freund
Discussion: https://postgr.es/m/20171024103954.ztmatprlglz3rwke@alap3.anarazel.de
This is a backward incompatibility which should be noted in the
release notes for PostgreSQL 11.
For security reasons, we require that a postgres_fdw foreign table use
password authentication when accessing a remote server, so that an
unprivileged user cannot usurp the server's credentials. Superusers
are exempt from this requirement, because we assume they are entitled
to usurp the server's credentials or, at least, can find some other
way to do it.
But what should happen when the foreign table is accessed by a view
owned by a user different from the session user? Is it the view owner
that must be a superuser in order to avoid the requirement of using a
password, or the session user? Historically it was the latter, but
this requirement makes it the former instead. This allows superusers
to delegate to other users the right to select from a foreign table
that doesn't use password authentication by creating a view over the
foreign table and handing out rights to the view. It is also more
consistent with the idea that access to a view should use the view
owner's privileges rather than the session user's privileges.
The upshot of this change is that a superuser selecting from a view
created by a non-superuser may now get an error complaining that no
password was used, while a non-superuser selecting from a view
created by a superuser will no longer receive such an error.
No documentation changes are present in this patch because the
wording of the documentation already suggests that it works this
way. We should perhaps adjust the documentation in the back-branches,
but that's a task for another patch.
Originally proposed by Jeff Janes, but with different semantics;
adjusted to work like this by me per discussion.
Discussion: http://postgr.es/m/CA+TgmoaY4HsVZJv5SqEjCKLDwtCTSwXzKpRftgj50wmMMBwciA@mail.gmail.com
This patch fixes a couple of low-probability bugs that could lead to
reporting an irrelevant errno value (and hence possibly a wrong SQLSTATE)
concerning directory-open or file-open failures. It also fixes places
where we took shortcuts in reporting such errors, either by using elog
instead of ereport or by using ereport but forgetting to specify an
errcode. And it eliminates a lot of just plain redundant error-handling
code.
In service of all this, export fd.c's formerly-static function
ReadDirExtended, so that external callers can make use of the coding
pattern
dir = AllocateDir(path);
while ((de = ReadDirExtended(dir, path, LOG)) != NULL)
if they'd like to treat directory-open failures as mere LOG conditions
rather than errors. Also fix FreeDir to be a no-op if we reach it
with dir == NULL, as such a coding pattern would cause.
Then, remove code at many call sites that was throwing an error or log
message for AllocateDir failure, as ReadDir or ReadDirExtended can handle
that job just fine. Aside from being a net code savings, this gets rid of
a lot of not-quite-up-to-snuff reports, as mentioned above. (In some
places these changes result in replacing a custom error message such as
"could not open tablespace directory" with more generic wording "could not
open directory", but it was agreed that the custom wording buys little as
long as we report the directory name.) In some other call sites where we
can't just remove code, change the error reports to be fully
project-style-compliant.
Also reorder code in restoreTwoPhaseData that was acquiring a lock
between AllocateDir and ReadDir; in the unlikely but surely not
impossible case that LWLockAcquire changes errno, AllocateDir failures
would be misreported. There is no great value in opening the directory
before acquiring TwoPhaseStateLock, so just do it in the other order.
Also fix CheckXLogRemoved to guarantee that it preserves errno,
as quite a number of call sites are implicitly assuming. (Again,
it's unlikely but I think not impossible that errno could change
during a SpinLockAcquire. If so, this function was broken for its
own purposes as well as breaking callers.)
And change a few places that were using not-per-project-style messages,
such as "could not read directory" when "could not open directory" is
more correct.
Back-patch the exporting of ReadDirExtended, in case we have occasion
to back-patch some fix that makes use of it; it's not needed right now
but surely making it global is pretty harmless. Also back-patch the
restoreTwoPhaseData and CheckXLogRemoved fixes. The rest of this is
essentially cosmetic and need not get back-patched.
Michael Paquier, with a bit of additional work by me
Discussion: https://postgr.es/m/CAB7nPqRpOCxjiirHmebEFhXVTK7V5Jvw4bz82p7Oimtsm3TyZA@mail.gmail.com
Antonin Houska reported that the planner does consider pushing
postgres_fdw_abs() to the remote side, which happens because we make
it shippable earlier in the test case file.
Jeevan Chalke provided this patch, which changes the join
condition to use random(), which is not shippable, instead.
Antonin reviewed the patch.
Discussion: http://postgr.es/m/15265.1511985971@localhost
rewriteTargetListUD's processing is dependent on the relkind of the query's
target table. That was fine at the time it was made to act that way, even
for queries on inheritance trees, because all tables in an inheritance tree
would necessarily be plain tables. However, the 9.5 feature addition
allowing some members of an inheritance tree to be foreign tables broke the
assumption that rewriteTargetListUD's output tlist could be applied to all
child tables with nothing more than column-number mapping. This led to
visible failures if foreign child tables had row-level triggers, and would
also break in cases where child tables belonged to FDWs that used methods
other than CTID for row identification.
To fix, delay running rewriteTargetListUD until after the planner has
expanded inheritance, so that it is applied separately to the (already
mapped) tlist for each child table. We can conveniently call it from
preprocess_targetlist. Refactor associated code slightly to avoid the
need to heap_open the target relation multiple times during
preprocess_targetlist. (The APIs remain a bit ugly, particularly around
the point of which steps scribble on parse->targetList and which don't.
But avoiding such scribbling would require a change in FDW callback APIs,
which is more pain than it's worth.)
Also fix ExecModifyTable to ensure that "tupleid" is reset to NULL when
we transition from rows providing a CTID to rows that don't. (That's
really an independent bug, but it manifests in much the same cases.)
Add a regression test checking one manifestation of this problem, which
was that row-level triggers on a foreign child table did not work right.
Back-patch to 9.5 where the problem was introduced.
Etsuro Fujita, reviewed by Ildus Kurbangaliev and Ashutosh Bapat
Discussion: https://postgr.es/m/20170514150525.0346ba72@postgrespro.ru
hstoreUniquePairs() often called memcpy with equal source and destination
pointers. Although this is almost surely harmless in practice, it's
undefined according to the letter of the C standard. Some versions of
valgrind will complain about it, and some versions of libc as well
(cf. commit ad520ec4a). Tweak the code to avoid doing that.
Noted by Tomas Vondra. Back-patch to all supported versions because
of the hazard of libc assertions.
Discussion: https://postgr.es/m/bf84d940-90d4-de91-19dd-612e011007f4@fuzzy.cz
To do this, we only have to remove the compress and decompress support
functions, which have never done anything more than detoasting.
In the wake of commit d3a4f89d8, this results in automatically enabling
index-only scans, since the core code will now know that the stored
representation is the same as the original data (up to detoasting).
The only exciting part of this is that ALTER OPERATOR FAMILY lacks
a way to drop a support function that was declared as being part of
an opclass rather than being loose in the family. For the moment,
we'll hack our way to a solution with a manual update of the pg_depend
entry type, which is what distinguishes the two cases. Perhaps
someday it'll be worth providing a cleaner way to do that, but for
now it seems like a very niche problem.
Note that the underlying C functions remain, to support use of the shared
libraries with older versions of the modules' SQL declarations. Someday
we may be able to remove them, but not soon.
Andrey Borodin, reviewed by me
Discussion: https://postgr.es/m/D0F53A05-4F4A-4DEC-8339-3C069FA0EE11@yandex-team.ru
Python Py*_New() functions can fail and return NULL in out-of-memory
conditions. The previous code handled that inconsistently or not at
all. This change organizes that better. If we are in a function that
is called from Python, we just check for failure and return NULL
ourselves, which will cause any exception information to be passed up.
If we are called from PostgreSQL, we consistently create an "out of
memory" error.
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Since those scripts haven't worked at all in macOS releases of 2014
and later, and aren't the recommended way to do it on any release
since 2005, there seems little point carrying them into the future.
It's very unlikely that anyone would be installing PG >= 11 on a
macOS release where they couldn't use contrib/start-scripts/macos/.
Discussion: https://postgr.es/m/31338.1510763554@sss.pgh.pa.us
The scripts in contrib/start-scripts/osx don't work at all on macOS
10.10 (Yosemite) or later, because they depend on SystemStarter which
Apple deprecated long ago and removed in 10.10. Add a new subdirectory
contrib/start-scripts/macos with scripts that use the newer launchd
infrastructure.
Since this problem is independent of which Postgres version you're using,
back-patch to all supported branches.
Discussion: https://postgr.es/m/31338.1510763554@sss.pgh.pa.us
Fix PL/Python so that it can handle domains over composite, and so that
it enforces domain constraints correctly in other cases that were not
always done properly before. Notably, it didn't do arrays of domains
right (oversight in commit c12d570fa), and it failed to enforce domain
constraints when returning a composite type containing a domain field,
and if a transform function is being used for a domain's base type then
it failed to enforce domain constraints on the result. Also, in many
places it missed checking domain constraints on null values, because
the plpy_typeio code simply wasn't called for Py_None.
Rather than try to band-aid these problems, I made a significant
refactoring of the plpy_typeio logic. The existing design of recursing
for array and composite members is extended to also treat domains as
containers requiring recursion, and the APIs for the module are cleaned
up and simplified.
The patch also modifies plpy_typeio to rely on the typcache more than
it did before (which was pretty much not at all). This reduces the
need for repetitive lookups, and lets us get rid of an ad-hoc scheme
for detecting changes in composite types. I added a couple of small
features to typcache to help with that.
Although some of this is fixing bugs that long predate v11, I don't
think we should risk a back-patch: it's a significant amount of code
churn, and there've been no complaints from the field about the bugs.
Tom Lane, reviewed by Anthony Bykov
Discussion: https://postgr.es/m/24449.1509393613@sss.pgh.pa.us
Make bloom WAL test compare psql output text, not just result codes;
this was evidently the intent all along, but it was mis-coded.
In passing, make sure we will notice any failure in setup steps.
Alexander Korotkov, reviewed by Michael Paquier and Masahiko Sawada
Discussion: https://postgr.es/m/CAPpHfdtohPdQ9rc5mdWjxq+3VsBNw534KV_5O65dTQrSdVJNgw@mail.gmail.com
The lower case spellings are C and C++ standard and are used in most
parts of the PostgreSQL sources. The upper case spellings are only used
in some files/modules. So standardize on the standard spellings.
The APIs for ICU, Perl, and Windows define their own TRUE and FALSE, so
those are left as is when using those APIs.
In code comments, we use the lower-case spelling for the C concepts and
keep the upper-case spelling for the SQL concepts.
Reviewed-by: Michael Paquier <michael.paquier@gmail.com>
By default, $PGUSER has permission to unlink $PGLOG. If $PGUSER
replaces $PGLOG with a symbolic link, the server will corrupt the
link-targeted file by appending log messages. Since these scripts open
$PGLOG as root, the attack works regardless of target file ownership.
"make install" does not install these scripts anywhere. Users having
manually installed them in the past should repeat that process to
acquire this fix. Most script users have $PGLOG writable to root only,
located in $PGDATA. Just before updating one of these scripts, such
users should rename $PGLOG to $PGLOG.old. The script will then recreate
$PGLOG with proper ownership.
Reviewed by Peter Eisentraut. Reported by Antoine Scemama.
Security: CVE-2017-12172
btree, hash, and bloom indexes all set up their metapages in standard
format (that is, with pd_lower and pd_upper correctly delimiting the
unused area); but they mostly didn't inform the xlog routines of this.
When calling log_newpage[_buffer], this is bad because it loses the
opportunity to compress unused data out of the WAL record. When
calling XLogRegisterBuffer, it's not such a performance problem because
all of these call sites also use REGBUF_WILL_INIT, preventing an FPI
image from being written. But it's still a good idea to provide the
flag when relevant, because that aids WAL consistency checking.
This completes the project of getting all the in-core index AMs to
handle their metapage WAL operations similarly.
Amit Kapila, reviewed by Michael Paquier
Discussion: https://postgr.es/m/0d273805-0e9e-ec1a-cb84-d4da400b8f85@lab.ntt.co.jp
This changes the use of a custom configuration file to enforce the value
of preload_shared_libraries to simply load the library during the tests.
This removes the restriction of running installcheck on the tests, and
simplifies its makefile contrary to what has been introduced in af7211e.
Author: Michael Paquier <michael.paquier@gmail.com>
This is the last major omission in our domains feature: you can now
make a domain over anything that's not a pseudotype.
The major complication from an implementation standpoint is that places
that might be creating tuples of a domain type now need to be prepared
to apply domain_check(). It seems better that unprepared code fail
with an error like "<type> is not composite" than that it silently fail
to apply domain constraints. Therefore, relevant infrastructure like
get_func_result_type() and lookup_rowtype_tupdesc() has been adjusted
to treat domain-over-composite as a distinct case that unprepared code
won't recognize, rather than just transparently treating it the same
as plain composite. This isn't a 100% solution to the possibility of
overlooked domain checks, but it catches most places.
In passing, improve typcache.c's support for domains (it can now cache
the identity of a domain's base type), and rewrite the argument handling
logic in jsonfuncs.c's populate_record[set]_worker to reduce duplicative
per-call lookups.
I believe this is code-complete so far as the core and contrib code go.
The PLs need varying amounts of work, which will be tackled in followup
patches.
Discussion: https://postgr.es/m/4206.1499798337@sss.pgh.pa.us
This is preparation for a future patch to extensively change how
reloptions work.
Author: Nikolay Shaplov
Reviewed-by: Michael Paquier
Discussion: https://postgr.es/m/2615372.orqtEn8VGB@x200m
Flex generates a lot of functions that are not actually used. In order
to avoid coverage figures being ruined by that, mark up the part of the
.l files where the generated code appears by lcov exclusion markers.
That way, lcov will typically only reported on coverage for the .l file,
which is under our control, but not for the .c file.
Reviewed-by: Michael Paquier <michael.paquier@gmail.com>
This takes advantage of the infrastructure introduced by commit
81c5e46c49 to greatly reduce the
likelihood that two different queries will end up with the same query
ID. It's still possible, of course, but whereas before it the chances
of a collision reached 25% around 50,000 queries, it will now take
more than 3 billion queries.
Backward incompatibility: Because the type exposed at the SQL level is
int8, users may now see negative query IDs in the pg_stat_statements
view (and also, query IDs more than 4 billion, which was the old
limit).
Patch by me, reviewed by Michael Paquier and Peter Geoghegan.
Discussion: http://postgr.es/m/CA+TgmobG_Kp4cBKFmsznUAaM1GWW6hhRNiZC0KjRMOOeYnz5Yw@mail.gmail.com
The logical decoding functions do BeginInternalSubTransaction and
RollbackAndReleaseCurrentSubTransaction to clean up after themselves.
It turns out that AtEOSubXact_SPI has an unrecognized assumption that
we always need to cancel the active SPI operation in the SPI context
that surrounds the subtransaction (if there is one). That's true
when the RollbackAndReleaseCurrentSubTransaction call is coming from
the SPI-using function itself, but not when it's happening inside
some unrelated function invoked by a SPI query. In practice the
affected callers are the various PLs.
To fix, record the current subtransaction ID when we begin a SPI
operation, and clean up only if that ID is the subtransaction being
canceled.
Also, remove AtEOSubXact_SPI's assertion that it must have cleaned
up the surrounding SPI context's active tuptable. That's proven
wrong by the same test case.
Also clarify (or, if you prefer, reinterpret) the calling conventions
for _SPI_begin_call and _SPI_end_call. The memory context cleanup
in the latter means that these have always had the flavor of a matched
resource-management pair, but they weren't documented that way before.
Per report from Ben Chobot.
Back-patch to 9.4 where logical decoding came in. In principle,
the SPI changes should go all the way back, since the problem dates
back to commit 7ec1c5a86. But given the lack of field complaints
it seems few people are using internal subtransactions in this way.
So I don't feel a need to take any risks in 9.2/9.3.
Discussion: https://postgr.es/m/73FBA179-C68C-4540-9473-71E865408B15@silentmedia.com
Instead of joining two partitioned tables in their entirety we can, if
it is an equi-join on the partition keys, join the matching partitions
individually. This involves teaching the planner about "other join"
rels, which are related to regular join rels in the same way that
other member rels are related to baserels. This can use significantly
more CPU time and memory than regular join planning, because there may
now be a set of "other" rels not only for every base relation but also
for every join relation. In most practical cases, this probably
shouldn't be a problem, because (1) it's probably unusual to join many
tables each with many partitions using the partition keys for all
joins and (2) if you do that scenario then you probably have a big
enough machine to handle the increased memory cost of planning and (3)
the resulting plan is highly likely to be better, so what you spend in
planning you'll make up on the execution side. All the same, for now,
turn this feature off by default.
Currently, we can only perform joins between two tables whose
partitioning schemes are absolutely identical. It would be nice to
cope with other scenarios, such as extra partitions on one side or the
other with no match on the other side, but that will have to wait for
a future patch.
Ashutosh Bapat, reviewed and tested by Rajkumar Raghuwanshi, Amit
Langote, Rafia Sabih, Thomas Munro, Dilip Kumar, Antonin Houska, Amit
Khandekar, and by me. A few final adjustments by me.
Discussion: http://postgr.es/m/CAFjFpRfQ8GrQvzp3jA2wnLqrHmaXna-urjm_UY9BqXj=EaDTSA@mail.gmail.com
Discussion: http://postgr.es/m/CAFjFpRcitjfrULr5jfuKWRPsGUX0LQ0k8-yG0Qw2+1LBGNpMdw@mail.gmail.com
A lot of semi-internal code just prints out numeric SPI error codes,
which is not very helpful. We already have an API function to convert
the codes to a string, so let's make more use of that.
Reviewed-by: Michael Paquier <michael.paquier@gmail.com>
All postgres internal usages are replaced, it's just libpq example
usages that haven't been converted. External users of libpq can't
generally rely on including postgres internal headers.
Note that this includes replacing open-coded byte swapping of 64bit
integers (using two 32 bit swaps) with a single 64bit swap.
Where it looked applicable, I have removed netinet/in.h and
arpa/inet.h usage, which previously provided the relevant
functionality. It's perfectly possible that I missed other reasons for
including those, the buildfarm will tell.
Author: Andres Freund
Discussion: https://postgr.es/m/20170927172019.gheidqy6xvlxb325@alap3.anarazel.de
Allowing arrays with a domain type as their element type was left un-done
in the original domain patch, but not for any very good reason. This
omission leads to such surprising results as array_agg() not working on
a domain column, because the parser can't identify a suitable output type
for the polymorphic aggregate.
In order to fix this, first clean up the APIs of coerce_to_domain() and
some internal functions in parse_coerce.c so that we consistently pass
around a CoercionContext along with CoercionForm. Previously, we sometimes
passed an "isExplicit" boolean flag instead, which is strictly less
information; and coerce_to_domain() didn't even get that, but instead had
to reverse-engineer isExplicit from CoercionForm. That's contrary to the
documentation in primnodes.h that says that CoercionForm only affects
display and not semantics. I don't think this change fixes any live bugs,
but it makes things more consistent. The main reason for doing it though
is that now build_coercion_expression() receives ccontext, which it needs
in order to be able to recursively invoke coerce_to_target_type().
Next, reimplement ArrayCoerceExpr so that the node does not directly know
any details of what has to be done to the individual array elements while
performing the array coercion. Instead, the per-element processing is
represented by a sub-expression whose input is a source array element and
whose output is a target array element. This simplifies life in
parse_coerce.c, because it can build that sub-expression by a recursive
invocation of coerce_to_target_type(). The executor now handles the
per-element processing as a compiled expression instead of hard-wired code.
The main advantage of this is that we can use a single ArrayCoerceExpr to
handle as many as three successive steps per element: base type conversion,
typmod coercion, and domain constraint checking. The old code used two
stacked ArrayCoerceExprs to handle type + typmod coercion, which was pretty
inefficient, and adding yet another array deconstruction to do domain
constraint checking seemed very unappetizing.
In the case where we just need a single, very simple coercion function,
doing this straightforwardly leads to a noticeable increase in the
per-array-element runtime cost. Hence, add an additional shortcut evalfunc
in execExprInterp.c that skips unnecessary overhead for that specific form
of expression. The runtime speed of simple cases is within 1% or so of
where it was before, while cases that previously required two levels of
array processing are significantly faster.
Finally, create an implicit array type for every domain type, as we do for
base types, enums, etc. Everything except the array-coercion case seems
to just work without further effort.
Tom Lane, reviewed by Andrew Dunstan
Discussion: https://postgr.es/m/9852.1499791473@sss.pgh.pa.us
Upcoming patches are going to address performance issues that involve
slow system provided ntohs/htons etc. To address that expand
pg_bswap.h to provide pg_ntoh{16,32,64}, pg_hton{16,32,64} and
optimize their respective implementations by using compiler intrinsics
for gcc compatible compilers and msvc. Fall back to manual
implementations using shifts etc otherwise.
Additionally remove multiple evaluation hazards from the existing
BSWAP32/64 macros, by replacing them with inline functions when
necessary. In the course of that the naming scheme is changed to
pg_bswap16/32/64.
Author: Andres Freund
Discussion: https://postgr.es/m/20170927172019.gheidqy6xvlxb325@alap3.anarazel.de
Add bgw_type field to background worker structure. It is intended to be
set to the same value for all workers of the same type, so they can be
grouped in pg_stat_activity, for example.
The backend_type column in pg_stat_activity now shows bgw_type for a
background worker. The ps listing also no longer calls out that a
process is a background worker but just show the bgw_type. That way,
being a background worker is more of an implementation detail now that
is not shown to the user. However, most log messages still refer to
'background worker "%s"'; otherwise constructing sensible and
translatable log messages would become tricky.
Reviewed-by: Michael Paquier <michael.paquier@gmail.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
* Failure to check for malloc failure (ok, pretty unlikely here, but
that's not an excuse).
* Leakage of open fd on read error, and of malloc'd buffer always.
* Incorrect assumption that a short read would set errno to zero.
* Failure to adhere to message style conventions (in particular,
not reporting errno where relevant; using "couldn't open" rather than
"could not open" is not really in line with project style either).
* Missing newlines on some messages.
Coverity spotted the leak problems; I noticed the rest while
fixing the leaks.
The file handling functions from fd.c were called with a diverse mix of
notations for the file permissions when they were opening new files.
Almost all files created by the server should have the same permissions
set. So change the API so that e.g. OpenTransientFile() automatically
uses the standard permissions set, and OpenTransientFilePerm() is a new
function that takes an explicit permissions set for the few cases where
it is needed. This also saves an unnecessary argument for call sites
that are just opening an existing file.
While we're reviewing these APIs, get rid of the FileName typedef and
use the standard const char * for the file name and mode_t for the file
mode. This makes these functions match other file handling functions
and removes an unnecessary layer of mysteriousness. We can also get rid
of a few casts that way.
Author: David Steele <david@pgmasters.net>
The recent addition of a test suite for this module revealed a few
problems. It uses a crypt() method that is no longer considered secure
and doesn't work anymore on some platforms. Using a volatile input
function violates internal sanity check assumptions and leads to
failures on the build farm.
So this module is neither a usable security tool nor a good example for
an extension. No one wanted to argue for keeping or improving it, so
remove it.
Discussion: https://www.postgresql.org/message-id/5645b0d7-cc40-6ab5-c553-292a91091ee7%402ndquadrant.com
For performance reasons a larger segment size than the default 16MB
can be useful. A larger segment size has two main benefits: Firstly,
in setups using archiving, it makes it easier to write scripts that
can keep up with higher amounts of WAL, secondly, the WAL has to be
written and synced to disk less frequently.
But at the same time large segment size are disadvantageous for
smaller databases. So far the segment size had to be configured at
compile time, often making it unrealistic to choose one fitting to a
particularly load. Therefore change it to a initdb time setting.
This includes a breaking changes to the xlogreader.h API, which now
requires the current segment size to be configured. For that and
similar reasons a number of binaries had to be taught how to recognize
the current segment size.
Author: Beena Emerson, editorialized by Andres Freund
Reviewed-By: Andres Freund, David Steele, Kuntal Ghosh, Michael
Paquier, Peter Eisentraut, Robert Hass, Tushar Ahuja
Discussion: https://postgr.es/m/CAOG9ApEAcQ--1ieKbhFzXSQPw_YLmepaa4hNdnY5+ZULpt81Mw@mail.gmail.com
This reverts commit 890faaf1 which attempted unsuccessfully to deal with
the problem, and instead just comments out these tests like other similar
tests elsewhere in the script.
Make the btree page-flags test macros (P_ISLEAF and friends) return clean
boolean values, rather than values that might not fit in a bool. Use them
in a few places that were randomly referencing the flag bits directly.
In passing, change access/nbtree/'s only direct use of BUFFER_LOCK_SHARE to
BT_READ. (Some think we should go the other way, but as long as we have
BT_READ/BT_WRITE, let's use them consistently.)
Masahiko Sawada, reviewed by Doug Doole
Discussion: https://postgr.es/m/CAD21AoBmWPeN=WBB5Jvyz_Nt3rmW1ebUyAnk3ZbJP3RMXALJog@mail.gmail.com
By project convention, these names should include "P" when dealing with a
pointer type; that is, if the result of a GETARG macro is of type FOO *,
it should be called PG_GETARG_FOO_P not just PG_GETARG_FOO. Some newer
types such as JSONB and ranges had not followed the convention, and a
number of contrib modules hadn't gotten that memo either. Rename the
offending macros to improve consistency.
In passing, fix a few places that thought PG_DETOAST_DATUM() returns
a Datum; it does not, it returns "struct varlena *". Applying
DatumGetPointer to that happens not to cause any bad effects today,
but it's formally wrong. Also, adjust an ltree macro that was designed
without any thought for what pgindent would do with it.
This is all cosmetic and shouldn't have any impact on generated code.
Mark Dilger, some further tweaks by me
Discussion: https://postgr.es/m/EA5676F4-766F-4F38-8348-ECC7DB427C6A@gmail.com
The ISN_DEBUG code did not compile. Fix that code, don't hide it behind
an #ifdef, make it run when building with asserts, and make it error out
instead of just logging if it fails.
Reviewed-by: David Steele <david@pgmasters.net>
Historically, the selectivity functions have simply not distinguished
< from <=, or > from >=, arguing that the fraction of the population that
satisfies the "=" aspect can be considered to be vanishingly small, if the
comparison value isn't any of the most-common-values for the variable.
(If it is, the code path that executes the operator against each MCV will
take care of things properly.) But that isn't really true unless we're
dealing with a continuum of variable values, and in practice we seldom are.
If "x = const" would estimate a nonzero number of rows for a given const
value, then it follows that we ought to estimate different numbers of rows
for "x < const" and "x <= const", even if the const is not one of the MCVs.
Handling this more honestly makes a significant difference in edge cases,
such as the estimate for a tight range (x BETWEEN y AND z where y and z
are close together).
Hence, split scalarltsel into scalarltsel/scalarlesel, and similarly
split scalargtsel into scalargtsel/scalargesel. Adjust <= and >=
operator definitions to reference the new selectivity functions.
Improve the core ineq_histogram_selectivity() function to make a
correction for equality. (Along the way, I learned quite a bit about
exactly why that function gives good answers, which I tried to memorialize
in improved comments.)
The corresponding join selectivity functions were, and remain, just stubs.
But I chose to split them similarly, to avoid confusion and to prevent the
need for doing this exercise again if someone ever makes them less stubby.
In passing, change ineq_histogram_selectivity's clamp for extreme
probability estimates so that it varies depending on the histogram
size, instead of being hardwired at 0.0001. With the default histogram
size of 100 entries, you still get the old clamp value, but bigger
histograms should allow us to put more faith in edge values.
Tom Lane, reviewed by Aleksander Alekseev and Kuntal Ghosh
Discussion: https://postgr.es/m/12232.1499140410@sss.pgh.pa.us
It is equivalent in ANSI C to write (*funcptr) () and funcptr(). These
two styles have been applied inconsistently. After discussion, we'll
use the more verbose style for plain function pointer variables, to make
it clear that it's a variable, and the shorter style when the function
pointer is in a struct (s.func() or s->func()), because then it's clear
that it's not a plain function name, and otherwise the excessive
punctuation makes some of those invocations hard to read.
Discussion: https://www.postgresql.org/message-id/f52c16db-14ed-757d-4b48-7ef360b1631d@2ndquadrant.com
This doesn't allow routing tuple to the foreign partitions themselves,
but it permits tuples to be routed to regular partitions despite the
presence of foreign partitions in the same inheritance hierarchy.
Etsuro Fujita, reviewed by Amit Langote and by me.
Discussion: http://postgr.es/m/bc3db4c1-1693-3b8a-559f-33ad2b50b7ad@lab.ntt.co.jp
The parenthesized style has only been used in a few modules. Change
that to use the style that is predominant across the whole tree.
Reviewed-by: Michael Paquier <michael.paquier@gmail.com>
Reviewed-by: Ryan Murphy <ryanfmurphy@gmail.com>
PG_MODULE_MAGIC has been around since 8.2, with 8.1 long since in EOL,
so remove the mention of #ifdef guards for compiling against pre-8.2
sources from the documentation.
Author: Daniel Gustafsson <daniel@yesql.se>
Periodically while the server is running, and at shutdown, write out a
list of blocks in shared buffers. When the server reaches consistency
-- unfortunatey, we can't do it before that point without breaking
things -- reload those blocks into any still-unused shared buffers.
Mithun Cy and Robert Haas, reviewed and tested by Beena Emerson,
Amit Kapila, Jim Nasby, and Rafia Sabih.
Discussion: http://postgr.es/m/CAD__OugubOs1Vy7kgF6xTjmEqTR4CrGAv8w+ZbaY_+MZeitukw@mail.gmail.com
This is a mechanical change in preparation for a later commit that
will change the layout of TupleDesc. Introducing a macro to abstract
the details of where attributes are stored will allow us to change
that in separate step and revise it in future.
Author: Thomas Munro, editorialized by Andres Freund
Reviewed-By: Andres Freund
Discussion: https://postgr.es/m/CAEepm=0ZtQ-SpsgCyzzYpsXS6e=kZWqk3g5Ygn3MDV7A8dabUA@mail.gmail.com
Improve generate_unaccent_rules.py to handle composed characters whose base
is another composed character rather than a plain letter. The net effect
of this is to add a bunch of multi-accented Vietnamese characters to
unaccent.rules.
Original complaint from Kha Nguyen, diagnosis of the script's shortcoming
by Thomas Munro.
Dang Minh Huong and Michael Paquier
Discussion: https://postgr.es/m/CALo3sF6EC8cy1F2JUz=GRf5h4LMUJTaG3qpdoiLrNbWEXL-tRg@mail.gmail.com
Change to appendStringInfoChar() or appendStringInfoString() where those
can be used.
Author: David Rowley <david.rowley@2ndquadrant.com>
Reviewed-by: Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>
Per a report from AP, it's not that hard to exhaust the supply of
bitmap pages if you create a table with a hash index and then insert a
few billion rows - and then you start getting errors when you try to
insert additional rows. In the particular case reported by AP,
there's another fix that we can make to improve recycling of overflow
pages, which is another way to avoid the error, but there may be other
cases where this problem happens and that fix won't help. So let's
buy ourselves as much headroom as we can without rearchitecting
anything.
The comments claim that the old limit was 64GB, but it was really
only 32GB, because we didn't use all the bits in the page for bitmap
bits - only the largest power of 2 that could fit after deducting
space for the page header and so forth. Thus, we have 4kB per page
for bitmap bits, not 8kB. The new limit is thus actually 8 times the
old *real* limit but only 4 times the old *purported* limit.
Since this breaks on-disk compatibility, bump HASH_VERSION. We've
already done this earlier in this release cycle, so this doesn't cause
any incremental inconvenience for people using pg_upgrade from
releases prior to v10. However, users who use pg_upgrade to reach
10beta3 or later from 10beta2 or earlier will need to REINDEX any hash
indexes again.
Amit Kapila and Robert Haas
Discussion: http://postgr.es/m/20170704105728.mwb72jebfmok2nm2@zip.com.au
The Perl documentation is very clear that stuff calling libperl should
be built with the compiler switches shown by Perl's $Config{ccflags}.
We'd been ignoring that up to now, and mostly getting away with it,
but recent Perl versions contain ABI compatibility cross-checks that
fail on some builds because of this omission. In particular the
sizeof(PerlInterpreter) can come out different due to some fields being
added or removed; which means we have a live ABI hazard that we'd better
fix rather than continuing to sweep it under the rug.
However, it still seems like a bad idea to just absorb $Config{ccflags}
verbatim. In some environments Perl was built with a different compiler
that doesn't even use the same switch syntax. -D switch syntax is pretty
universal though, and absorbing Perl's -D switches really ought to be
enough to fix the problem.
Furthermore, Perl likes to inject stuff like -D_LARGEFILE_SOURCE and
-D_FILE_OFFSET_BITS=64 into $Config{ccflags}, which affect libc ABIs on
platforms where they're relevant. Adopting those seems dangerous too.
It's unclear whether a build wherein Perl and Postgres have different ideas
of sizeof(off_t) etc would work, or whether anyone would care about making
it work. But it's dead certain that having different stdio ABIs in
core Postgres and PL/Perl will not work; we've seen that movie before.
Therefore, let's also ignore -D switches for symbols beginning with
underscore. The symbols that we actually need to import should be the ones
mentioned in perl.h's PL_bincompat_options stanza, and none of those start
with underscore, so this seems likely to work. (If it turns out not to
work everywhere, we could consider intersecting the symbols mentioned in
PL_bincompat_options with the -D switches. But that will be much more
complicated, so let's try this way first.)
This will need to be back-patched, but first let's see what the
buildfarm makes of it.
Ashutosh Sharma, some adjustments by me
Discussion: https://postgr.es/m/CANFyU97OVQ3+Mzfmt3MhuUm5NwPU=-FtbNH5Eb7nZL9ua8=rcA@mail.gmail.com
In Perl builds that define PERL_IMPLICIT_SYS, XSUB.h defines macros
that replace a whole lot of basic libc functions with Perl functions.
We can't tolerate that in plperl.c; it breaks at least PG_TRY and
probably other stuff. The core idea of this patch is to include XSUB.h
only in the .xs files where it's really needed, and to move any code
broken by PERL_IMPLICIT_SYS out of the .xs files and into plperl.c.
The reason this hasn't been a problem before is that our build techniques
did not result in PERL_IMPLICIT_SYS appearing as a #define in PL/Perl,
even on some platforms where Perl thinks it is defined. That's about to
change in order to fix a nasty portability issue, so we need this work to
make the code safe for that.
Rather unaccountably, the Perl people chose XSUB.h as the place to provide
the versions of the aTHX/aTHX_ macros that are needed by code that's not
explicitly aware of the MULTIPLICITY API conventions. Hence, just removing
XSUB.h from plperl.c fails miserably. But we can work around that by
defining PERL_NO_GET_CONTEXT (which would make the relevant stanza of
XSUB.h a no-op anyway). As explained in perlguts.pod, that means we need
to add a "dTHX" macro call in every C function that calls a Perl API
function. In most of them we just add this at the top; but since the
macro fetches the current Perl interpreter pointer, more care is needed
in functions that switch the active interpreter. Lack of the macro is
easily recognized since it results in bleats about "my_perl" not being
defined.
(A nice side benefit of this is that it significantly reduces the number
of fetches of the current interpreter pointer. On my machine, plperl.so
gets more than 10% smaller, and there's probably some performance win too.
We could reduce the number of fetches still more by decorating the code
with pTHX_/aTHX_ macros to pass the interpreter pointer around, as
explained by perlguts.pod; but that's a task for another day.)
Formatting note: pgindent seems happy to treat "dTHX;" as a declaration
so long as it's the first thing after the left brace, as we'd already
observed with respect to the similar macro "dSP;". If you try to put
it later in a set of declarations, pgindent puts ugly extra space
around it.
Having removed XSUB.h from plperl.c, we need only move the support
functions for spi_return_next and util_elog (both of which use PG_TRY)
out of the .xs files and into plperl.c. This seems sufficient to
avoid the known problems caused by PERL_IMPLICIT_SYS, although we
could move more code if additional issues emerge.
This will need to be back-patched, but first let's see what the
buildfarm makes of it.
Patch by me, with some help from Ashutosh Sharma
Discussion: https://postgr.es/m/CANFyU97OVQ3+Mzfmt3MhuUm5NwPU=-FtbNH5Eb7nZL9ua8=rcA@mail.gmail.com
If the user modifies a view that has CHECK OPTIONs and this gets
translated into a modification to an underlying relation which happens
to be a foreign table, the check options should be enforced. In the
normal code path, that was happening properly, but it was not working
properly for "direct" modification because the whole operation gets
pushed to the remote side in that case and we never have an option to
enforce the constraint against individual tuples. Fix by disabling
direct modification when there is a need to enforce CHECK OPTIONs.
Etsuro Fujita, reviewed by Kyotaro Horiguchi and by me.
Discussion: http://postgr.es/m/f8a48f54-6f02-9c8a-5250-9791603171ee@lab.ntt.co.jp
The new test cases added in commit 8bf58c0d9 turn out to have output
that can vary depending on the lc_messages setting prevailing on the
test server. Hide the remote end's error messages to ensure stable
output. This isn't a terribly desirable solution; we'd rather know
that the connection failed for the expected reason and not some other
one. But there seems little choice for the moment.
Per buildfarm.
Discussion: https://postgr.es/m/18419.1500658570@sss.pgh.pa.us
Previously, postgres_fdw would keep on using an existing connection even
if the user did ALTER SERVER or ALTER USER MAPPING commands that should
affect connection parameters. Teach it to watch for catcache invals
on these catalogs and re-establish connections when the relevant catalog
entries change. Per bug #14738 from Michal Lis.
In passing, clean up some rather crufty decisions in commit ae9bfc5d6
about where fields of ConnCacheEntry should be reset. We now reset
all the fields whenever we open a new connection.
Kyotaro Horiguchi, reviewed by Ashutosh Bapat and myself.
Back-patch to 9.3 where postgres_fdw appeared.
Discussion: https://postgr.es/m/20170710113917.7727.10247@wrigleys.postgresql.org
Add missing infrastructure for this node type, notably in ruleutils.c where
its lack could demonstrably cause EXPLAIN to fail. Add outfuncs/readfuncs
support. (outfuncs support is useful today for debugging purposes. The
readfuncs support may never be needed, since at present it would only
matter for parallel query and NextValueExpr should never appear in a
parallelizable query; but it seems like a bad idea to have a primnode type
that isn't fully supported here.) Teach planner infrastructure that
NextValueExpr is a volatile, parallel-unsafe, non-leaky expression node
with cost cpu_operator_cost. Given its limited scope of usage, there
*might* be no live bug today from the lack of that knowledge, but it's
certainly going to bite us on the rear someday. Teach pg_stat_statements
about the new node type, too.
While at it, also teach cost_qual_eval() that MinMaxExpr, SQLValueFunction,
XmlExpr, and CoerceToDomain should be charged as cpu_operator_cost.
Failing to do this for SQLValueFunction was an oversight in my commit
0bb51aa96. The others are longer-standing oversights, but no time like the
present to fix them. (In principle, CoerceToDomain could have cost much
higher than this, but it doesn't presently seem worth trying to examine the
domain's constraints here.)
Modify execExprInterp.c to execute NextValueExpr as an out-of-line
function; it seems quite unlikely to me that it's worth insisting that
it be inlined in all expression eval methods. Besides, providing the
out-of-line function doesn't stop anyone from inlining if they want to.
Adjust some places where NextValueExpr support had been inserted with the
aid of a dartboard rather than keeping it in the same order as elsewhere.
Discussion: https://postgr.es/m/23862.1499981661@sss.pgh.pa.us
Previously, gen_random_uuid() would fall back to a weak random number
generator, unlike gen_random_bytes() which would just fail. And this was
not made very clear in the docs. For consistency, also make
gen_random_uuid() fail outright, if compiled with --disable-strong-random.
Re-word the error message you get with --disable-strong-random. It is also
used by pgp functions that require random salts, and now also
gen_random_uuid().
Reported by Radek Slupik.
Discussion: https://www.postgresql.org/message-id/20170101232054.10135.50528@wrigleys.postgresql.org
Don't move parenthesized lines to the left, even if that means they
flow past the right margin.
By default, BSD indent lines up statement continuation lines that are
within parentheses so that they start just to the right of the preceding
left parenthesis. However, traditionally, if that resulted in the
continuation line extending to the right of the desired right margin,
then indent would push it left just far enough to not overrun the margin,
if it could do so without making the continuation line start to the left of
the current statement indent. That makes for a weird mix of indentations
unless one has been completely rigid about never violating the 80-column
limit.
This behavior has been pretty universally panned by Postgres developers.
Hence, disable it with indent's new -lpl switch, so that parenthesized
lines are always lined up with the preceding left paren.
This patch is much less interesting than the first round of indent
changes, but also bulkier, so I thought it best to separate the effects.
Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org
Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
Change pg_bsd_indent to follow upstream rules for placement of comments
to the right of code, and remove pgindent hack that caused comments
following #endif to not obey the general rule.
Commit e3860ffa4d wasn't actually using
the published version of pg_bsd_indent, but a hacked-up version that
tried to minimize the amount of movement of comments to the right of
code. The situation of interest is where such a comment has to be
moved to the right of its default placement at column 33 because there's
code there. BSD indent has always moved right in units of tab stops
in such cases --- but in the previous incarnation, indent was working
in 8-space tab stops, while now it knows we use 4-space tabs. So the
net result is that in about half the cases, such comments are placed
one tab stop left of before. This is better all around: it leaves
more room on the line for comment text, and it means that in such
cases the comment uniformly starts at the next 4-space tab stop after
the code, rather than sometimes one and sometimes two tabs after.
Also, ensure that comments following #endif are indented the same
as comments following other preprocessor commands such as #else.
That inconsistency turns out to have been self-inflicted damage
from a poorly-thought-through post-indent "fixup" in pgindent.
This patch is much less interesting than the first round of indent
changes, but also bulkier, so I thought it best to separate the effects.
Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org
Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
The new indent version includes numerous fixes thanks to Piotr Stefaniak.
The main changes visible in this commit are:
* Nicer formatting of function-pointer declarations.
* No longer unexpectedly removes spaces in expressions using casts,
sizeof, or offsetof.
* No longer wants to add a space in "struct structname *varname", as
well as some similar cases for const- or volatile-qualified pointers.
* Declarations using PG_USED_FOR_ASSERTS_ONLY are formatted more nicely.
* Fixes bug where comments following declarations were sometimes placed
with no space separating them from the code.
* Fixes some odd decisions for comments following case labels.
* Fixes some cases where comments following code were indented to less
than the expected column 33.
On the less good side, it now tends to put more whitespace around typedef
names that are not listed in typedefs.list. This might encourage us to
put more effort into typedef name collection; it's not really a bug in
indent itself.
There are more changes coming after this round, having to do with comment
indentation and alignment of lines appearing within parentheses. I wanted
to limit the size of the diffs to something that could be reviewed without
one's eyes completely glazing over, so it seemed better to split up the
changes as much as practical.
Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org
Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
We had three occurrences of essentially the same coding pattern
wherein we tried to retrieve a query result from a libpq connection
without blocking. In the case where PQconsumeInput failed (typically
indicating a lost connection), all three loops simply gave up and
returned, forgetting to clear any previously-collected PGresult
object. Since those are malloc'd not palloc'd, the oversight results
in a process-lifespan memory leak.
One instance, in libpqwalreceiver, is of little significance because
the walreceiver process would just quit anyway if its connection fails.
But we might as well fix it.
The other two instances, in postgres_fdw, are somewhat more worrisome
because at least in principle the scenario could be repeated, allowing
the amount of memory leaked to build up to something worth worrying
about. Moreover, in these cases the loops contain CHECK_FOR_INTERRUPTS
calls, as well as other calls that could potentially elog(ERROR),
providing another way to exit without having cleared the PGresult.
Here we need to add PG_TRY logic similar to what exists in quite a
few other places in postgres_fdw.
Coverity noted the libpqwalreceiver bug; I found the other two cases
by checking all calls of PQconsumeInput.
Back-patch to all supported versions as appropriate (9.2 lacks
postgres_fdw, so this is really quite unexciting for that branch).
Discussion: https://postgr.es/m/22620.1497486981@sss.pgh.pa.us
Commit 15ce775 changed tuple-routing constraint checking logic.
This affects the expected output for contrib/sepgsql, because
there's no longer LOG entries reporting allowance of int4eq()
execution. Per buildfarm.
Commit f039eaac71, later back-patched
with commit 1b812afb0e, allowed many of
the queries issued by postgres_fdw to fetch remote data to respond to
cancel interrupts in a timely fashion. However, it didn't do anything
about the transaction control commands, which remained
noninterruptible.
Improve the situation by changing do_sql_command() to retrieve query
results using pgfdw_get_result(), which uses the asynchronous
interface to libpq so that it can check for interrupts every time
libpq returns control. Since this might result in a situation
where we can no longer be sure that the remote transaction state
matches the local transaction state, add a facility to force all
levels of the local transaction to abort if we've lost track of
the remote state; without this, an apparently-successful commit of
the local transaction might fail to commit changes made on the
remote side. Also, add a 60-second timeout for queries issue during
transaction abort; if that expires, give up and mark the state of
the connection as unknown. Drop all such connections when we exit
the local transaction. Together, these changes mean that if we're
aborting the local toplevel transaction anyway, we can just drop the
remote connection in lieu of waiting (possibly for a very long time)
for it to complete an abort.
This still leaves quite a bit of room for improvement. PQcancel()
has no asynchronous interface, so if we get stuck sending the cancel
request we'll still hang. Also, PQsetnonblocking() is not used, which
means we could block uninterruptibly when sending a query. There
might be some other optimizations possible as well. Nonetheless,
this allows us to escape a wait for an unresponsive remote server
quickly in many more cases than previously.
Report by Suraj Kharage. Patch by me and Rafia Sabih. Review
and testing by Amit Kapila and Tushar Ahuja.
Discussion: http://postgr.es/m/CAF1DzPU8Kx+fMXEbFoP289xtm3bz3t+ZfxhmKavr98Bh-C0TqQ@mail.gmail.com
get_partition_parent felt that it could simply Assert that systable_getnext
found a tuple. This is unlike any other caller of that function, and it's
unsafe IMO --- in fact, the reason I noticed it was that the Assert failed.
(OK, I was working with known-inconsistent catalog contents, but I wasn't
expecting the DB to fall over quite that violently. The behavior in a
non-assert-enabled build wouldn't be very nice, either.) Fix it to do what
other callers do, namely an actual runtime-test-and-elog.
Also, standardize the wording of elog messages that are complaining about
unexpected failure of systable_getnext. 90% of them say "could not find
tuple for <object>", so make the remainder do likewise. Many of the
holdouts were using the phrasing "cache lookup failed", which is outright
misleading since no catcache search is involved.
Remove some gratuituous message differences by making the AM name
previously embedded in each message be a %s instead. While at it, get
rid of terminology that's unclear and unnecessary in one message.
Discussion: https://postgr.es/m/20170523001557.bq2hbq7hxyvyw62q@alvherre.pgsql
Commit 3ec76ff1f changed the partitioning logic to not install a forced
NOT NULL constraint on range partitioning columns. This affects the
expected output for contrib/sepgsql, because there's no longer LOG
entries reporting allowance of such a constraint. Per buildfarm.
This seemed like a good idea originally because there's no way to mark
a range partition as accepting NULL, but that now seems more like a
current limitation than something we want to lock down for all time.
For example, there's a proposal to add the notion of a default
partition which accepts all rows not otherwise routed, which directly
conflicts with the idea that a range-partitioned table should never
allow nulls anywhere. So let's change this while we still can, by
putting the NOT NULL test into the partition constraint instead of
changing the column properties.
Amit Langote and Robert Haas, reviewed by Amit Kapila
Discussion: http://postgr.es/m/8e2dd63d-c6fb-bb74-3c2b-ed6d63629c9d@lab.ntt.co.jp
Reformat various places in which pgindent will make a mess, and
fix a few small violations of coding style that I happened to notice
while perusing the diffs from a pgindent dry run.
There is one actual bug fix here: the need-to-enlarge-the-buffer code
path in icu_convert_case was obviously broken. Perhaps it's unreachable
in our usage? Or maybe this is just sadly undertested.
Ultra-modern versions of the perl Data::Dumper module have apparently
changed how they indent output. Instead of trying to keep up we choose
to tell it to supporess all indentation in the hstore_plperl regression
tests.
Backpatch to 9.5 where this feature was introduced.
The snapshot assembly during the creation of logical slots relied
waiting for transactions in xl_running_xacts to end, by checking for
their commit/abort records. Unfortunately, despite locking, it is
possible to see an xl_running_xact record listing transactions as
ready, that have already WAL-logged an commit/abort record, as the
locking just prevents the ProcArray to be adjusted, and the commit
record has to be logged first.
That lead to either delayed or hanging snapshot creation, because
snapbuild.c would wait "forever" to see commit/abort records for some
transactions. That hang resolved only if a xl_running_xacts record
without any running transactions happened to be logged, far from
certain on a busy server.
It's impractical to prevent that via more heavyweight locking, the
likelihood of deadlocks and significantly increased contention would
be too big.
Instead change the initial snapshot creation to be solely based on
tracking the oldest running transaction via
xl_running_xacts->oldestRunningXid - that actually ends up
significantly simplifying the code. That has two disadvantages:
1) Because we cannot fully "trust" the contents of xl_running_xacts,
we cannot use it to build the initial snapshot. Instead we have to
wait twice for all running transactions to finish.
2) Previously a slot, unless the race occurred, could be created when
the all transaction perceived as running based on commit/abort
records, now we have to wait for the next xl_running_xacts record.
To address that, trigger logging new xl_running_xacts record from
within snapbuild.c exactly when necessary.
Unfortunately snabuild.c's SnapBuild is stored on disk, one of the
stupider ideas of a certain Mr Freund, so we can't change it in a
minor release. As this is going to be backpatched, we have to hack
around a bit to keep on-disk compatibility. A later commit will
rejigger that on master.
Author: Andres Freund, based on a quite different patch from Petr Jelinek
Analyzed-By: Petr Jelinek
Reviewed-By: Petr Jelinek
Discussion: https://postgr.es/m/f37e975c-908f-858e-707f-058d3b1eb214@2ndquadrant.com
Backpatch: 9.4-, where logical decoding has been introduced
The mess cleaned up in commit da0759600 is clear evidence that it's a
bug hazard to expect the caller of get_attstatsslot()/free_attstatsslot()
to provide the correct type OID for the array elements in the slot.
Moreover, we weren't even getting any performance benefit from that,
since get_attstatsslot() was extracting the real type OID from the array
anyway. So we ought to get rid of that requirement; indeed, it would
make more sense for get_attstatsslot() to pass back the type OID it found,
in case the caller isn't sure what to expect, which is likely in binary-
compatible-operator cases.
Another problem with the current implementation is that if the stats array
element type is pass-by-reference, we incur a palloc/memcpy/pfree cycle
for each element. That seemed acceptable when the code was written because
we were targeting O(10) array sizes --- but these days, stats arrays are
almost always bigger than that, sometimes much bigger. We can save a
significant number of cycles by doing one palloc/memcpy/pfree of the whole
array. Indeed, in the now-probably-common case where the array is toasted,
that happens anyway so this method is basically free. (Note: although the
catcache code will inline any out-of-line toasted values, it doesn't
decompress them. At the other end of the size range, it doesn't expand
short-header datums either. In either case, DatumGetArrayTypeP would have
to make a copy. We do end up using an extra array copy step if the element
type is pass-by-value and the array length is neither small enough for a
short header nor large enough to have suffered compression. But that
seems like a very acceptable price for winning in pass-by-ref cases.)
Hence, redesign to take these insights into account. While at it,
convert to an API in which we fill a struct rather than passing a bunch
of pointers to individual output arguments. That will make it less
painful if we ever want further expansion of what get_attstatsslot can
pass back.
It's certainly arguable that this is new development and not something to
push post-feature-freeze. However, I view it as primarily bug-proofing
and therefore something that's better to have sooner not later. Since
we aren't quite at beta phase yet, let's put it in.
Discussion: https://postgr.es/m/16364.1494520862@sss.pgh.pa.us
Per discussion, "location" is a rather vague term that could refer to
multiple concepts. "LSN" is an unambiguous term for WAL locations and
should be preferred. Some function names, view column names, and function
output argument names used "lsn" already, but others used "location",
as well as yet other terms such as "wal_position". Since we've already
renamed a lot of things in this area from "xlog" to "wal" for v10,
we may as well incur a bit more compatibility pain and make these names
all consistent.
David Rowley, minor additional docs hacking by me
Discussion: https://postgr.es/m/CAKJS1f8O0njDKe8ePFQ-LK5-EjwThsDws6ohJ-+c6nWK+oUxtg@mail.gmail.com
Storing passwords in plaintext hasn't been a good idea for a very long
time, if ever. Now seems like a good time to finally forbid it, since we're
messing with this in PostgreSQL 10 anyway.
Remove the CREATE/ALTER USER UNENCRYPTED PASSSWORD 'foo' syntax, since
storing passwords unencrypted is no longer supported. ENCRYPTED PASSWORD
'foo' is still accepted, but ENCRYPTED is now just a noise-word, it does
the same as just PASSWORD 'foo'.
Likewise, remove the --unencrypted option from createuser, but accept
--encrypted as a no-op for backward compatibility. AFAICS, --encrypted was
a no-op even before this patch, because createuser encrypted the password
before sending it to the server even if --encrypted was not specified. It
added the ENCRYPTED keyword to the SQL command, but since the password was
already in encrypted form, it didn't make any difference. The documentation
was not clear on whether that was intended or not, but it's moot now.
Also, while password_encryption='on' is still accepted as an alias for
'md5', it is now marked as hidden, so that it is not listed as an accepted
value in error hints, for example. That's not directly related to removing
'plain', but it seems better this way.
Reviewed by Michael Paquier
Discussion: https://www.postgresql.org/message-id/16e9b768-fd78-0b12-cfc1-7b6b7f238fde@iki.fi
Objects in an extension are shippable to a foreign server if the
extension is part of the foreign server definition's shippable
extensions list. But this was not properly considered in some cases
when checking whether a join condition can be pushed to a foreign server
and the join condition uses an object from a shippable extension. So
the join would never be pushed down in those cases.
So, the list of extensions needs to be made available in fpinfo of the
relation being considered to be pushed down before any expressions are
assessed for being shippable. Fix foreign_join_ok() to do that for a
join relation.
The code to save FDW options in fpinfo is scattered at multiple places.
Bring all of that together into functions apply_server_options(),
apply_table_options(), and merge_fdw_options().
David Rowley and Ashutosh Bapat, per report from David Rowley
Standardize on testing a hash index page's type by doing
(opaque->hasho_flag & LH_PAGE_TYPE) == LH_xxx_PAGE
Various places were taking shortcuts like
opaque->hasho_flag & LH_BUCKET_PAGE
which while not actually wrong, is still bad practice because
it encourages use of
opaque->hasho_flag & LH_UNUSED_PAGE
which *is* wrong (LH_UNUSED_PAGE == 0, so the above is constant false).
hash_xlog.c's hash_mask() contained such an incorrect test.
This also ensures that we mask out the additional flag bits that
hasho_flag has accreted since 9.6. pgstattuple's pgstat_hash_page(),
for one, was failing to do that and was thus actively broken.
Also fix assorted comments that hadn't been updated to reflect the
extended usage of hasho_flag, and fix some macros that were testing
just "(hasho_flag & bit)" to use the less dangerous, project-approved
form "((hasho_flag & bit) != 0)".
Coverity found the bug in hash_mask(); I noted the one in
pgstat_hash_page() through code reading.
Commit 9e43e8714 turns out to have been insufficient: not only is it
necessary to track tentative parent links while considering a set of
arc removals, but it's necessary to track tentative flag additions
as well. This is because we always merge arc target states into
arc source states; therefore, when considering a merge of the final
state with some other, it is the other state that will acquire a new
TSTATE_FIN bit. If there's another arc for the same color trigram
that would cause merging of that state with the initial state, we
failed to recognize the problem. The test cases for the prior commit
evidently only exercised situations where a tentative merge with the
initial state occurs before one with the final state. If it goes the
other way around, we'll happily merge the initial and final states,
either producing a broken final graph that would never match anything,
or triggering the Assert added by the prior commit.
It's tempting to consider switching the merge direction when the merge
involves the final state, but I lack the time to analyze that idea in
detail. Instead just keep track of the flag changes that would result
from proposed merges, in the same way that the prior commit tracked
proposed parent links.
Along the way, add some more debugging support, because I'm not entirely
confident that this is the last bug here. And tweak matters so that
the transformed.dot file uses small integers rather than pointer values
to identify states; that makes it more readable if you're just eyeballing
it rather than fooling with Graphviz. And rename a couple of identically
named struct fields to reduce confusion.
Per report from Corey Csuhta. Add a test case based on his example.
(Note: this case does not trigger the bug under 9.3, apparently because
its different measurement of costs causes it to stop merging states before
it hits the failure. I spent some time trying to find a variant that would
fail in 9.3, without success; but I'm sure such cases exist.)
Like the previous patch, back-patch to 9.3 where this code was added.
Report: https://postgr.es/m/E2B01A4B-4530-406B-8D17-2F67CF9A16BA@csuhta.com
regexport.c thought it could just ignore LACON arcs, but the correct
behavior is to treat them as satisfiable while consuming zero input
(rather reminiscently of commit 9f1e642d5). Otherwise, the emitted
simplified-NFA representation may contain no paths leading from initial
to final state, which unsurprisingly confuses pg_trgm, as seen in
bug #14623 from Jeff Janes.
Since regexport's output representation has no concept of an arc that
consumes zero input, recurse internally to find the next normal arc(s)
after any LACON transitions. We'd be forced into changing that
representation if a LACON could be the last arc reaching the final
state, but fortunately the regex library never builds NFAs with such
a configuration, so there always is a next normal arc.
Back-patch to 9.3 where this logic was introduced.
Discussion: https://postgr.es/m/20170413180503.25948.94871@wrigleys.postgresql.org
Hash indexes can contain both pages which are all-zeroes (i.e.
PageIsNew()) and pages which have been initialized but currently
aren't used. The latter category can happen either when a page
has been reserved but not yet used or when it is used for a time
and then freed. pgstattuple was only prepared to deal with the
pages that are actually-zeroes, which it called zero_pages.
Rename the column to unused_pages (extension version 1.5 is
as-yet-unreleased) and make it count both kinds of unused pages.
Along the way, slightly tidy up the way we test for pages of
various types.
Robert Haas and Ashutosh Sharma, reviewed by Amit Kapila
Discussion: http://postgr.es/m/CAE9k0PkTtKFB3YndOyQMjwuHx+-FtUP1ynK8E-nHtetoow3NtQ@mail.gmail.com
Commit 0bf3ae88a encountered a need to pass the finally chosen remote qual
conditions forward from postgresGetForeignPlan to postgresPlanDirectModify.
It solved that by sticking them into the plan node's fdw_private list,
which in hindsight was a pretty bad idea. In the first place, there's no
use for those qual trees either in EXPLAIN or execution; indeed they could
never safely be used for any post-planning purposes, because they would not
get processed by setrefs.c. So they're just dead weight to carry around in
the finished plan tree, plus being an attractive nuisance for somebody who
might get the idea that they could be used that way. Secondly, because
those qual trees (sometimes) contained RestrictInfos, they created a
plan-transmission hazard for parallel query, which is how come we noticed a
problem. We dealt with that symptom in commit 28b047875, but really a more
straightforward and more efficient fix is to pass the data through in a new
field of struct PgFdwRelationInfo. So do it that way. (There's no need
to revert 28b047875, as it has sufficient reason to live anyway.)
Per fuzz testing by Andreas Seltenreich.
Discussion: https://postgr.es/m/87tw5x4vcu.fsf@credativ.de
Clauses in the lists retained by postgres_fdw during planning were
sometimes bare boolean clauses, sometimes RestrictInfos, and sometimes
a mixture of the two in the same list. The comment about that situation
didn't come close to telling the full truth, either. Aside from being
confusing, this had a couple of bad practical consequences:
* waste of planning cycles due to inability to cache per-clause selectivity
and cost estimates;
* sometimes, RestrictInfos would sneak into the fdw_private list of a
finished Plan node, causing failures if, for example, we tried to ship
the Plan tree to a parallel worker.
(It may well be that it's a bug in the parallel-query logic that we
would ever try to ship such a plan to a parallel worker, but in any
case this deserves to be cleaned up.)
To fix, rearrange so that clause lists in PgFdwRelationInfo are always
lists of RestrictInfos, and then strip the RestrictInfos at the last
minute when making a Plan node. In passing do a bit of refactoring and
comment cleanup in postgresGetForeignPlan and foreign_join_ok.
Although the messiness here dates back at least to 9.6, there's no evidence
that it causes anything worse than wasted planning cycles in 9.6, so no
back-patch for now.
Per fuzz testing by Andreas Seltenreich.
Tom Lane and Ashutosh Bapat
Discussion: https://postgr.es/m/87tw5x4vcu.fsf@credativ.de
This extends the castNode() notation introduced by commit 5bcab1114 to
provide, in one step, extraction of a list cell's pointer and coercion to
a concrete node type. For example, "lfirst_node(Foo, lc)" is the same
as "castNode(Foo, lfirst(lc))". Almost half of the uses of castNode
that have appeared so far include a list extraction call, so this is
pretty widely useful, and it saves a few more keystrokes compared to the
old way.
As with the previous patch, back-patch the addition of these macros to
pg_list.h, so that the notation will be available when back-patching.
Patch by me, after an idea of Andrew Gierth's.
Discussion: https://postgr.es/m/14197.1491841216@sss.pgh.pa.us
The new partitioned table capability added a new relkind, namely
RELKIND_PARTITIONED_TABLE. Update sepgsql to treat this new relkind
exactly the same way it does RELKIND_RELATION.
In addition, add regression test coverage for partitioned tables.
Issue raised by Stephen Frost and initial patch by Mike Palmiotto.
Review by Tom Lane and Robert Haas, and editorializing by me.
Discussion: https://postgr.es/m/flat/623bcaae-112e-ced0-8c22-a84f75ae0c53%40joeconway.com
If there can certainly be no more than one matching inner row for a given
outer row, then the executor can move on to the next outer row as soon as
it's found one match; there's no need to continue scanning the inner
relation for this outer row. This saves useless scanning in nestloop
and hash joins. In merge joins, it offers the opportunity to skip
mark/restore processing, because we know we have not advanced past the
first possible match for the next outer row.
Of course, the devil is in the details: the proof of uniqueness must
depend only on joinquals (not otherquals), and if we want to skip
mergejoin mark/restore then it must depend only on merge clauses.
To avoid adding more planning overhead than absolutely necessary,
the present patch errs in the conservative direction: there are cases
where inner_unique or skip_mark_restore processing could be used, but
it will not do so because it's not sure that the uniqueness proof
depended only on "safe" clauses. This could be improved later.
David Rowley, reviewed and rather heavily editorialized on by me
Discussion: https://postgr.es/m/CAApHDvqF6Sw-TK98bW48TdtFJ+3a7D2mFyZ7++=D-RyPsL76gw@mail.gmail.com
Instead of allocating memory in brin_deform_tuple and brin_copy_tuple
over and over during a scan, allow reuse of previously allocated memory.
This is said to make for a measurable performance improvement.
Author: Jinyu Zhang, Álvaro Herrera
Reviewed by: Tomas Vondra
Discussion: https://postgr.es/m/495deb78.4186.1500dacaa63.Coremail.beijing_pg@163.com
At -Og optimization gcc warns that variable tclass may be used
uninitialized when relkind == RELKIND_INDEX. Actually that can't
happen due to an early return, but quiet the compiler by initializing
tclass to 0.
In passing, use uint16_t consistently for the declaration of tclass.
Complaint and initial patch by Mike Palmiotto. Editorializing by me.
Probably not worth backpatching given that it is cosmetic, so apply
to development head only.
Discussion: https://postgr.es/m/flat/623bcaae-112e-ced0-8c22-a84f75ae0c53%40joeconway.com
<selinux/label.h> includes <stdbool.h>, which creates an incompatible
We don't care if <stdbool.h> redefines "true"/"false"; those are close
enough.
Complaint and initial patch by Mike Palmiotto. Final approach per
Tom Lane's suggestion, as discussed on hackers. Backpatching to
all supported branches.
Discussion: https://postgr.es/m/flat/623bcaae-112e-ced0-8c22-a84f75ae0c53%40joeconway.com
Follow on patch in the multi-variate statistics patch series.
CREATE STATISTICS s1 WITH (dependencies) ON (a, b) FROM t;
ANALYZE;
will collect dependency stats on (a, b) and then use the measured
dependency in subsequent query planning.
Commit 7b504eb282 added
CREATE STATISTICS with n-distinct coefficients. These are now
specified using the mutually exclusive option WITH (ndistinct).
Author: Tomas Vondra, David Rowley
Reviewed-by: Kyotaro HORIGUCHI, Álvaro Herrera, Dean Rasheed, Robert Haas
and many other comments and contributions
Discussion: https://postgr.es/m/56f40b20-c464-fad2-ff39-06b668fac47c@2ndquadrant.com
Make every page in a hash index which isn't all-zeroes have a valid
special space, so that tools like pageinspect don't error out.
Also, make pageinspect cope with all-zeroes pages, because
_hash_alloc_buckets can leave behind large numbers of those until
they're consumed by splits.
Ashutosh Sharma and Robert Haas, reviewed by Amit Kapila.
Original trouble report from Jeff Janes.
Discussion: http://postgr.es/m/CAMkU=1y6NjKmqbJ8wLMhr=F74WzcMALYWcVFhEpm7i=mV=XsOg@mail.gmail.com
Since hash indexes typically have very few overflow pages, adding a
new splitpoint essentially doubles the on-disk size of the index,
which can lead to large and abrupt increases in disk usage (and
perhaps long delays on occasion). To mitigate this problem to some
degree, divide larger splitpoints into four equal phases. This means
that, for example, instead of growing from 4GB to 8GB all at once, a
hash index will now grow from 4GB to 5GB to 6GB to 7GB to 8GB, which
is perhaps still not as smooth as we'd like but certainly an
improvement.
This changes the on-disk format of the metapage, so bump HASH_VERSION
from 2 to 3. This will force a REINDEX of all existing hash indexes,
but that's probably a good idea anyway. First, hash indexes from
pre-10 versions of PostgreSQL could easily be corrupted, and we don't
want to confuse corruption carried over from an older release with any
corruption caused despite the new write-ahead logging in v10. Second,
it will let us remove some backward-compatibility code added by commit
293e24e507.
Mithun Cy, reviewed by Amit Kapila, Jesper Pedersen and me. Regression
test outputs updated by me.
Discussion: http://postgr.es/m/CAD__OuhG6F1gQLCgMQNnMNgoCvOLQZz9zKYJQNYvYmmJoM42gA@mail.gmail.com
Discussion: http://postgr.es/m/CA+TgmoYty0jCf-pa+m+vYUJ716+AxM7nv_syvyanyf5O-L_i2A@mail.gmail.com
Currently, the only type of child relation is an "other member rel",
which is the child of a baserel, but in the future joins and even
upper relations may have child rels. To facilitate that, introduce
macros that test to test for particular RelOptKind values, and use
them in various places where they help to clarify the sense of a test.
(For example, a test may allow RELOPT_OTHER_MEMBER_REL either because
it intends to allow child rels, or because it intends to allow simple
rels.)
Also, remove find_childrel_top_parent, which will not work for a
child rel that is not a baserel. Instead, add a new RelOptInfo
member top_parent_relids to track the same kind of information in a
more generic manner.
Ashutosh Bapat, slightly tweaked by me. Review and testing of the
patch set from which this was taken by Rajkumar Raghuwanshi and Rafia
Sabih.
Discussion: http://postgr.es/m/CA+TgmoagTnF2yqR3PT2rv=om=wJiZ4-A+ATwdnriTGku1CLYxA@mail.gmail.com
A QueryEnvironment concept is added, which allows new types of
objects to be passed into queries from parsing on through
execution. At this point, the only thing implemented is a
collection of EphemeralNamedRelation objects -- relations which
can be referenced by name in queries, but do not exist in the
catalogs. The only type of ENR implemented is NamedTuplestore, but
provision is made to add more types fairly easily.
An ENR can carry its own TupleDesc or reference a relation in the
catalogs by relid.
Although these features can be used without SPI, convenience
functions are added to SPI so that ENRs can easily be used by code
run through SPI.
The initial use of all this is going to be transition tables in
AFTER triggers, but that will be added to each PL as a separate
commit.
An incidental effect of this patch is to produce a more informative
error message if an attempt is made to modify the contents of a CTE
from a referencing DML statement. No tests previously covered that
possibility, so one is added.
Kevin Grittner and Thomas Munro
Reviewed by Heikki Linnakangas, David Fetter, and Thomas Munro
with valuable comments and suggestions from many others
Don't import partitions. Do import partitioned tables which are
not themselves partitions.
Report by Stephen Frost. Design and patch by Michael Paquier,
reviewed by Amit Langote. Documentation revised by me.
Discussion: http://postgr.es/m/20170309141531.GD9812@tamriel.snowman.net
Three nologin roles with non-overlapping privs are created by default
* pg_read_all_settings - read all GUCs.
* pg_read_all_stats - pg_stat_*, pg_database_size(), pg_tablespace_size()
* pg_stat_scan_tables - may lock/scan tables
Top level role - pg_monitor includes all of the above by default, plus others
Author: Dave Page
Reviewed-by: Stephen Frost, Robert Haas, Peter Eisentraut, Simon Riggs
The V0 convention is failure prone because we've so far assumed that a
function is V0 if PG_FUNCTION_INFO_V1 is missing, leading to crashes
if a function was coded against the V1 interface. V0 doesn't allow
proper NULL, SRF and toast handling. V0 doesn't offer features that
V1 doesn't.
Thus remove V0 support and obsolete fmgr README contents relating to
it.
Author: Andres Freund, with contributions by Peter Eisentraut & Craig Ringer
Reviewed-By: Peter Eisentraut, Craig Ringer
Discussion: https://postgr.es/m/20161208213441.k3mbno4twhg2qf7g@alap3.anarazel.de
There are no functional changes here; this simply encapsulates knowledge
of the ItemPointerData struct so that a future patch can change things
without more breakage.
All direct users of ip_blkid and ip_posid are changed to use existing
macros ItemPointerGetBlockNumber and ItemPointerGetOffsetNumber
respectively. For callers where that's inappropriate (because they
Assert that the itempointer is is valid-looking), add
ItemPointerGetBlockNumberNoCheck and ItemPointerGetOffsetNumberNoCheck,
which lack the assertion but are otherwise identical.
Author: Pavan Deolasee
Discussion: https://postgr.es/m/CABOikdNnFon4cJiL=h1mZH3bgUeU+sWHuU4Yr8AB=j3A2p1GiA@mail.gmail.com
The conname variable was not initialized in some code paths, resulting
in error reports referring to the "unnamed" connection rather than the
correct connection name.
Author: Rushabh Lathia <rushabh.lathia@gmail.com>
The trouble with the original choice here is that "?" is a valid (and
indeed used) operator name, so that you could end up with ambiguous
statement texts like "SELECT ? ? ?". With this patch, you instead
see "SELECT $1 ? $2", which seems significantly more readable.
The numbers used for this purpose begin after the last actual $N parameter
in the particular query. The conflict with external parameters has its own
potential for confusion of course, but it was agreed to be an improvement
over the previous behavior.
Lukas Fittl
Discussion: https://postgr.es/m/CAP53PkxeaCuwYmF-A4J5z2-qk5fYFo5_NH3gpXGJJBxv1DMwEw@mail.gmail.com
Fix all perlcritic warnings of severity level 5, except in
src/backend/utils/Gen_dummy_probes.pl, which is automatically generated.
Reviewed-by: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
This extends the Aggregate node with two new features: HashAggregate
can now run multiple hashtables concurrently, and a new strategy
MixedAggregate populates hashtables while doing sorted grouping.
The planner will now attempt to save as many sorts as possible when
planning grouping sets queries, while not exceeding work_mem for the
estimated combined sizes of all hashtables used. No SQL-level changes
are required. There should be no user-visible impact other than the
new EXPLAIN output and possible changes to result ordering when ORDER
BY was not used (which affected a few regression tests). The
enable_hashagg option is respected.
Author: Andrew Gierth
Reviewers: Mark Dilger, Andres Freund
Discussion: https://postgr.es/m/87vatszyhj.fsf@news-spur.riddles.org.uk
This replaces the old, recursive tree-walk based evaluation, with
non-recursive, opcode dispatch based, expression evaluation.
Projection is now implemented as part of expression evaluation.
This both leads to significant performance improvements, and makes
future just-in-time compilation of expressions easier.
The speed gains primarily come from:
- non-recursive implementation reduces stack usage / overhead
- simple sub-expressions are implemented with a single jump, without
function calls
- sharing some state between different sub-expressions
- reduced amount of indirect/hard to predict memory accesses by laying
out operation metadata sequentially; including the avoidance of
nearly all of the previously used linked lists
- more code has been moved to expression initialization, avoiding
constant re-checks at evaluation time
Future just-in-time compilation (JIT) has become easier, as
demonstrated by released patches intended to be merged in a later
release, for primarily two reasons: Firstly, due to a stricter split
between expression initialization and evaluation, less code has to be
handled by the JIT. Secondly, due to the non-recursive nature of the
generated "instructions", less performance-critical code-paths can
easily be shared between interpreted and compiled evaluation.
The new framework allows for significant future optimizations. E.g.:
- basic infrastructure for to later reduce the per executor-startup
overhead of expression evaluation, by caching state in prepared
statements. That'd be helpful in OLTPish scenarios where
initialization overhead is measurable.
- optimizing the generated "code". A number of proposals for potential
work has already been made.
- optimizing the interpreter. Similarly a number of proposals have
been made here too.
The move of logic into the expression initialization step leads to some
backward-incompatible changes:
- Function permission checks are now done during expression
initialization, whereas previously they were done during
execution. In edge cases this can lead to errors being raised that
previously wouldn't have been, e.g. a NULL array being coerced to a
different array type previously didn't perform checks.
- The set of domain constraints to be checked, is now evaluated once
during expression initialization, previously it was re-built
every time a domain check was evaluated. For normal queries this
doesn't change much, but e.g. for plpgsql functions, which caches
ExprStates, the old set could stick around longer. The behavior
around might still change.
Author: Andres Freund, with significant changes by Tom Lane,
changes by Heikki Linnakangas
Reviewed-By: Tom Lane, Heikki Linnakangas
Discussion: https://postgr.es/m/20161206034955.bh33paeralxbtluv@alap3.anarazel.de
Previously, it was unsafe to execute a plan in parallel if
ExecutorRun() might be called with a non-zero row count. However,
it's quite easy to fix things up so that we can support that case,
provided that it is known that we will never call ExecutorRun() a
second time for the same QueryDesc. Add infrastructure to signal
this, and cross-checks to make sure that a caller who claims this is
true doesn't later reneg.
While that pattern never happens with queries received directly from a
client -- there's no way to know whether multiple Execute messages
will be sent unless the first one requests all the rows -- it's pretty
common for queries originating from procedural languages, which often
limit the result to a single tuple or to a user-specified number of
tuples.
This commit doesn't actually enable parallelism in any additional
cases, because currently none of the places that would be able to
benefit from this infrastructure pass CURSOR_OPT_PARALLEL_OK in the
first place, but it makes it much more palatable to pass
CURSOR_OPT_PARALLEL_OK in places where we currently don't, because it
eliminates some cases where we'd end up having to run the parallel
plan serially.
Patch by me, based on some ideas from Rafia Sabih and corrected by
Rafia Sabih based on feedback from Dilip Kumar and myself.
Discussion: http://postgr.es/m/CA+TgmobXEhvHbJtWDuPZM9bVSLiTj-kShxQJ2uM5GPDze9fRYA@mail.gmail.com
Add functionality for a new subscription to copy the initial data in the
tables and then sync with the ongoing apply process.
For the copying, add a new internal COPY option to have the COPY source
data provided by a callback function. The initial data copy works on
the subscriber by receiving COPY data from the publisher and then
providing it locally into a COPY that writes to the destination table.
A WAL receiver can now execute full SQL commands. This is used here to
obtain information about tables and publications.
Several new options were added to CREATE and ALTER SUBSCRIPTION to
control whether and when initial table syncing happens.
Change pg_dump option --no-create-subscription-slots to
--no-subscription-connect and use the new CREATE SUBSCRIPTION
... NOCONNECT option for that.
Author: Petr Jelinek <petr.jelinek@2ndquadrant.com>
Tested-by: Erik Rijkers <er@xs4all.nl>
This will allow enums to be used in exclusion constraints.
The code uses the new CallerFInfoFunctionCall infrastructure in fmgr,
and the support for it added to btree_gist in commit 393bb504d7.
Reviewed by Tom Lane and Anastasia Lubennikova
Discussion: http://postgr.es/m/56EA8A71.8060107@dunslane.net
None of the existing types actually need to use this mechanism, but this
will allow support for enum types which will need it. A separate patch
will adjust the varlena types support for consistency.
Reviewed by Tom Lane and Anastasia Lubennikova
Discussion: http://postgr.es/m/27220.1478360811@sss.pgh.pa.us
The previous deparsing logic wasn't smart enough to produce subqueries
when deparsing; make it smart enough to do that. However, we only do
it that way when necessary, because it generates more complicated SQL
which will be harder for any humans reading the queries to understand.
Etsuro Fujita, reviewed by Ashutosh Bapat
Discussion: http://postgr.es/m/c449261a-b033-dc02-9254-2fe5b7044795@lab.ntt.co.jp
This adds in support for EUI-64 MAC addresses by adding a new data type
called 'macaddr8' (using our usual convention of indicating the number
of bytes stored).
This was largely a copy-and-paste from the macaddr data type, with
appropriate adjustments for having 8 bytes instead of 6 and adding
support for converting a provided EUI-48 (6 byte format) to the EUI-64
format. Conversion from EUI-48 to EUI-64 inserts FFFE as the 4th and
5th bytes but does not perform the IPv6 modified EUI-64 action of
flipping the 7th bit, but we add a function to perform that specific
action for the user as it may be commonly done by users who wish to
calculate their IPv6 address based on their network prefix and 48-bit
MAC address.
Author: Haribabu Kommi, with a good bit of rework of macaddr8_in by me.
Reviewed by: Vitaly Burovoy, Kuntal Ghosh
Discussion: https://postgr.es/m/CAJrrPGcUi8ZH+KkK+=TctNQ+EfkeCEHtMU_yo1mvX8hsk_ghNQ@mail.gmail.com
Previously if a directory had both isolationtester and plain
regression tests, they couldn't be run in parallel, because they'd
access the same files/directories. That, so far, only affected
contrib/test_decoding.
Rather than fix that locally in contrib/test_decoding, improve
pg_regress_isolation_[install]check to use separate resources from
plain regression tests.
That requires a minor change in pg_regress, namely that the
--outputdir is created if not already existing, that seems like good
idea anyway.
Use the improved helpers even where previously not used.
Author: Tom Lane and Andres Freund
Discussion: https://postgr.es/m/20170311194831.vm5ikpczq52c2drg@alap3.anarazel.de
The previous coding of the test was vulnerable against autovacuum
triggering work on one of the tables in check_btree.sql.
For the purpose of the test it's entirely sufficient to check for
locks taken by the current process, so add an appropriate restriction.
While touching the test, expand it to also check for locks on the
underlying relations, rather than just the indexes.
Reported-By: Tom Lane
Discussion: https://postgr.es/m/30354.1489434301@sss.pgh.pa.us
The warning about hash indexes not being write-ahead logged and their
use being discouraged has been removed. "snapshot too old" is now
supported for tables with hash indexes. Most importantly, barring
bugs, hash indexes will now be crash-safe and usable on standbys.
This commit doesn't yet add WAL consistency checking for hash
indexes, as we now have for other index types; a separate patch has
been submitted to cure that lack.
Amit Kapila, reviewed and slightly modified by me. The larger patch
series of which this is a part has been reviewed and tested by Álvaro
Herrera, Ashutosh Sharma, Mark Kirkwood, Jeff Janes, and Jesper
Pedersen.
Discussion: http://postgr.es/m/CAA4eK1JOBX=YU33631Qh-XivYXtPSALh514+jR8XeD7v+K3r_Q@mail.gmail.com
This makes almost all core code follow the policy introduced in the
previous commit. Specific decisions:
- Text search support functions with char* and length arguments, such as
prsstart and lexize, may receive unaligned strings. I doubt
maintainers of non-core text search code will notice.
- Use plain VARDATA() on values detoasted or synthesized earlier in the
same function. Use VARDATA_ANY() on varlenas sourced outside the
function, even if they happen to always have four-byte headers. As an
exception, retain the universal practice of using VARDATA() on return
values of SendFunctionCall().
- Retain PG_GETARG_BYTEA_P() in pageinspect. (Page images are too large
for a one-byte header, so this misses no optimization.) Sites that do
not call get_page_from_raw() typically need the four-byte alignment.
- For now, do not change btree_gist. Its use of four-byte headers in
memory is partly entangled with storage of 4-byte headers inside
GBT_VARKEY, on disk.
- For now, do not change gtrgm_consistent() or gtrgm_distance(). They
incorporate the varlena header into a cache, and there are multiple
credible implementation strategies to consider.
Detect fclose() failures; given "ln -s /dev/full $PGDATA/devfull",
"pg_file_write('devfull', 'x', true)" now fails as it should. Don't
leak a stream when fwrite() fails. Remove a born-ineffective test that
aimed to skip zero-length writes. Back-patch to 9.2 (all supported
versions).
In functions that issue a deconstruct_array() call, consistently use
plain VARSIZE()/VARDATA() on the array elements. Prior practice was
divided between those and VARSIZE_ANY_EXHDR()/VARDATA_ANY().
Although it's reasonable to expect that most of these constants will
never change, that does not make it good programming style to hard-code
the value rather than using the RELKIND_FOO macros.
I think I've now gotten all the hard-coded references in C code.
Unfortunately there's no equally convenient way to parameterize
SQL files ...
Discussion: https://postgr.es/m/11145.1488931324@sss.pgh.pa.us
This is the beginning of a collection of SQL-callable functions to
verify the integrity of data files. For now it only contains code to
verify B-Tree indexes.
This adds two SQL-callable functions, validating B-Tree consistency to
a varying degree. Check the, extensive, docs for details.
The goal is to later extend the coverage of the module to further
access methods, possibly including the heap. Once checks for
additional access methods exist, we'll likely add some "dispatch"
functions that cover multiple access methods.
Author: Peter Geoghegan, editorialized by Andres Freund
Reviewed-By: Andres Freund, Tomas Vondra, Thomas Munro,
Anastasia Lubennikova, Robert Haas, Amit Langote
Discussion: CAM3SWZQzLMhMwmBqjzK+pRKXrNUZ4w90wYMUWfkeV8mZ3Debvw@mail.gmail.com
The contrib extensions pageinspect, pg_visibility and pgstattuple only
work against regular relations which have storage. They don't work
against foreign tables, partitioned (parent) tables, views, et al.
Add checks to the user-callable functions to return a useful error
message to the user if they mistakenly pass an invalid relation to a
function which doesn't accept that kind of relation.
In passing, improve some of the existing checks to use ereport() instead
of elog(), add a function to consolidate common checks where
appropriate, and add some regression tests.
Author: Amit Langote, with various changes by me
Reviewed by: Michael Paquier and Corey Huinker
Discussion: https://postgr.es/m/ab91fd9d-4751-ee77-c87b-4dd704c1e59c@lab.ntt.co.jp
XMLTABLE is defined by the SQL/XML standard as a feature that allows
turning XML-formatted data into relational form, so that it can be used
as a <table primary> in the FROM clause of a query.
This new construct provides significant simplicity and performance
benefit for XML data processing; what in a client-side custom
implementation was reported to take 20 minutes can be executed in 400ms
using XMLTABLE. (The same functionality was said to take 10 seconds
using nested PostgreSQL XPath function calls, and 5 seconds using
XMLReader under PL/Python).
The implemented syntax deviates slightly from what the standard
requires. First, the standard indicates that the PASSING clause is
optional and that multiple XML input documents may be given to it; we
make it mandatory and accept a single document only. Second, we don't
currently support a default namespace to be specified.
This implementation relies on a new executor node based on a hardcoded
method table. (Because the grammar is fixed, there is no extensibility
in the current approach; further constructs can be implemented on top of
this such as JSON_TABLE, but they require changes to core code.)
Author: Pavel Stehule, Álvaro Herrera
Extensively reviewed by: Craig Ringer
Discussion: https://postgr.es/m/CAFj8pRAgfzMD-LoSmnMGybD0WsEznLHWap8DO79+-GTRAPR4qA@mail.gmail.com
This introduces a new generic SASL authentication method, similar to the
GSS and SSPI methods. The server first tells the client which SASL
authentication mechanism to use, and then the mechanism-specific SASL
messages are exchanged in AuthenticationSASLcontinue and PasswordMessage
messages. Only SCRAM-SHA-256 is supported at the moment, but this allows
adding more SASL mechanisms in the future, without changing the overall
protocol.
Support for channel binding, aka SCRAM-SHA-256-PLUS is left for later.
The SASLPrep algorithm, for pre-processing the password, is not yet
implemented. That could cause trouble, if you use a password with
non-ASCII characters, and a client library that does implement SASLprep.
That will hopefully be added later.
Authorization identities, as specified in the SCRAM-SHA-256 specification,
are ignored. SET SESSION AUTHORIZATION provides more or less the same
functionality, anyway.
If a user doesn't exist, perform a "mock" authentication, by constructing
an authentic-looking challenge on the fly. The challenge is derived from
a new system-wide random value, "mock authentication nonce", which is
created at initdb, and stored in the control file. We go through these
motions, in order to not give away the information on whether the user
exists, to unauthenticated users.
Bumps PG_CONTROL_VERSION, because of the new field in control file.
Patch by Michael Paquier and Heikki Linnakangas, reviewed at different
stages by Robert Haas, Stephen Frost, David Steele, Aleksander Alekseev,
and many others.
Discussion: https://www.postgresql.org/message-id/CAB7nPqRbR3GmFYdedCAhzukfKrgBLTLtMvENOmPrVWREsZkF8g%40mail.gmail.com
Discussion: https://www.postgresql.org/message-id/CAB7nPqSMXU35g%3DW9X74HVeQp0uvgJxvYOuA4A-A3M%2B0wfEBv-w%40mail.gmail.com
Discussion: https://www.postgresql.org/message-id/55192AFE.6080106@iki.fi
This way both frontend and backends can use them. The functions are taken
from pgcrypto, which now fetches the source files it needs from
src/common/.
A new interface is designed for the SHA2 functions, which allow linking
to either OpenSSL or the in-core stuff taken from KAME as needed.
Michael Paquier, reviewed by Robert Haas.
Discussion: https://www.postgresql.org/message-id/CAB7nPqTGKuTM5jiZriHrNaQeVqp5e_iT3X4BFLWY_HyHxLvySQ%40mail.gmail.com
PQerrorMessage() returns an error message with a trailing newline, but
in backend use (dblink, postgres_fdw, libpqwalreceiver), we want to have
the error message without that for emitting via ereport(). To simplify
that, add a function pchomp() that returns a pstrdup'ed string with the
trailing newline characters removed.
c.h #includes a number of core libc header files, such as <stdio.h>.
There's no point in re-including these after having read postgres.h,
postgres_fe.h, or c.h; so remove code that did so.
While at it, also fix some places that were ignoring our standard pattern
of "include postgres[_fe].h, then system header files, then other Postgres
header files". While there's not any great magic in doing it that way
rather than system headers last, it's silly to have just a few files
deviating from the general pattern. (But I didn't attempt to enforce this
globally, only in files I was touching anyway.)
I'd be the first to say that this is mostly compulsive neatnik-ism,
but over time it might save enough compile cycles to be useful.
Twiddle the replication-related code so that its timestamp variables
are declared TimestampTz, rather than the uninformative "int64" that
was previously used for meant-to-be-always-integer timestamps.
This resolves the int64-vs-TimestampTz declaration inconsistencies
introduced by commit 7c030783a, though in the opposite direction to
what was originally suggested.
This required including datatype/timestamp.h in a couple more places
than before. I decided it would be a good idea to slim down that
header by not having it pull in <float.h> etc, as those headers are
no longer at all relevant to its purpose. Unsurprisingly, a small number
of .c files turn out to have been depending on those inclusions, so add
them back in the .c files as needed.
Discussion: https://postgr.es/m/26788.1487455319@sss.pgh.pa.us
Discussion: https://postgr.es/m/27694.1487456324@sss.pgh.pa.us
The logic for removing excess trigrams from the result was faulty.
It intends to avoid merging the initial and final states of the NFA,
which is necessary, but in testing whether removal of a specific trigram
would cause that, it failed to consider the combined effects of all the
state merges that that trigram's removal would cause. This could result
in a broken final graph that would never match anything, leading to GIN
or GiST indexscans not finding anything.
To fix, add a "tentParent" field that is used only within this loop,
and set it to show state merges that we are tentatively going to do.
While examining a particular arc, we must chase up through tentParent
links as well as regular parent links (the former can only appear atop
the latter), and we must account for state init/fin flag merges that
haven't actually been done yet.
To simplify the latter, combine the separate init and fin bool fields
into a bitmap flags field. I also chose to get rid of the "children"
state list, which seems entirely inessential.
Per bug #14563 from Alexey Isayko, which the added test cases are based on.
Back-patch to 9.3 where this code was added.
Report: https://postgr.es/m/20170222111446.1256.67547@wrigleys.postgresql.org
Discussion: https://postgr.es/m/8816.1487787594@sss.pgh.pa.us
There is no specific reason for this right now, but keeping support for
old Python versions around indefinitely increases the maintenance
burden. The oldest supported Python version is now Python 2.4, which is
still shipped in RHEL/CentOS 5 by default.
In configure, add a check for the required Python version and give a
friendly error message for an old version, instead of relying on an
obscure build error later on.
In combination with 569174f1be, which
taught the btree AM how to perform parallel index scans, this allows
parallel index scan plans on btree indexes. This infrastructure
should be general enough to support parallel index scans for other
index AMs as well, if someone updates them to support parallel
scans.
Amit Kapila, reviewed and tested by Anastasia Lubennikova, Tushar
Ahuja, and Haribabu Kommi, and me.
This module was intended to ease migrations of applications that used
the pre-8.3 version of text search to the in-core version introduced
in that release. However, since all pre-8.3 releases of the database
have been out of support for more than 5 years at this point, we
expect that few people are depending on it at this point. If some
people still need it, nothing prevents it from being maintained as a
separate extension, outside of core.
Discussion: http://postgr.es/m/CA+Tgmob5R8aDHiFRTQsSJbT1oreKg2FOSBrC=2f4tqEH3dOMAg@mail.gmail.com
Commit f82ec32ac3 renamed the pg_xlog
directory to pg_wal. To make things consistent, and because "xlog" is
terrible terminology for either "transaction log" or "write-ahead log"
rename all SQL-callable functions that contain "xlog" in the name to
instead contain "wal". (Note that this may pose an upgrade hazard for
some users.)
Similarly, rename the xlog_position argument of the functions that
create slots to be called wal_position.
Discussion: https://www.postgresql.org/message-id/CA+Tgmob=YmA=H3DbW1YuOXnFVgBheRmyDkWcD9M8f=5bGWYEoQ@mail.gmail.com
It did that to verify that the page was an overflow page rather than
anything else, but that means that checking the status of all the
overflow bits requires reading the entire index. So don't do that.
The new code validates that the page is not a primary bucket page
or bitmap page by looking at the metapage, so that using this on
large numbers of pages can be reasonably efficient.
Ashutosh Sharma, per a complaint from me, and with further
modifications by me.
It's always been possible for index AMs to cache data across successive
amgettuple calls within a single SQL command: the IndexScanDesc.opaque
field is meant for precisely that. However, no comparable facility
exists for amortizing setup work across successive aminsert calls.
This patch adds such a feature and teaches GIN, GIST, and BRIN to use it
to amortize catalog lookups they'd previously been doing on every call.
(The other standard index AMs keep everything they need in the relcache,
so there's little to improve there.)
For GIN, the overall improvement in a statement that inserts many rows
can be as much as 10%, though it seems a bit less for the other two.
In addition, this makes a really significant difference in runtime
for CLOBBER_CACHE_ALWAYS tests, since in those builds the repeated
catalog lookups are vastly more expensive.
The reason this has been hard up to now is that the aminsert function is
not passed any useful place to cache per-statement data. What I chose to
do is to add suitable fields to struct IndexInfo and pass that to aminsert.
That's not widening the index AM API very much because IndexInfo is already
within the ken of ambuild; in fact, by passing the same info to aminsert
as to ambuild, this is really removing an inconsistency in the AM API.
Discussion: https://postgr.es/m/27568.1486508680@sss.pgh.pa.us
This avoids a very significant amount of buffer manager traffic and
contention when scanning hash indexes, because it's no longer
necessary to lock and pin the metapage for every scan. We do need
some way of figuring out when the cache is too stale to use any more,
so that when we lock the primary bucket page to which the cached
metapage points us, we can tell whether a split has occurred since we
cached the metapage data. To do that, we use the hash_prevblkno field
in the primary bucket page, which would otherwise always be set to
InvalidBuffer.
This patch contains code so that it will continue working (although
less efficiently) with hash indexes built before this change, but
perhaps we should consider bumping the hash version and ripping out
the compatibility code. That decision can be made later, though.
Mithun Cy, reviewed by Jesper Pedersen, Amit Kapila, and by me.
Before committing, I made a number of cosmetic changes to the last
posted version of the patch, adjusted _hash_getcachedmetap to be more
careful about order of operation, and made some necessary updates to
the pageinspect documentation and regression tests.
Uniformly expose unsigned quantities using the next-wider signed
integer type (since we have no unsigned types at the SQL level).
At the SQL level, this results a change to report itemoffset as
int4 rather than int2. Also at the SQL level, report one value
that is an OID as type oid. Under the hood, uniformly use macros
that match the SQL output type as to both width and signedness.
Since pgstattuple v1.5 hasn't been released yet, no need for a new
extension version. The new function exposes statistics about hash
indexes similar to what other pgstatindex functions return for other
index types.
Ashutosh Sharma, reviewed by Kuntal Ghosh. Substantial further
revisions by me.
On machines with MAXALIGN = 8, the payload of a bytea is not maxaligned,
since it will start 4 bytes into a palloc'd value. On alignment-picky
hardware, this will cause failures in accesses to 8-byte-wide values
within the page. We already encountered this problem when we introduced
GIN index inspection functions, and fixed it in commit 84ad68d64. Make
use of the same function for hash indexes.
A small difficulty is that up to now contrib/pageinspect has not shared
any functions at all across files. To support that, introduce a common
header file "pageinspect.h" for the module.
Also, move get_page_from_raw() out of ginfuncs.c, where it didn't
especially belong, and put it in rawpage.c which seems a more natural home.
Per buildfarm.
Discussion: https://postgr.es/m/17311.1486134714@sss.pgh.pa.us
Per a report from Tom Lane, the ffactor reported by hash_metapage_info
and the free_size reported by hash_page_stats vary by platform.
Ashutosh Sharma and Robert Haas
It seems like somebody used a dartboard while choosing integer widths
for the various values taken and returned by these functions ... and
then threw a fresh set of darts while writing the SQL declarations.
This patch brings the C code into line with what the SQL declarations
say, which is enough to make it not dump core on the particular 32-bit
machine I'm testing on. But I think we could do with another round
of looking at what the datum widths *should* be. For instance, it's
not all that sensible that hash_bitmap_info decided to use int64 to
represent a BlockNumber input when get_raw_page doesn't do it that way.
There's also a remaining problem that the expected outputs from the
test script are platform-dependent, but I'll leave that issue for
somebody else.
Per buildfarm.
Commit 08bf6e5295 seems not to have
used the correct *GetDatum and PG_GETARG_* macros for the SQL types
in some cases, and some of the SQL types seem to have been poorly
chosen, too. Try to fix it. I'm not sure if this is the reason
why the buildfarm is currently unhappy with this code, but it
seems like a good place to start.
Buildfarm unhappiness reported by Tom Lane.
Patch by Jesper Pedersen and Ashutosh Sharma, with some error handling
improvements by me. Tests from Peter Eisentraut. Reviewed by Álvaro
Herrera, Michael Paquier, Jesper Pedersen, Jeff Janes, Peter
Eisentraut, Amit Kapila, Mithun Cy, and me.
Discussion: http://postgr.es/m/e2ac6c58-b93f-9dd9-f4e6-d6d30add7fdf@redhat.com
Remove $(pkglibdir) from $(rpathdir), since commits
d51924be88 and
eda04886c1 removed direct linkage to
objects stored there. Users are unlikely to notice the difference.
Accompany every $(python_libspec) with $(python_additional_libs); this
doesn't fix a demonstrated bug, but it might do so on rare Python
configurations. With these changes, AIX ceases to be a special case.
The rule is that if pg_authid.rolpassword begins with "md5" and has the
right length, it's an MD5 hash, otherwise it's a plaintext password. The
idiom has been to use isMD5() to check for that, but that gets awkward,
when we add new kinds of verifiers, like the verifiers for SCRAM
authentication in the pending SCRAM patch set. Replace isMD5() with a new
get_password_type() function, so that when new verifier types are added, we
don't need to remember to modify every place that currently calls isMD5(),
to also recognize the new kinds of verifiers.
Also, use the new plain_crypt_verify function in passwordcheck, so that it
doesn't need to know about MD5, or in the future, about other kinds of
hashes or password verifiers.
Reviewed by Michael Paquier and Peter Eisentraut.
Discussion: https://www.postgresql.org/message-id/2d07165c-1793-e243-a2a9-e45b624c7580@iki.fi
Commit 7012b132d added some tests that consumed an excessive amount of
time, more than tripling the time needed for "make installcheck" for this
module. Add filter conditions to reduce the number of rows scanned,
bringing the runtime down to within hailing distance of what it was before.
Jeevan Chalke and Ashutosh Bapat, per a gripe from me
Discussion: https://postgr.es/m/16565.1478104765@sss.pgh.pa.us
This patch doesn't actually make any index AM parallel-aware, but it
provides the necessary functions at the AM layer to do so.
Rahila Syed, Amit Kapila, Robert Haas
Since 69f4b9c plain expression evaluation (and thus normal projection)
can't return sets of tuples anymore. Thus remove code dealing with
that possibility.
This will require adjustments in external code using
ExecEvalExpr()/ExecProject() - that should neither be hard nor very
common.
Author: Andres Freund and Tom Lane
Discussion: https://postgr.es/m/20160822214023.aaxz5l4igypowyri@alap3.anarazel.de
This avoids additional translatable strings for each distinct type, as
well as making our quoting style around type names more consistent
(namely, that we don't quote type names). This continues what started
as f402b99501.
Discussion: https://postgr.es/m/20160401170642.GA57509@alvherre.pgsql
Gen_fmgrtab.pl creates a new file fmgrprotos.h, which contains
prototypes for all functions registered in pg_proc.h. This avoids
having to manually maintain these prototypes across a random variety of
header files. It also automatically enforces a correct function
signature, and since there are warnings about missing prototypes, it
will detect functions that are defined but not registered in
pg_proc.h (or otherwise used).
Reviewed-by: Pavel Stehule <pavel.stehule@gmail.com>