The upstream XSLT stylesheets use some very general XPath expressions in
some places that end up being very slow. We can optimize them with
knowledge about the DocBook document structure and our particular use
thereof. For example, when counting preceding chapters to get a number
for the current chapter, we only need to count preceding sibling
nodes (more or less) instead of searching through the entire node tree
for chapter elements.
This change attacks the slowest pieces as identified by xsltproc
--profile. This makes the HTML build roughly 10 times faster, resulting
in the new total build time being about the same as the old DSSSL-based
build. Some of the non-HTML build targets (especially FO) will also
benefit a bit, but they have not been specifically analyzed.
With this, also remove the pg.fast parameter, which was previously a
hack to get the build to a manageable speed.
Alexander Lakhin <a.lakhin@postgrespro.ru>, with some additional
tweaking by me
regexp_match() is like regexp_matches(), but it disallows the 'g' flag
and in consequence does not need to return a set. Instead, it returns
a simple text array value, or NULL if there's no match. Previously people
usually got that behavior with a sub-select, but this way is considerably
more efficient.
Documentation adjusted so that regexp_match() is presented first and then
regexp_matches() is introduced as a more complicated version. This is
a bit historically revisionist but seems pedagogically better.
Still TODO: extend contrib/citext to support this function.
Emre Hasegeli, reviewed by David Johnston
Discussion: <CAE2gYzy42sna2ME_e3y1KLQ-4UBrB-eVF0SWn8QG39sQSeVhEw@mail.gmail.com>
The performance overhead of this can be significant on Windows, and most
people don't have the tools to view it anyway as Windows does not have
native support for process titles.
Discussion: <0A3221C70F24FB45833433255569204D1F5BE3E8@G01JPEXMBYT05>
Takayuki Tsunakawa
This is a good bit more complicated than the average new-version stamping
commit, because it includes various adjustments in pursuit of changing
from three-part to two-part version numbers. It's likely some further
work will be needed around that change; but this is enough to get through
the regression tests, at least in Unix builds.
Peter Eisentraut and Tom Lane
Per discussion, we should provide such functions to replace the lost
ability to discover AM properties by inspecting pg_am (cf commit
65c5fcd35). The added functionality is also meant to displace any code
that was looking directly at pg_index.indoption, since we'd rather not
believe that the bit meanings in that field are part of any client API
contract.
As future-proofing, define the SQL API to not assume that properties that
are currently AM-wide or index-wide will remain so unless they logically
must be; instead, expose them only when inquiring about a specific index
or even specific index column. Also provide the ability for an index
AM to override the behavior.
In passing, document pg_am.amtype, overlooked in commit 473b93287.
Andrew Gierth, with kibitzing by me and others
Discussion: <87mvl5on7n.fsf@news-spur.riddles.org.uk>
Apparently that's not obvious to everybody, so let's belabor the point.
In passing, document that DROP POLICY has CASCADE/RESTRICT options (which
it does, per gram.y) but they do nothing (I assume, anyway). Also update
some long-obsolete commentary in gram.y.
Discussion: <20160805104837.1412.84915@wrigleys.postgresql.org>
The decision to reuse values of parameters from a previous connection
has been based on whether the new target is a conninfo string. Add this
means of overriding that default. This feature arose as one component
of a fix for security vulnerabilities in pg_dump, pg_dumpall, and
pg_upgrade, so back-patch to 9.1 (all supported versions). In 9.3 and
later, comment paragraphs that required update had already-incorrect
claims about behavior when no connection is open; fix those problems.
Security: CVE-2016-5424
If ANALYZE found no repeated non-null entries in its sample, it set the
column's stadistinct value to -1.0, intending to indicate that the entries
are all distinct. But what this value actually means is that the number
of distinct values is 100% of the table's rowcount, and thus it was
overestimating the number of distinct values by however many nulls there
are. This could lead to very poor selectivity estimates, as for example
in a recent report from Andreas Joseph Krogh. We should discount the
stadistinct value by whatever we've estimated the nulls fraction to be.
(That is what will happen if we choose to use a negative stadistinct for
a column that does have repeated entries, so this code path was just
inconsistent.)
In addition to fixing the stadistinct entries stored by several different
ANALYZE code paths, adjust the logic where get_variable_numdistinct()
forces an "all distinct" estimate on the basis of finding a relevant unique
index. Unique indexes don't reject nulls, so there's no reason to assume
that the null fraction doesn't apply.
Back-patch to all supported branches. Back-patching is a bit of a judgment
call, but this problem seems to affect only a few users (else we'd have
identified it long ago), and it's bad enough when it does happen that
destabilizing plan choices in a worse direction seems unlikely.
Patch by me, with documentation wording suggested by Dean Rasheed
Report: <VisenaEmail.26.df42f82acae38a58.156463942b8@tc7-visena>
Discussion: <16143.1470350371@sss.pgh.pa.us>
This is required for the result to be a legal tsvector value.
Noted while fooling with Andreas Seltenreich's ts_delete() crash.
Discussion: <87invhoj6e.fsf@credativ.de>
The help message for pg_basebackup specifies that the numbers 0 through 9
are accepted as valid values of -Z option. But, previously -Z 0 was rejected
as an invalid compression level.
Per discussion, it's better to make pg_basebackup treat 0 as valid
compression level meaning no compression, like pg_dump.
Back-patch to all supported versions.
Reported-By: Jeff Janes
Reviewed-By: Amit Kapila
Discussion: CAMkU=1x+GwjSayc57v6w87ij6iRGFWt=hVfM0B64b1_bPVKRqg@mail.gmail.com
This text was added by commit ff213239c, and not long thereafter obsoleted
by commit 4adc2f72a (which made the test depend on NBuffers instead); but
nobody noticed the need for an update. Commit 9563d5b5e adds some further
dependency on maintenance_work_mem, but the existing verbiage seems to
cover that with about as much precision as we really want here. Let's
just take it all out rather than leaving ourselves open to more errors of
omission in future. (That solution makes this change back-patchable, too.)
Noted by Peter Geoghegan.
Discussion: <CAM3SWZRVANbj9GA9j40fAwheQCZQtSwqTN1GBTVwRrRbmSf7cg@mail.gmail.com>
The docs failed to explain that LIKE INCLUDING INDEXES would not preserve
the names of indexes and associated constraints. Also, it wasn't mentioned
that EXCLUDE constraints would be copied by this option. The latter
oversight seems enough of a documentation bug to justify back-patching.
In passing, do some minor copy-editing in the same area, and add an entry
for LIKE under "Compatibility", since it's not exactly a faithful
implementation of the standard's feature.
Discussion: <20160728151154.AABE64016B@smtp.hushmail.com>
The description of udt_privileges view contained an incorrect copy-pasted word.
Back-patch to 9.2 where udt_privileges view was added.
Author: Alexander Law
The SQL standard appears to specify that IS [NOT] NULL's tests of field
nullness are non-recursive, ie, we shouldn't consider that a composite
field with value ROW(NULL,NULL) is null for this purpose.
ExecEvalNullTest got this right, but eval_const_expressions did not,
leading to weird inconsistencies depending on whether the expression
was such that the planner could apply constant folding.
Also, adjust the docs to mention that IS [NOT] DISTINCT FROM NULL can be
used as a substitute test if a simple null check is wanted for a rowtype
argument. That motivated reordering things so that IS [NOT] DISTINCT FROM
is described before IS [NOT] NULL. In HEAD, I went a bit further and added
a table showing all the comparison-related predicates.
Per bug #14235. Back-patch to all supported branches, since it's certainly
undesirable that constant-folding should change the semantics.
Report and patch by Andrew Gierth; assorted wordsmithing and revised
regression test cases by me.
Report: <20160708024746.1410.57282@wrigleys.postgresql.org>
These functions were added in commits fbe5a3fb7 and a104a017f,
but commit 45639a052 removed their only callers. Put the related
code in foreign.c back to the way it was in 9.5, to avoid pointless
cross-version diffs.
Etsuro Fujita
Patch: <d674a3f1-6b63-519c-ef3f-f3188ed6a178@lab.ntt.co.jp>
9.4 added a second description of GET DIAGNOSTICS that was totally
independent of the existing one, resulting in each description lying to the
extent that it claimed the set of status items it described was complete.
Fix that, and do some minor markup improvement.
Also some other small fixes per bug #14258 from Dilian Palauzov.
Discussion: <20160718181437.1414.40802@wrigleys.postgresql.org>
brin-extensibility-inclusion-table was confused in places about the
difference between strategy 4 (RTOverRight) and strategy 5 (RTRight).
Alexander Law
To ensure that "make installcheck" can be used safely against an existing
installation, we need to be careful about what global object names
(database, role, and tablespace names) we use; otherwise we might
accidentally clobber important objects. There's been a weak consensus that
test databases should have names including "regression", and that test role
names should start with "regress_", but we didn't have any particular rule
about tablespace names; and neither of the other rules was followed with
any consistency either.
This commit moves us a long way towards having a hard-and-fast rule that
regression test databases must have names including "regression", and that
test role and tablespace names must start with "regress_". It's not
completely there because I did not touch some test cases in rolenames.sql
that test creation of special role names like "session_user". That will
require some rethinking of exactly what we want to test, whereas the intent
of this patch is just to hit all the cases in which the needed renamings
are cosmetic.
There is no enforcement mechanism in this patch either, but if we don't
add one we can expect that the tests will soon be violating the convention
again. Again, that's not such a cosmetic change and it will require
discussion. (But I did use a quick-hack enforcement patch to find these
cases.)
Discussion: <16638.1468620817@sss.pgh.pa.us>
For some reason this option wasn't discussed at all in client-auth.sgml.
Document it there, and be more explicit about its relationship to the
"cert" authentication method. Per gripe from Srikanth Venkatesh.
I failed to resist the temptation to do some minor wordsmithing in the
same area, too.
Discussion: <20160713110357.1410.30407@wrigleys.postgresql.org>
Clarify that the reason for recommending that pg_temp be put last is to
prevent temporary tables from capturing unqualified table names. Per
discussion with Albe Laurenz.
Discussion: <A737B7A37273E048B164557ADEF4A58B5386C6E1@ntex2010i.host.magwien.gv.at>
Add display of proparallel (parallel-safety) when the server is >= 9.6,
and display of proacl (access privileges) for all server versions.
Minor tweak of column ordering to keep related columns together.
Michael Paquier
Discussion: <CAB7nPqTR3Vu3xKOZOYqSm-+bSZV0kqgeGAXD6w5GLbkbfd5Q6w@mail.gmail.com>
temp_file_limit is a per-process limit, not a per-session limit across
all cooperating parallel processes; change wording accordingly, per a
suggestion from Tom Lane.
Also, document under max_parallel_workers_per_gather the fact that each
process involved in a parallel query may use as many resources as a
separate session. Caveat emptor.
Per a complaint from Peter Geoghegan.
Per discussion on pgsql-hackers, conninfo is better as the column name
because it's more commonly used in PostgreSQL.
Catalog version bumped due to the change of pg_proc.
Author: Michael Paquier
Document that index storage is dependent on the operating system's
collation library ordering, and any change in that ordering can create
invalid indexes.
Discussion: 20160617154311.GB19359@momjian.us
Backpatch-through: 9.1
In the previous design, the GetForeignUpperPaths FDW callback hook was
called before we got around to labeling upper relations with the proper
consider_parallel flag; this meant that any upper paths created by an FDW
would be marked not-parallel-safe. While that's probably just as well
right now, we aren't going to want it to be true forever. Hence, abandon
the idea that FDWs should be allowed to inject upper paths before the core
code has gotten around to creating the relevant upper relation. (Well,
actually they still can, but it's on their own heads how well it works.)
Instead, adopt the same API already designed for create_upper_paths_hook:
we call GetForeignUpperPaths after each upperrel has been created and
populated with the paths the core planner knows how to make.
Commit b1a9bad9e7 introduced a stats view to provide insight into the
running WAL receiver, but neglected to include the connection string in
it, as reported by Michaël Paquier. This commit fixes that omission.
(Any security-sensitive information is not disclosed).
While at it, close the mild security hole that we were exposing the
password in the connection string in shared memory. This isn't
user-accessible, but it still looks like a good idea to avoid having the
cleartext password in memory.
Author: Michaël Paquier, Álvaro Herrera
Review by: Vik Fearing
Discussion: https://www.postgresql.org/message-id/CAB7nPqStg4M561obo7ryZ5G+fUydG4v1Ajs1xZT1ujtu+woRag@mail.gmail.com
Fix some now-obsolete statements that were overlooked in commits
6734a1cac, 3dbbd0f02, 028350f61. Document the behavior of <0>.
Also do a little bit of rearranging and copy-editing for clarity.
Add a section to xaggr.sgml, as we have done in the past for other
extensions to the aggregation functionality. Assorted wordsmithing
and other minor improvements.
David Rowley and Tom Lane
The original specification for this called for the deserialization function
to have signature "deserialize(serialtype) returns transtype", which is a
security violation if transtype is INTERNAL (which it always would be in
practice) and serialtype is not (which ditto). The patch blithely overrode
the opr_sanity check for that, which was sloppy-enough work in itself,
but the indisputable reason this cannot be allowed to stand is that CREATE
FUNCTION will reject such a signature and thus it'd be impossible for
extensions to create parallelizable aggregates.
The minimum fix to make the signature type-safe is to add a second, dummy
argument of type INTERNAL. But to lock it down a bit more and make misuse
of INTERNAL-accepting functions less likely, let's get rid of the ability
to specify a "serialtype" for an aggregate and just say that the only
useful serialtype is BYTEA --- which, in practice, is the only interesting
value anyway, due to the usefulness of the send/recv infrastructure for
this purpose. That means we only have to allow "serialize(internal)
returns bytea" and "deserialize(bytea, internal) returns internal" as
the signatures for these support functions.
In passing fix bogus signature of int4_avg_combine, which I found thanks
to adding an opr_sanity check on combinefunc signatures.
catversion bump due to removing pg_aggregate.aggserialtype and adjusting
signatures of assorted built-in functions.
David Rowley and Tom Lane
Discussion: <27247.1466185504@sss.pgh.pa.us>
If there's anyplace in our SGML docs that explains this behavior, I can't
find it right at the moment. Add an explanation in "Dependency Tracking"
which seems like the authoritative place for such a discussion. Per
gripe from Michelle Schwan.
While at it, update this section's example of a dependency-related
error message: they last looked like that in 8.3. And remove the
explanation of dependency updates from pre-7.3 installations, which
is probably no longer worth anybody's brain cells to read.
The bogus error message example seems like an actual documentation bug,
so back-patch to all supported branches.
Discussion: <20160620160047.5792.49827@wrigleys.postgresql.org>
Dilian Palauzov pointed out in bug #14201 that the docs failed to mention
the possibility of %R producing '(' due to an unmatched parenthesis.
He proposed just adding that in the same style as the other options were
listed; but it seemed to me that the sentence was already nearly
unintelligible, so I rewrote it a bit more extensively.
Report: <20160619121113.5789.68274@wrigleys.postgresql.org>
This requires some core changes as well so that we can properly
WAL-log the truncation. Specifically, it changes the format of the
XLOG_SMGR_TRUNCATE WAL record, so bump XLOG_PAGE_MAGIC.
Patch by me, reviewed but not fully endorsed by Andres Freund.
If you really want to vacuum every single page in the relation,
regardless of apparent visibility status or anything else, you can use
this option. In previous releases, this behavior could be achieved
using VACUUM (FREEZE), but because we can now recognize all-frozen
pages as not needing to be frozen again, that no longer works. There
should be no need for routine use of this option, but maybe bugs or
disaster recovery will necessitate its use.
Patch by me, reviewed by Andres Freund.
The main point of doing this is to allow the cutoff to be set very small,
even zero, to allow parallel-query behavior to be tested on relatively
small tables such as we typically use in the regression tests. But it
might be of use to users too. The number-of-workers scaling behavior in
create_plain_partial_paths() is pretty ad-hoc and subject to change, so
we won't expose anything about that, but the notion of not considering
parallel query at all for tables below size X seems reasonably stable.
Amit Kapila, per a suggestion from me
Discussion: <17170.1465830165@sss.pgh.pa.us>
The new pg_check_visible() and pg_check_frozen() functions can be used to
verify that the visibility map bits for a relation's data pages match the
actual state of the tuples on those pages.
Amit Kapila and Robert Haas, reviewed (in earlier versions) by Andres
Freund. Additional testing help by Thomas Munro.
While beneficial, both for throughput and average/worst case latency, in
a significant number of workloads, there are other workloads in which
backend_flush_after can cause significant performance regressions in
comparison to < 9.6 releases. The regression is most likely when the hot
data set is bigger than shared buffers, but significantly smaller than
the operating system's page cache.
I personally think that the benefit of enabling backend flush control is
considerably bigger than the potential downsides, but a fair argument
can be made that not regressing is more important than improving
performance/latency. As the latter is the consensus, change the default
to 0.
The other settings introduced in 428b1d6b2 do not have the same
potential for regressions, so leave them enabled.
Benchmarks leading up to changing the default have been performed by
Mithun Cy, Ashutosh Sharma and Robert Haas.
Discussion: CAD__OuhPmc6XH=wYRm_+Q657yQE88DakN4=Ybh2oveFasHkoeA@mail.gmail.com
Document these as "nearest integer >= argument" and "nearest integer <=
argument", which will hopefully be less confusing than the old formulation.
New wording is from Matlab via Dean Rasheed.
I changed the pg_description entries as well as the SGML docs. In the
back branches, this will only affect installations initdb'd in the future,
but it should be harmless otherwise.
Discussion: <CAEZATCW3yzJo-NMSiQs5jXNFbTsCEftZS-Og8=FvFdiU+kYuSA@mail.gmail.com>
This terminology provoked widespread complaints. So, instead, rename
the GUC max_parallel_degree to max_parallel_workers_per_gather
(leaving room for a possible future GUC max_parallel_workers that acts
as a system-wide limit), and rename the parallel_degree reloption to
parallel_workers. Rename structure members to match.
These changes create a dump/restore hazard for users of PostgreSQL
9.6beta1 who have set the reloption (or applied the GUC using ALTER
USER or ALTER DATABASE).
Fix grammar, improve examples, etc.
I did not attempt to document the current behavior concerning distance-zero
matches, because I think that's broken and needs to change, so I'm not
going to use up brain cells figuring out how to explain how it works now.
One way or the other, there's still more to write here.
Commit 6820094d1 mixed up types of parent object (table) with type of
sub-object being commented on. Noticed while fixing docs for
COMMENT ON ACCESS METHOD.
Backpatch to 9.5, like that commit.
It was previously suggested that "esoteric" operations such as creating
a new access method would require direct manipulation of the system
catalogs, but that example has gone away, and I can't think of a new one
to replace it, so just put in some weasel wording.
Mostly these are just comments but there are a few in documentation
and a handful in code and tests. Hopefully this doesn't cause too much
unnecessary pain for backpatching. I relented from some of the most
common like "thru" for that reason. The rest don't seem numerous
enough to cause problems.
Thanks to Kevin Lyda's tool https://pypi.python.org/pypi/misspellings
Per discussion, this is a more understandable and future-proof way of
exposing the setting to users. On-disk, we can still store it in words,
so as to not break on-disk compatibility with beta1.
Along the way, clean up the code associated with Bloom reloptions.
Provide explicit macros for default and maximum lengths rather than
having magic numbers buried in multiple places in the code. Drop
the adjustBloomOptions() code altogether: it was useless in view of
the fact that reloptions.c already performed default-substitution and
range checking for the options. Rename a couple of macros and types
for more clarity.
Discussion: <23767.1464926580@sss.pgh.pa.us>
Oracle recommends using VARCHAR2 not VARCHAR, allegedly because they might
someday change VARCHAR to be spec-compliant about distinguishing null from
empty string. (I'm not holding my breath, though.) Our examples of PL/SQL
code were using VARCHAR, which while not wrong is missing the pedagogical
opportunity to talk about converting Oracle type names to Postgres. So
switch the examples to use VARCHAR2, and add some text about what to do
with common Oracle type names like VARCHAR2 and NUMBER. (There is probably
more to be said here, but those are the ones I'm sure about offhand.)
Per suggestion from rapg12@gmail.com.
Discussion: <20160521140046.22591.24672@wrigleys.postgresql.org>
Correct obsolete install instructions, as noted by Daniel Gustafsson.
Clarify the test code's prerequisites.
Discussion: <88E617F2-7721-4C4E-84F4-886A2041C1D0@yesql.se>
David Johnston pointed out that the original text here had been obsoleted
by SQL:2008, which allowed ORDER BY in subqueries. We could weaken the
text to describe ORDER-BY-in-subqueries as an optional SQL feature that's
possibly unportable; but then the exact same statements would apply to
the alternative it's being compared to (ORDER-BY-in-aggregate-calls).
So really that would be pretty useless; let's just take out the sentence
entirely. Instead, point out the hazard that any extra processing in the
upper query might cause the subquery output order to be destroyed.
Discussion: <CAKFQuwbAX=iO9QbpN7_jr+BnUWm9FYX8WbEPUvG0p+nZhp6TZg@mail.gmail.com>
We didn't have any real user documentation about how index-only scans
work or how to design indexes to exploit them. Remedy that.
Per gripe from David Johnston.
In the documentation for nextval(), point out explicitly that INSERT ...
ON CONFLICT will call nextval() if needed for the insertion case, whether
or not it ends up following the ON CONFLICT path. This seems to be a
matter of some confusion, cf bug #14126, so let's be clear about it.
Also mention the issue in the CREATE SEQUENCE reference page, since that
is another place where people might expect such things to be covered.
Minor wording improvements nearby, as well.
Back-patch to 9.5 where ON CONFLICT was introduced.
Describe compat_realm = 0 as "disabled" not "enabled", per discussion
with Christian Ullrich. I failed to resist the temptation to do some
other minor copy-editing in the same area.
Hash indexes are not WAL-logged, and so do not maintain the LSN of
index pages. Since the "snapshot too old" feature counts on
detecting error conditions using the LSN of a table and all indexes
on it, this makes it impossible to safely do early vacuuming on any
table with a hash index, so add this to the tests for whether the
xid used to vacuum a table can be adjusted based on
old_snapshot_threshold.
While at it, add a paragraph to the docs for old_snapshot_threshold
which specifically mentions this and other aspects of the feature
which may otherwise surprise users.
Problem reported and patch reviewed by Amit Kapila