In 9.5 and master there is no need to support legacy truncation. This is
just committed separately to make it easier to backpatch the WAL logged
multixact truncation to 9.3 and 9.4 if we later decide to do so.
I bumped master's magic from 0xD086 to 0xD088 and 9.5's from 0xD085 to
0xD087 to avoid 9.5 reusing a value that has been in use on master while
keeping the numbers increasing between major versions.
Discussion: 20150621192409.GA4797@alap3.anarazel.de
Backpatch: 9.5
The fact that multixact truncations are not WAL logged has caused a fair
share of problems. Amongst others it requires to do computations during
recovery while the database is not in a consistent state, delaying
truncations till checkpoints, and handling members being truncated, but
offset not.
We tried to put bandaids on lots of these issues over the last years,
but it seems time to change course. Thus this patch introduces WAL
logging for multixact truncations.
This allows:
1) to perform the truncation directly during VACUUM, instead of delaying it
to the checkpoint.
2) to avoid looking at the offsets SLRU for truncation during recovery,
we can just use the master's values.
3) simplify a fair amount of logic to keep in memory limits straight,
this has gotten much easier
During the course of fixing this a bunch of additional bugs had to be
fixed:
1) Data was not purged from memory the member's SLRU before deleting
segments. This happened to be hard or impossible to hit due to the
interlock between checkpoints and truncation.
2) find_multixact_start() relied on SimpleLruDoesPhysicalPageExist - but
that doesn't work for offsets that haven't yet been flushed to
disk. Add code to flush the SLRUs to fix. Not pretty, but it feels
slightly safer to only make decisions based on actual on-disk state.
3) find_multixact_start() could be called concurrently with a truncation
and thus fail. Via SetOffsetVacuumLimit() that could lead to a round
of emergency vacuuming. The problem remains in
pg_get_multixact_members(), but that's quite harmless.
For now this is going to only get applied to 9.5+, leaving the issues in
the older branches in place. It is quite possible that we need to
backpatch at a later point though.
For the case this gets backpatched we need to handle that an updated
standby may be replaying WAL from a not-yet upgraded primary. We have to
recognize that situation and use "old style" truncation (i.e. looking at
the SLRUs) during WAL replay. In contrast to before, this now happens in
the startup process, when replaying a checkpoint record, instead of the
checkpointer. Doing truncation in the restartpoint is incorrect, they
can happen much later than the original checkpoint, thereby leading to
wraparound. To avoid "multixact_redo: unknown op code 48" errors
standbys would have to be upgraded before primaries.
A later patch will bump the WAL page magic, and remove the legacy
truncation codepaths. Legacy truncation support is just included to make
a possible future backpatch easier.
Discussion: 20150621192409.GA4797@alap3.anarazel.de
Reviewed-By: Robert Haas, Alvaro Herrera, Thomas Munro
Backpatch: 9.5 for now
This replaces ill-fated commit 5ddc72887a,
which was reverted because it broke active uses of FK cache entries. In
this patch, we still do nothing more to invalidatable cache entries than
mark them as needing revalidation, so we won't break active uses. To keep
down the overhead of InvalidateConstraintCacheCallBack(), keep a list of
just the currently-valid cache entries. (The entries are large enough that
some added space for list links doesn't seem like a big problem.) This
would still be O(N^2) when there are many valid entries, though, so when
the list gets too long, just force the "sinval reset" behavior to remove
everything from the list. I set the threshold at 1000 entries, somewhat
arbitrarily. Possibly that could be fine-tuned later. Another item for
future study is whether it's worth adding reference counting so that we
could safely remove invalidated entries. As-is, problem cases are likely
to end up with large and mostly invalid FK caches.
Like the previous attempt, backpatch to 9.3.
Jan Wieck and Tom Lane
(Third time's the charm, I hope.)
Additional testing disclosed that this code could mangle already-localized
output from the "money" datatype. We can't very easily skip applying it
to "money" values, because the logic is tied to column right-justification
and people expect "money" output to be right-justified. Short of
decoupling that, we can fix it in what should be a safe enough way by
testing to make sure the string doesn't contain any characters that would
not be expected in plain numeric output.
On closer inspection, those seemingly redundant atoi() calls were not so
much inefficient as just plain wrong: the author of this code either had
not read, or had not understood, the POSIX specification for localeconv().
The grouping field is *not* a textual digit string but separate integers
encoded as chars.
We'll follow the existing code as well as the backend's cash.c in only
honoring the first group width, but let's at least honor it correctly.
This doesn't actually result in any behavioral change in any of the
locales I have installed on my Linux box, which may explain why nobody's
complained; grouping width 3 is close enough to universal that it's barely
worth considering other cases. Still, wrong is wrong, so back-patch.
This code did the wrong thing entirely for numbers with an exponent
but no decimal point (e.g., '1e6'), as reported by Jeff Janes in
bug #13636. More generally, it made lots of unverified assumptions
about what the input string could possibly look like. Rearrange so
that it only fools with leading digits that it's directly verified
are there, and an immediately adjacent decimal point. While at it,
get rid of some useless inefficiencies, like converting the grouping
count string to integer over and over (and over).
This has been broken for a long time, so back-patch to all supported
branches.
Previously, a function call appearing at the top level of WHERE had a
hard-wired selectivity estimate of 0.3333333, a kludge conveniently dated
in the source code itself to July 1992. The expectation at the time was
that somebody would soon implement estimator support functions analogous
to those for operators; but no such code has appeared, nor does it seem
likely to in the near future. We do have an alternative solution though,
at least for immutable functions on single relations: creating an
expression index on the function call will allow ANALYZE to gather stats
about the function's selectivity. But the code in clause_selectivity()
failed to make use of such data even if it exists.
Refactor so that that will happen. I chose to make it try this technique
for any clause type for which clause_selectivity() doesn't have a special
case, not just functions. To avoid adding unnecessary overhead in the
common case where we don't learn anything new, make selfuncs.c provide an
API that hooks directly to examine_variable() and then var_eq_const(),
rather than the previous coding which laboriously constructed an OpExpr
only so that it could be expensively deconstructed again.
I preserved the behavior that the default estimate for a function call
is 0.3333333. (For any other expression node type, it's 0.5, as before.)
I had originally thought to make the default be 0.5 across the board, but
changing a default estimate that's survived for twenty-three years seems
like something not to do without a lot more testing than I care to put
into it right now.
Per a complaint from Jehan-Guillaume de Rorthais. Back-patch into 9.5,
but not further, at least for the moment.
The comments here stated that this was just in case we ever had an
ALTER OPERATOR command that could remap an operator to a different
function. But those comments have been here for a long time, and no
such command has come about. In the absence of such a feature,
forcing the pg_proc OID to be looked up again each time we reread a
stored rule or similar is just a waste of cycles. Moreover, parallel
query needs a way to reread the exact same node tree that was written
out, not one that has been slightly stomped on. So just get rid of
this for now.
Per discussion with Tom Lane.
Previously pg_controldata didn't report newestCommitTs and this was
an oversight in commit 73c986a.
Also this patch changes pg_resetxlog so that it uses the same sentences
as pg_controldata does, regarding oldestCommitTs and newestCommitTs,
for the sake of consistency.
Back-patch to 9.5 where track_commit_timestamp was added.
Euler Taveira
The old minimum values are rather large, making it time consuming to
test related behaviour. Additionally the current limits, especially for
multixacts, can be problematic in space-constrained systems. 10000000
multixacts can contain a lot of members.
Since there's no good reason for the current limits, lower them a good
bit. Setting them to 0 would be a bad idea, triggering endless vacuums,
so still retain a limit.
While at it fix autovacuum_multixact_freeze_max_age to refer to
multixact.c instead of varsup.c.
Reviewed-By: Robert Haas
Discussion: CA+TgmoYmQPHcrc3GSs7vwvrbTkbcGD9Gik=OztbDGGrovkkEzQ@mail.gmail.com
Backpatch: back to 9.0 (in parts)
Previously, ANALYZE simply ignored columns of datatypes that have neither
a btree nor hash opclass (which means they have no recognized equality
operator). Without a notion of equality, we can't identify most-common
values nor estimate the number of distinct values. But we can still
count nulls and compute the average physical column width, and those
stats might be of value. Moreover there are some tools out there that
don't work so well if rows are missing from pg_statistic. So let's
add suitable logic for this case.
While this is arguably a bug fix, it also has the potential to change
query plans, and the gain seems not worth taking a risk of that in
stable branches. So back-patch into 9.5 but not further.
Oleksandr Shulgin, rewritten a bit by me.
For parallel query, we need to be able to pass a Plan to a worker, so
that it knows what it's supposed to do. We could invent our own way
of serializing plans for that purpose, but piggybacking on the
existing node infrastructure seems like a much better idea.
Initially, we'll probably only support a limited number of nodes
within parallel workers, but this commit adds support for everything
in plannodes.h except CustomScan, because doing it all at once seems
easier than doing it piecemeal, and it makes testing this code easier,
too. CustomScan is excluded because making that work requires a
larger rework of that facility.
Amit Kapila, reviewed and slightly revised by me.
It's declared as being an array of bool, but it's printed
differently from the way bool and arrays of bool are handled
elsewhere.
Patch by Amit Kapila. Anomaly noted independently by Amit Kapila
and KaiGai Kohei.
Commit e956808328 introduces adding pages
to FSM for ordinary insert, but autoanalyze was able just cleanup
pending list without adding to FSM.
Also fix double call of IndexFreeSpaceMapVacuum() during ginvacuumcleanup()
Report from Fujii Masao
Patch by me
Review by Jeff Janes
This logic was missing from ExplainPreScanNode, from which I derived
planstate_tree_walker. But it shouldn't be missing, especially not
from a generic walker function, so add it.
KaiGai Kohei
mul_var() postpones propagating carries until it risks overflow in its
internal digit array. However, the logic failed to account for the
possibility of overflow in the carry propagation step, allowing wrong
results to be generated in corner cases. We must slightly reduce the
when-to-propagate-carries threshold to avoid that.
Discovered and fixed by Dean Rasheed, with small adjustments by me.
This has been wrong since commit d72f6c7503,
so back-patch to all supported branches.
This commit's parent made superfluous the bit's sole usage. Referential
integrity checks have long run as the subject table's owner, and that
now implies RLS bypass. Safe use of the bit was tricky, requiring
strict control over the SQL expressions evaluating therein. Back-patch
to 9.5, where the bit was introduced.
Based on a patch by Stephen Frost.
Every query of a single ENABLE ROW SECURITY table has two meanings, with
the row_security GUC selecting between them. With row_security=force
available, every function author would have been advised to either set
the GUC locally or test both meanings. Non-compliance would have
threatened reliability and, for SECURITY DEFINER functions, security.
Authors already face an obligation to account for search_path, and we
should not mimic that example. With this change, only BYPASSRLS roles
need exercise the aforementioned care. Back-patch to 9.5, where the
row_security GUC was introduced.
Since this narrows the domain of pg_db_role_setting.setconfig and
pg_proc.proconfig, one might bump catversion. A row_security=force
setting in one of those columns will elicit a clear message, so don't.
RemoveLocalLock() must consider the possibility that LockAcquireExtended()
failed to palloc the initial space for a locallock's lockOwners array.
I had evidently meant to cope with this hazard when the code was originally
written (commit 1785acebf2), but missed that
the pfree needed to be protected with an if-test. Just to make sure things
are left in a clean state, reset numLockOwners as well.
Per low-memory testing by Andreas Seltenreich. Back-patch to all supported
branches.
BSD find is not very smart and ends up writing double slashes into the
output in those cases. Also, xgettext is not very smart and splits the
file names incorrectly in those cases, resulting in slightly incorrect
file names being written into the POT file.
The shm_mq mechanism was built to send error (and notice) messages and
tuples between backends. However, shm_mq itself only deals in raw
bytes. Since commit 2bd9e412f9, we have
had infrastructure for one message to redirect protocol messages to a
queue and for another backend to parse them and do useful things with
them. This commit introduces a somewhat analogous facility for tuples
by adding a new type of DestReceiver, DestTupleQueue, which writes
each tuple generated by a query into a shm_mq, and a new
TupleQueueFunnel facility which reads raw tuples out of the queue and
reconstructs the HeapTuple format expected by the executor.
The TupleQueueFunnel abstraction supports reading from multiple tuple
streams at the same time, but only in round-robin fashion. Someone
could imaginably want other policies, but this should be good enough
to meet our short-term needs related to parallel query, and we can
always extend it later.
This also makes one minor addition to the shm_mq API that didn'
seem worth breaking out as a separate patch.
Extracted from Amit Kapila's parallel sequential scan patch. This
code was originally written by me, and then it was revised by Amit,
and then it was revised some more by me.
These functions have been looking up type info for every row they
process. Instead of doing that we only look them up the first time
through and stash the information in the aggregate state object.
Affects json_agg, json_object_agg, jsonb_agg and jsonb_object_agg.
There is plenty more work to do in making these more efficient,
especially the jsonb functions, but this is a virtually cost free
improvement that can be done right away.
Backpatch to 9.5 where the jsonb variants were introduced.
After an internal failure in shortest() or longest() while pinning down the
exact location of a match, find() forgot to free the DFA structure before
returning. This is pretty unlikely to occur, since we just successfully
ran the "search" variant of the DFA; but it could happen, and it would
result in a session-lifespan memory leak since this code uses malloc()
directly. Problem seems to have been aboriginal in Spencer's library,
so back-patch all the way.
In passing, correct a thinko in a comment I added awhile back about the
meaning of the "ntree" field.
I happened across these issues while comparing our code to Tcl's version
of the library.
Use IsBinaryCoercible() method instead of custom
is_expected_type/is_text_type functions which was introduced when tsearch2
was moved into core.
Per report by David E. Wheeler
Analysis by Tom Lane
Patch by me
This setting contains extra configuration for the temp instance, as used
in pg_regress' --temp-config flag.
Backpatch to 9.2 where test.sh was introduced.
ExplainPreScanNode knows how to iterate over a generic tree of plan
states; factor that logic out into a separate walker function so that
other code, such as upcoming patches for parallel query, can also use
it.
Patch by me, reviewed by Tom Lane.
Commit 013ebc0a7b introduces microvacuum for
GiST, deletetion of tuple marked LP_DEAD uses IndexPageMultiDelete while
recovery code uses IndexPageTupleDelete in loop. This causes a difference
in offset numbers of tuples to delete. Patch introduces usage of
IndexPageMultiDelete in GiST except gistplacetopage() where only one tuple is
deleted at once. That also slightly improve performance, because
IndexPageMultiDelete is more effective.
Patch changes WAL format, so bump wal page magic.
Bug report from Jeff Janes
Diagnostic and patch by Anastasia Lubennikova and me
Commit 924bcf4f16 introduced a framework
for parallel computation in PostgreSQL that makes most but not all
built-in functions safe to execute in parallel mode. In order to have
parallel query, we'll need to be able to determine whether that query
contains functions (either built-in or user-defined) that cannot be
safely executed in parallel mode. This requires those functions to be
labeled, so this patch introduces an infrastructure for that. Some
functions currently labeled as safe may need to be revised depending on
how pending issues related to heavyweight locking under paralllelism
are resolved.
Parallel plans can't be used except for the case where the query will
run to completion. If portal execution were suspended, the parallel
mode restrictions would need to remain in effect during that time, but
that might make other queries fail. Therefore, this patch introduces
a framework that enables consideration of parallel plans only when it
is known that the plan will be run to completion. This probably needs
some refinement; for example, at bind time, we do not know whether a
query run via the extended protocol will be execution to completion or
run with a limited fetch count. Having the client indicate its
intentions at bind time would constitute a wire protocol break. Some
contexts in which parallel mode would be safe are not adjusted by this
patch; the default is not to try parallel plans except from call sites
that have been updated to say that such plans are OK.
This commit doesn't introduce any parallel paths or plans; it just
provides a way to determine whether they could potentially be used.
I'm committing it on the theory that the remaining parallel sequential
scan patches will also get committed to this release, hopefully in the
not-too-distant future.
Robert Haas and Amit Kapila. Reviewed (in earlier versions) by Noah
Misch.
Sync our regex code with upstream changes since last time we did this,
which was Tcl 8.5.11 (see commit 08fd6ff37f).
The only functional change here is to disbelieve that an octal escape is
three digits long if it would exceed \377. That's a bug fix, but it's
a minor one and could change the interpretation of working regexes, so
don't back-patch.
In addition to that, s/INFINITY/DUPINF/ to eliminate the risk of collisions
with <math.h>'s macro, and s/LOCAL/NOPROP/ because that also seems like
an unnecessarily collision-prone macro name.
There were some other cosmetic changes in their copy that I did not adopt,
notably a rather half-hearted attempt at renaming some of the C functions
in a more verbose style. (I'm not necessarily against the concept, but
renaming just a few functions in the package is not an improvement.)
This patch adds an option to replace the "time since pgbench run
started" with a Unix epoch timestamp in the progress report so that,
for instance, it is easier to compare timelines with pgsql log
Fabien COELHO <coelho@cri.ensmp.fr>
For the UPDATE/DELETE RETURNING case, filter the records which are not
visible to the user through ALL or SELECT policies from those considered
for the UPDATE or DELETE. This is similar to how the GRANT system
works, which prevents RETURNING unless the caller has SELECT rights on
the relation.
Per discussion with Robert, Dean, Tom, and Kevin.
Back-patch to 9.5 where RLS was introduced.
This refactors rewrite/rowsecurity.c to simplify the handling of the
default deny case (reducing the number of places where we check for and
add the default deny policy from three to one) by splitting up the
retrival of the policies from the application of them.
This also allowed us to do away with the policy_id field. A policy_name
field was added for WithCheckOption policies and is used in error
reporting, when available.
Patch by Dean Rasheed, with various mostly cosmetic changes by me.
Back-patch to 9.5 where RLS was introduced to avoid unnecessary
differences, since we're still in alpha, per discussion with Robert.
Commit 5ddc72887a does not actually work
because it will happily blow away ri_constraint_cache entries that are
in active use in outer call levels. In any case, it's a very ugly,
brute-force solution to the problem of limiting the cache size.
Revert until it can be redesigned.
This patch changes the log message which is logged when the server
successfully renames backup_label file to *.old but fails to rename
tablespace_map file during the shutdown. Previously the WARNING
message "online backup mode was not canceled" was logged in that case.
However this message is confusing because the backup mode is treated
as canceled whenever backup_label is successfully renamed. So this
commit makes the server log the message "online backup mode canceled"
in that case.
Also this commit changes errdetail messages so that they follow the
error message style guide.
Back-patch to 9.5 where tablespace_map file is introduced.
Original patch by Amit Kapila, heavily modified by me.
Patch provides command line option --strict-names which requires that at
least one table/schema should present for each -t/-n option.
Pavel Stehule <pavel.stehule@gmail.com>
To prevent perverse results, we now only return the other operand if
it's not scalar, and if both operands are of the same kind (array or
object).
Original bug complaint and patch from Oskari Saarenmaa, extended by me
to cover the cases of different kinds of jsonb.
Backpatch to 9.5 where jsonb_concat was introduced.
Modify pg_dump to restore postgres/template1 databases to non-default
tablespaces by switching out of the database to be moved, then switching
back.
Also, to fix potentially cases where the old/new tablespaces might not
match, fix pg_upgrade to process new/old tablespaces separately in all
cases.
Report by Marti Raudsepp
Patch by Marti Raudsepp, me
Backpatch through 9.0
I think this particular branch is actually dead, but the analysis to
prove that is not trivial, so instead take the weasel way.
Reported by Jinyu Zhang
Backpatch to 9.5, where BRIN was introduced.
Commit 45ba424f improved foreign key lookups during bulk updates
when the FK value does not change. When restoring a schema dump
from a database with many (say 100,000) foreign keys, this cache
would grow very big and every ALTER TABLE command was causing an
InvalidateConstraintCacheCallBack(), which uses a sequential hash
table scan. This could cause a severe performance regression in
restoring a schema dump (including during pg_upgrade).
The patch uses a heuristic method of detecting when the hash table
should be destroyed and recreated.
InvalidateConstraintCacheCallBack() adds the current size of the
hash table to a counter. When that sum reaches 1,000,000, the hash
table is flushed. This fixes the regression without noticeable
harm to the bulk update use case.
Jan Wieck
Backpatch to 9.3 where the performance regression was introduced.
Naming the individual lwlocks seems like something that may be useful
for other types of debugging, monitoring, or instrumentation output,
but this commit just implements it for the specific case of
trace_lwlocks.
Patch by me, reviewed by Amit Kapila and Kyotaro Horiguchi
The "typo" alleged in commit 1e460d4bd was actually a comment that was
correct when written, but I missed updating it in commit b5282aa89.
Use a slightly less specific (and hopefully more future-proof) description
of what is collected. Back-patch to 9.2 where that commit appeared, and
revert the comment to its then-entirely-correct state before that.
The list-wrangling here was done wrong, allowing the same state to get
put into the list twice. The following loop then would clone it twice.
The second clone would wind up with no inarcs, so that there was no
observable misbehavior AFAICT, but a useless state in the finished NFA
isn't an especially good thing.
Mark index tuple as dead if it's pointed by kill_prior_tuple during
ordinary (search) scan and remove it during insert process if there is no
enough space for new tuple to insert. This improves select performance
because index will not return tuple marked as dead and improves insert
performance because it reduces number of page split.
Anastasia Lubennikova <a.lubennikova@postgrespro.ru> with
minor editorialization by me
This commit makes postmaster forcibly remove the files signaling
a standby promotion request. Otherwise, the existence of those files
can trigger a promotion too early, whether a user wants that or not.
This removal of files is usually unnecessary because they can exist
only during a few moments during a standby promotion. However
there is a race condition: if pg_ctl promote is executed and creates
the files during a promotion, the files can stay around even after
the server is brought up to new master. Then, if new standby starts
by using the backup taken from that master, the files can exist
at the server startup and should be removed in order to avoid
an unexpected promotion.
Back-patch to 9.1 where promote signal file was introduced.
Problem reported by Feike Steenbergen.
Original patch by Michael Paquier, modified by me.
Discussion: 20150528100705.4686.91426@wrigleys.postgresql.org
Even views considered "simple" enough to be automatically updatable may
have mulitple relations involved (eg: in a where clause). We need to
make sure and lock those relations when rewriting the query.
Back-patch to 9.3 where updatable views were added.
Pointed out by Andres, patch thanks to Dean Rasheed.
This was forgotten in 8a3631f (commit that originally added the parameter)
and 0ca9907 (commit that added the documentation later that year).
Back-patch to all supported versions.
Per discussion, nowadays it is possible to have tablespaces that have
wildly different I/O characteristics from others. Setting different
effective_io_concurrency parameters for those has been measured to
improve performance.
Author: Julien Rouhaud
Reviewed by: Andres Freund
If a transaction never reaches the standby, later tests find unexpected
cluster state. A "tail-copy: query result matches" test failure has
been the usual symptom. Among the buildfarm members having run this
test suite, most have exhibited that symptom at least once. Back-patch
to 9.5, where pg_rewind was introduced.
Michael Paquier, reported by Christoph Berg.
Commit f828654e introduced the 'n' option, but it invoked
gettimeofday() independently of the 'm' option. If both options were
in use (or multiple 'n' options), or if 'n' was in use along with
csvlog, then the reported times could be different for the same log
message.
To fix, initialize a global variable with gettimeofday() once per log
message, and use that for both formats.
Don't bother coordinating the time for the 't' option, which has much
lower resolution.
Per complaint by Alvaro Herrera.
Cleanup process could be called by ordinary insert/update and could take a lot
of time. Add vacuum_delay_point() to make this process interruptable. Under
vacuum this call will also throttle a vacuum process to decrease system load,
called from insert/update it will not throttle, and that reduces a latency.
Backpatch for all supported branches.
Jeff Janes <jeff.janes@gmail.com>
Add pages deleted from GIN's pending list during cleanup to free space map
immediately. Clean up process could be initiated by ordinary insert but adding
page to FSM might occur only at vacuum. On some workload like never-vacuumed
insert-only tables it could cause a huge bloat.
Jeff Janes <jeff.janes@gmail.com>
Since 6fcd885 it is possible to immediately reserve WAL when creating a
slot via pg_create_physical_replication_slot(). Extend the replication
protocol to allow that as well.
Although, in contrast to the SQL interface, it is possible to update the
reserved location via the replication interface, it is still useful
being able to reserve upon creation there. Otherwise the logic in
ReplicationSlotReserveWal() has to be repeated in slot employing
clients.
Author: Michael Paquier
Discussion: CAB7nPqT0Wc1W5mdYGeJ_wbutbwNN+3qgrFR64avXaQCiJMGaYA@mail.gmail.com
RESERV. RESERV is meant for tokens like "now" and having them in that
category throws errors like these when used as an input date:
stark=# SELECT 'doy'::timestamptz;
ERROR: unexpected dtype 33 while parsing timestamptz "doy"
LINE 1: SELECT 'doy'::timestamptz;
^
stark=# SELECT 'dow'::timestamptz;
ERROR: unexpected dtype 32 while parsing timestamptz "dow"
LINE 1: SELECT 'dow'::timestamptz;
^
Found by LLVM's Libfuzzer
This has been broken since 9.3 (commit 82b1b213ca to be exact),
which suggests that nobody is any longer using a Windows build system that
doesn't provide a symlink emulation. Still, it's wrong on its own terms,
so repair.
YUriy Zhuravlev
Remove the code in plpgsql that suppressed the innermost line of CONTEXT
for messages emitted by RAISE commands. That was never more than a quick
backwards-compatibility hack, and it's pretty silly in cases where the
RAISE is nested in several levels of function. What's more, it violated
our design theory that verbosity of error reports should be controlled
on the client side not the server side.
To alleviate the resulting noise increase, introduce a feature in libpq
and psql whereby the CONTEXT field of messages can be suppressed, either
always or only for non-error messages. Printing CONTEXT for errors only
is now their default behavior.
The actual code changes here are pretty small, but the effects on the
regression test outputs are widespread. I had to edit some of the
alternative expected outputs by hand; hopefully the buildfarm will soon
find anything I fat-fingered.
In passing, fix up (again) the output line counts in psql's various
help displays. Add some commentary about how to verify them.
Pavel Stehule, reviewed by Petr Jelínek, Jeevan Chalke, and others
If the number of heap blocks is not multiples of pages per range, the
summarizing produces wrong summary information for the last brin index
tuple while vacuuming.
Problem reported by Tatsuo Ishii and fixed by Amit Langote.
Discussion at "[HACKERS] BRIN INDEX value (message id :20150903.174935.1946402199422994347.t-ishii@sraoss.co.jp)
Backpatched to 9.5 in which brin index was added.
Formerly, we treated only portals created in the current subtransaction as
having failed during subtransaction abort. However, if the error occurred
while running a portal created in an outer subtransaction (ie, a cursor
declared before the last savepoint), that has to be considered broken too.
To allow reliable detection of which ones those are, add a bookkeeping
field to struct Portal that tracks the innermost subtransaction in which
each portal has actually been executed. (Without this, we'd end up
failing portals containing functions that had called the subtransaction,
thereby breaking plpgsql exception blocks completely.)
In addition, when we fail an outer-subtransaction Portal, transfer its
resources into the subtransaction's resource owner, so that they're
released early in cleanup of the subxact. This fixes a problem reported by
Jim Nasby in which a function executed in an outer-subtransaction cursor
could cause an Assert failure or crash by referencing a relation created
within the inner subtransaction.
The proximate cause of the Assert failure is that AtEOSubXact_RelationCache
assumed it could blow away a relcache entry without first checking that the
entry had zero refcount. That was a bad idea on its own terms, so add such
a check there, and to the similar coding in AtEOXact_RelationCache. This
provides an independent safety measure in case there are still ways to
provoke the situation despite the Portal-level changes.
This has been broken since subtransactions were invented, so back-patch
to all supported branches.
Tom Lane and Michael Paquier
Post-commit review by Andres Freund discovered a couple of concurrency
bugs in the original patch: specifically, if the leader cleared a
follower's XID before it reached PGSemaphoreLock, the semaphore would be
left in the wrong state; and if another process did PGSemaphoreUnlock
for some unrelated reason, we might resume execution before the fact
that our XID was cleared was globally visible.
Also, improve the wording of some comments, rename nextClearXidElem
to firstClearXidElem in PROC_HDR for clarity, and drop some volatile
qualifiers that aren't necessary.
Amit Kapila, reviewed and slightly revised by me.
The setting values of some parameters including max_worker_processes
must be equal to or higher than the values on the master. However,
previously max_worker_processes was not listed as such parameter
in the document. So this commit adds it to that list.
Back-patch to 9.4 where max_worker_processes was added.
Most suites already did so via start_test_server(), but the pg_rewind,
pg_ctl and pg_controldata suites ran a postmaster or initdb with fsync
enabled. This halves the pg_rewind suite's runtime on buildfarm member
tern. It makes tern and that machine's other buildfarm members less
vulnerable to noise failures from postmaster startup overrunning the 60s
pg_ctl timeout. Back-patch to 9.5, where pg_rewind was introduced.
Currently, in-memory posting list during GIN build process is limited 1GB
because of using repalloc. The patch replaces call of repalloc to repalloc_huge.
It increases limit of posting list from 180 millions
(1GB / sizeof(ItemPointerData)) to 4 billions limited by maxcount/count fields
in GinEntryAccumulator and subsequent calls. Check added.
Also, fix accounting of allocatedMemory during build to prevent integer
overflow with maintenance_work_mem > 4GB.
Robert Abraham <robert.abraham86@googlemail.com> with additions by me
Previously, if one background worker registered another background
worker and set bgw_notify_pid while for the second background worker,
it would not receive notifications from the postmaster unless, at the
time the "parent" was registered, BGWORKER_BACKEND_DATABASE_CONNECTION
was set.
To fix, instead instead of including only those background workers that
requested database connections in the postmater's BackendList, include
them all. There doesn't seem to be any reason not do this, and indeed
it removes a significant amount of duplicated code. The other option
is to make PostmasterMarkPIDForWorkerNotify look at BackgroundWorkerList
in addition to BackendList, but that adds more code duplication instead
of getting rid of it.
Patch by me. Review and testing by Ashutosh Bapat.
Some googling turned up multiple sources saying that older versions of icc
do not accept gcc-compatible asm blocks on IA64, though asm does work on
x86[_64]. This is apparently fixed as of icc version 12.0 or so, but that
doesn't help us much; if we have to carry the extra implementation anyway,
we may as well just use it for icc rather than add a compiler version test.
Hence, revert commit 2c713d6ea2 (though I
separated the icc code from the gcc code completely, producing what seems
cleaner code). Document the state of affairs more explicitly, both in
s_lock.h and postgres.c, and make some cosmetic adjustments around the
IA64 code in s_lock.h.
The atomics headers were written under the impression that icc doesn't
handle gcc-style asm blocks, but this is demonstrably false on x86_[64],
because s_lock.h has done it that way for more than a decade. (The jury is
still out on whether this also works on ia64, so I'm leaving ia64-related
code alone for the moment.) Treat gcc and icc the same in these headers.
This is less code and it should improve the results for icc, because we
hadn't gotten around to providing icc-specific implementations for most
of the atomics.
Intel's icc is generally able to swallow asm blocks written for gcc.
We have a few places that don't seem to know that, though. Experiment
with removing the special case for icc in ia64_get_bsp(); if the buildfarm
likes this, I'll try more cleanup. This is a good test case because it
involves a "stop" notation that seems like it might not be very portable.
Remove configure's checks for HAVE_POSIX_SIGNALS, HAVE_SIGPROCMASK, and
HAVE_SIGSETJMP. These APIs are required by the Single Unix Spec v2
(POSIX 1997), which we generally consider to define our minimum required
set of Unix APIs. Moreover, no buildfarm member has reported not having
them since 2012 or before, which means that even if the code is still live
somewhere, it's untested --- and we've made plenty of signal-handling
changes of late. So just take these APIs as given and save the cycles for
configure probes for them.
However, we can't remove as much C code as I'd hoped, because the Windows
port evidently still uses the non-POSIX code paths for signal masking.
Since we're largely emulating these BSD-style APIs for Windows anyway, it
might be a good thing to switch over to POSIX-like notation and thereby
remove a few more #ifdefs. But I'm not in a position to code or test that.
In the meantime, we can at least make things a bit more transparent by
testing for WIN32 explicitly in these places.
C89 requires <signal.h> to define sig_atomic_t, and there is no evidence
in the buildfarm that any supported platforms don't comply. Remove the
configure test to stop wasting build cycles on a purely historical issue.
(Once upon a time, we cared about supporting C89-compliant compilers on
machines with pre-C89 system headers, but that use-case has been dead for
quite a few years.)
I have some other fixes planned in this area, but let's start with this
to see if the buildfarm produces any surprising results.
On recent AIX it's necessary to configure gcc to use the native assembler
(because the GNU assembler hasn't been updated to handle AIX 6+). This
caused PG builds to fail with assembler syntax errors, because we'd try
to compile s_lock.h's gcc asm fragment for PPC, and that assembly code
relied on GNU-style local labels. We can't substitute normal labels
because it would fail in any file containing more than one inlined use of
tas(). Fortunately, that code is stable enough, and the PPC ISA is simple
enough, that it doesn't seem like too much of a maintenance burden to just
hand-code the branch offsets, removing the need for any labels.
Note that the AIX assembler only accepts "$" for the location counter
pseudo-symbol. The usual GNU convention is "."; but it appears that all
versions of gas for PPC also accept "$", so in theory this patch will not
break any other PPC platforms.
This has been reported by a few people, but Steve Underwood gets the credit
for being the first to pursue the problem far enough to understand why it
was failing. Thanks also to Noah Misch for additional testing.
During fireRIRrules(), get_row_security_policies can add to
securityQuals and withCheckOptions. Make sure to lock any relations
added at that point and before firing RIR rules on those expressions.
Back-patch to 9.5 where RLS was added.
Rather than consulting TransactionIdIsInProgress to see if an in-doubt
transaction is still running, consult XidInMVCCSnapshot. That requires
the same or fewer cycles as TransactionIdIsInProgress, and what's far
more important, it does not access shared data structures (at least in the
no-subxip-overflow case) so it incurs no contention. Furthermore, we would
have had to check XidInMVCCSnapshot anyway before deciding that we were
allowed to see the tuple.
There should never be a case where XidInMVCCSnapshot says a transaction is
done while TransactionIdIsInProgress says it's still running. The other
way around is quite possible though. The result of that difference is that
HeapTupleSatisfiesMVCC will no longer set hint bits on tuples whose source
transactions recently finished but are still running according to our
snapshot. The main cost of delaying the hint-bit setting is that repeated
visits to a just-committed tuple, by transactions none of which have
snapshots new enough to see the source transaction as done, will each
execute TransactionIdIsCurrentTransactionId, which they need not have done
before. However, that's normally just a small overhead, and no contention
costs are involved; so it seems well worth the benefit of removing
TransactionIdIsInProgress calls during the life of the source transaction.
The core idea for this patch is due to Jeff Janes, who also did the legwork
proving its performance benefits. His original proposal was to swap the
order of TransactionIdIsInProgress and XidInMVCCSnapshot calls in some
cases within HeapTupleSatisfiesMVCC. That was a bit messy though.
The idea that we could dispense with calling TransactionIdIsInProgress
altogether was mine, as is the final patch.
Until 9.4, pg_controldata output was all aligned. At some point
during 9.5 development, a new item was added, namely
"Current track_commit_timestamp setting:" which is two characters
too long to be aligned with the rest of the output. Fix this by
removing the noise word "Current" and adding the requisite number
of padding spaces. Since the six preceding items are also similar
in nature, remove "Current" and pad those as well in order to
maintain overall consistency. Backpatch to 9.5 where new offending
item was added.
We had a report from Stefan Kaltenbrunner of a case in which postmaster
log files overran available disk space because multiple backends spewed
enormous context stats dumps upon hitting an out-of-memory condition.
Given the lack of similar reports, this isn't a common problem, but it
still seems worth doing something about. However, we don't want to just
blindly truncate the output, because that might prevent diagnosis of OOM
problems. What seems like a workable compromise is to limit the dump to
100 child contexts per parent, and summarize the space used within any
additional child contexts. That should help because practical cases where
the dump gets long will typically be huge numbers of siblings under the
same parent context; while the additional debugging value from seeing
details about individual siblings beyond 100 will not be large, we hope.
Anyway it doesn't take much code or memory space to do this, so let's try
it like this and see how things go.
Since the summarization mechanism requires passing totals back up anyway,
I took the opportunity to add a "grand total" line to the end of the
printout.
The results of the KNN-search test cases were indeterminate, as they asked
the system to sort pairs of points that are exactly equidistant from the
query reference point. It's a bit surprising that we've seen no
platform-specific failures from this in the buildfarm. Perhaps IEEE-float
math is well enough standardized that no such failures will ever occur on
supported platforms ... but since this entire regression test has yet to be
shipped in any non-alpha release, that seems like an unduly optimistic
assumption. Tweak the queries so that the correct output is uniquely
defined.
(The other queries in this test are also underdetermined; but it looks like
they are regurgitating index rows in insertion order, so for the moment
assume that that behavior is stable enough.)
Per Greg Stark's experiments with VAX. Back-patch to 9.5 where this test
script was introduced.
Previously, convert_one_string_to_scalar() would examine up to 20 bytes of
the input string, producing a scalar conversion with theoretical precision
far greater than is of any possible use considering the other limitations
on the accuracy of the resulting selectivity estimate. (I think this
choice might pre-date the caller-level logic that strips any common prefix
of the strings; before that, there could have been value in scanning the
strings far enough to use all the precision available in a double.)
Aside from wasting cycles to little purpose, this choice meant that the
"denom" variable could grow to as much as 256^21 = 3.74e50, which could
overflow in some non-IEEE float arithmetics. While we don't really support
any machines with non-IEEE arithmetic anymore, this still seems like quite
an unnecessary platform dependency. Limit the scan to 12 bytes instead,
thus limiting "denom" to 256^13 = 2.03e31, a value more likely to be
computable everywhere.
Per testing by Greg Stark, which showed overflow failures in our standard
regression tests on VAX.
Since the distances used in this algorithm are small integers (not more
than the size of the U set, in fact), there is no good reason to use float
arithmetic for them. Use short ints instead: they're smaller, faster, and
require no special portability assumptions.
Per testing by Greg Stark, which disclosed that the code got into an
infinite loop on VAX for lack of IEEE-style float infinities. We don't
really care all that much whether Postgres can run on a VAX anymore,
but there seems sufficient reason to change this code anyway.
In passing, make a few other small adjustments to make the code match
usual Postgres coding style a bit better.
For no obvious reason, spi_printtup() was coded to enlarge the tuple
pointer table by just 256 slots at a time, rather than doubling the size at
each reallocation, as is our usual habit. For very large SPI results, this
makes for O(N^2) time spent in repalloc(), which of course soon comes to
dominate the runtime. Use the standard doubling approach instead.
This is a longstanding performance bug, so back-patch to all active
branches.
Neil Conway
With a bit of tweaking of the compile namestack data structure, we can
verify at compile time whether a CONTINUE or EXIT is legal. This is
surely better than leaving it to runtime, both because earlier is better
and because we can issue a proper error pointer. Also, we can get rid
of the ad-hoc old way of detecting the problem, which only took care of
CONTINUE not EXIT.
Jim Nasby, adjusted a bit by me
Having the roles remain after the test ends up causing repeated 'make
installcheck' runs to fail and may be risky from a security perspective
also, so remove them at the end of the test.
The code had bugs that would cause crashes if NULL was passed as that
argument (originally intended to mean not to bother returning its
value), and after inspection it turns out that nothing seems interested
in the case that *ts is NULL anyway. Therefore, remove the partial
checks intended to support that case.
Author: Michael Paquier
though I didn't include a proposed Assert.
Backpatch to 9.5.
PLyString_ToComposite() blithely overwrote proc->result.out.d, even though
for a composite result type the other union variant proc->result.out.r is
the one that should be valid. This could result in a crash if out.r had
in fact been filled in (proc->result.is_rowtype == 1) and then somebody
later attempted to use that data; as per bug #13579 from Paweł Michalak.
Just to add insult to injury, it didn't work for RECORD results anyway,
because record_in() would refuse the case.
Fix by doing the I/O function lookup in a local PLyTypeInfo variable,
as we were doing already in PLyObject_ToComposite(). This is not a great
technique because any fn_extra data allocated by the input function will
be leaked permanently (thanks to using TopMemoryContext as fn_mcxt).
But that's a pre-existing issue that is much less serious than a crash,
so leave it to be fixed separately.
This bug would be a potential security issue, except that plpython is
only available to superusers and the crash requires coding the function
in a way that didn't work before today's patches.
Add regression test cases covering all the supported methods of converting
composite results.
Back-patch to 9.1 where the faulty coding was introduced.
If we have the typmod that identifies a registered record type, there's no
reason that record_in() should refuse to perform input conversion for it.
Now, in direct SQL usage, record_in() will always be passed typmod = -1
with type OID RECORDOID, because no typmodin exists for type RECORD, so the
case can't arise. However, some InputFunctionCall users such as PLs may be
able to supply the right typmod, so we should allow this to support them.
Note: the previous coding and comment here predate commit 59c016aa9f.
There has been no case since 8.1 in which the passed type OID wouldn't be
valid; and if it weren't, this error message wouldn't be apropos anyway.
Better to let lookup_rowtype_tupdesc complain about it.
Back-patch to 9.1, as this is necessary for my upcoming plpython fix.
I'm committing it separately just to make it a bit more visible in the
commit history.
To avoid confusion, rename CreatePolicyStmt's 'cmd' to 'cmd_name',
parse_policy_command's 'cmd' to 'polcmd', and AlterPolicy's 'cmd_datum'
to 'polcmd_datum', per discussion with Noah and as a follow-up to his
correction of copynodes/equalnodes handling of the CreatePolicyStmt
'cmd' field.
Back-patch to 9.5 where the CreatePolicyStmt was introduced, as we
are still only in alpha.
When reworking bypassrls in AlterRole to operate the same way the other
attribute handling is done, I missed that the variable was incorrectly a
bool rather than an int. This meant that on platforms with an unsigned
char, we could end up with incorrect behavior during ALTER ROLE.
Pointed out by Andres thanks to tests he did changing our bool to be the
one from stdbool.h which showed this and a number of other issues.
Add regression tests to test CREATE/ALTER role for the various role
attributes. Arrange to leave roles behind for testing pg_dumpall, but
none which have the LOGIN attribute.
Back-patch to 9.5 where the AlterRole bug exists.
Commit 8cce08f168 used a left-shift
on a literal of 1 that could (in large allocations) be shifted by
31 or more bits. This was assigned to a local variable that was
already declared to be a long to protect against overruns of int,
but the literal in this shift needs to be declared long to allow it
to work correctly in some compilers.
Backpatch to 9.5, where the bug was introduced.
Report and patch by KaiGai Kohei, slighly modified based on
discussion.
plpgsql's error location context messages ("PL/pgSQL function fn-name line
line-no at stmt-type") would misreport a CONTINUE statement as being an
EXIT, and misreport a MOVE statement as being a FETCH. These are clear
bugs that have been there a long time, so back-patch to all supported
branches.
In addition, in 9.5 and HEAD, change the description of EXECUTE from
"EXECUTE statement" to just plain EXECUTE; there seems no good reason why
this statement type should be described differently from others that have
a well-defined head keyword. And distinguish GET STACKED DIAGNOSTICS from
plain GET DIAGNOSTICS. These are a bit more of a judgment call, and also
affect existing regression-test outputs, so I did not back-patch into
stable branches.
Pavel Stehule and Tom Lane
My expanded-objects patch (commit 1dc5ebc907) included code to make
plpgsql pass expanded-object variables as R/W pointers to certain functions
that are trusted for modifying such variables in-place. However, that
optimization got broken by commit 6c82d8d1fd, which arranged to share
a single ParamListInfo across most expressions evaluated by a plpgsql
function. We don't want a R/W pointer to be passed to other functions
just because we decided one function was safe! Fortunately, the breakage
was in the other direction, of never passing a R/W pointer at all, because
we'd always have pre-initialized the shared array slot with a R/O pointer.
So it was still functionally correct, but we were back to O(N^2)
performance for repeated use of "a := a || x". To fix, force an unshared
param array to be used when the R/W param optimization is active.
Commit 6c82d8d1fd is in HEAD only, so no need for a back-patch.
DO blocks use private simple_eval_estates to avoid intra-transaction memory
leakage, cf commit c7b849a89. I had forgotten about that while writing
commit 0fc94a5ba, but it means that expression execution trees created
within a DO block disappear immediately on exiting the DO block, and hence
can't safely be linked into plpgsql's session-wide cast hash table.
To fix, give a DO block a private cast hash table to go with its private
simple_eval_estate. This is less efficient than one could wish, since
DO blocks can no longer share any cast lookup work with other plpgsql
execution, but it shouldn't be too bad; in any case it's no worse than
what happened in DO blocks before commit 0fc94a5ba.
Per bug #13571 from Feike Steenbergen. Preliminary analysis by
Oleksandr Shulgin.
This fixes a bunch of somewhat pedantic warnings with new
compilers. Since by far the majority of other functions definitions use
the (void) style it just seems to be consistent to do so as well in the
remaining few places.
It was a bool, even though it should be CEOUC_WAIT_MODE. That's unlikely
to have a negative effect with the current definition of bool (char),
but it's definitely wrong.
Discussion: 20150812084351.GD8470@awork2.anarazel.de
Backpatch: 9.5, where ON CONFLICT was merged
Since a179232047 (vacuumdb: enable parallel mode) -1 has been assigned
to a boolean. That can, justifiedly, trigger compiler warnings. There's
also no need for ternary logic, result was only ever set to 0 or -1. So
don't.
Discussion: 20150812084351.GD8470@awork2.anarazel.de
Backpatch: 9.5
Doing so doesn't work if bool is a macro rather than a typedef.
Although c.h spends some effort to support configurations where bool is
a preexisting macro, help_config.c has existed this way since
2003 (b700a6), and there have not been any reports of
problems. Backpatch anyway since this is as riskless as it gets.
Discussion: 20150812084351.GD8470@awork2.anarazel.de
Backpatch: 9.0-master
Mistakenly relreplident was stored as a bool. That works today as c.h
typedefs bool to a char, but isn't very future proof.
Discussion: 20150812084351.GD8470@awork2.anarazel.de
Backpatch: 9.4 where replica identity was introduced.
This fixes presentation of non-ASCII messages to the Windows event log
and console in rare cases involving Korean locale. Processes like the
postmaster and checkpointer, but not processes attached to databases,
were affected. Back-patch to 9.4, where MessageEncoding was introduced.
The problem exists in all supported versions, but this change has no
effect in the absence of the code recognizing PG_UHC MessageEncoding.
Noticed while investigating bug #13427 from Dmitri Bourlatchkov.
Commit 49c817eab7 replaced with a hard
error the dubious pg_do_encoding_conversion() behavior when outside a
transaction. Reintroduce the historic soft failure locally within
pgwin32_message_to_UTF16(). This fixes errors when writing messages in
less-common encodings to the Windows event log or console. Back-patch
to 9.4, where the aforementioned commit first appeared.
Per bug #13427 from Dmitri Bourlatchkov.
Reduce lock levels down to ShareUpdateExclusiveLock for all autovacuum-related
relation options when setting them using ALTER TABLE.
Add infrastructure to allow varying lock levels for relation options in later
patches. Setting multiple options together uses the highest lock level required
for any option. Works for both main and toast tables.
Fabrízio Mello, reviewed by Michael Paquier, mild edit and additional regression
tests from myself
The error message wording for AttributeError has changed in Python 3.5.
For the plpython_error test, add a new expected file. In the
plpython_subtransaction test, we didn't really care what the exception
is, only that it is something coming from Python. So use a generic
exception instead, which has a message that doesn't vary across
versions.
This time, instead of using a core isolation test, put it on its own
test module; this way it can require the pageinspect module to be
present before running.
The module's Makefile is loosely modeled after test_decoding's, so that
it's easy to add further tests for either pg_regress or isolationtester
later.
Backpatch to 9.5.
In commit 95f4e59c32 I added a regression test case that examined
the plan of a query on system catalogs. That isn't a terribly great idea
because the catalogs tend to change from version to version, or even
within a version if someone makes an unrelated regression-test change that
populates the catalogs a bit differently. Usually I try to make planner
test cases rely on test tables that have not changed since Berkeley days,
but I got sloppy in this case because the submitted crasher example queried
the catalogs and I didn't spend enough time on rewriting it. But it was a
problem waiting to happen, as I was rudely reminded when I tried to port
that patch into Salesforce's Postgres variant :-(. So spend a little more
effort and rewrite the query to not use any system catalogs. I verified
that this version still provokes the Assert if 95f4e59c32866716's code fix
is reverted.
I also removed the EXPLAIN output from the test, as it turns out that the
assertion occurs while considering a plan that isn't the one ultimately
selected anyway; so there's no value in risking any cross-platform
variation in that printout.
Back-patch to 9.2, like the previous patch.
These are emitted by the new ax_pthread.m4 script version. They are not
used for anything in PostgreSQL, but let's keep the generated header file
up-to-date.
Andres Freund
One of the changes I made in commit 8703059c6b turns out not to have
been such a good idea: we still need the exception in join_is_legal() that
allows a join if both inputs already overlap the RHS of the special join
we're checking. Otherwise we can miss valid plans, and might indeed fail
to find a plan at all, as in recent report from Andreas Seltenreich.
That code was added way back in commit c17117649b, but I failed to
include a regression test case then; my bad. Put it back with a better
explanation, and a test this time. The logic does end up a bit different
than before though: I now believe it's appropriate to make this check
first, thereby allowing such a case whether or not we'd consider the
previous SJ(s) to commute with this one. (Presumably, we already decided
they did; but it was confusing to have this consideration in the middle
of the code that was handling the other case.)
Back-patch to all active branches, like the previous patch.
In some corner cases, it is possible for the BRIN index relation to be
extended by brin_getinsertbuffer but the new page not be used
immediately for anything by its callers; when this happens, the page is
initialized and the FSM is updated (by brin_getinsertbuffer) with the
info about that page, but these actions are not WAL-logged. A later
index insert/update can use the page, but since the page is already
initialized, the initialization itself is not WAL-logged then either.
Replay of this sequence of events causes recovery to fail altogether.
There is a related corner case within brin_getinsertbuffer itself, in
which we extend the relation to put a new index tuple there, but later
find out that we cannot do so, and do not return the buffer; the page
obtained from extension is not even initialized. The resulting page is
lost forever.
To fix, shuffle the code so that initialization is not the
responsibility of brin_getinsertbuffer anymore, in normal cases;
instead, the initialization is done by its callers (brin_doinsert and
brin_doupdate) once they're certain that the page is going to be used.
When either those functions determine that the new page cannot be used,
before bailing out they initialize the page as an empty regular page,
enter it in FSM and WAL-log all this. This way, the page is usable for
future index insertions, and WAL replay doesn't find trying to insert
tuples in pages whose initialization didn't make it to the WAL. The
same strategy is used in brin_getinsertbuffer when it cannot return the
new page.
Additionally, add a new step to vacuuming so that all pages of the index
are scanned; whenever an uninitialized page is found, it is initialized
as empty and WAL-logged. This closes the hole that the relation is
extended but the system crashes before anything is WAL-logged about it.
We also take this opportunity to update the FSM, in case it has gotten
out of date.
Thanks to Heikki Linnakangas for finding the problem that kicked some
additional analysis of BRIN page assignment code.
Backpatch to 9.5, where BRIN was introduced.
Discussion: https://www.postgresql.org/message-id/20150723204810.GY5596@postgresql.org
In 4b4b680c I passed a buffer index number (starting from 0) instead of
a proper Buffer id (which start from 1 for shared buffers) in two
places.
This wasn't noticed so far as one of those locations isn't compiled at
all (PrintPinnedBufs) and the other one (InvalidBuffer) requires a
unlikely, but possible, set of circumstances to trigger a symptom.
To reduce the likelihood of such incidents a bit also convert existing
open coded mappings from buffer descriptors to buffer ids with
BufferDescriptorGetBuffer().
Author: Qingqing Zhou
Reported-By: Qingqing Zhou
Discussion: CAJjS0u2ai9ooUisKtkV8cuVUtEkMTsbK8c7juNAjv8K11zeCQg@mail.gmail.com
Backpatch: 9.5 where the private ref count infrastructure was introduced
newnfa() failed to set the regex error state when malloc() fails.
Several places in regcomp.c failed to check for an error after calling
subre(). Each of these mistakes could lead to null-pointer-dereference
crashes in memory-starved backends.
Report and patch by Andreas Seltenreich. Back-patch to all branches.
Until now we computed these Param ID sets at the end of subquery_planner,
but that approach depends on subquery_planner returning a concrete Plan
tree. We would like to switch over to returning one or more Paths for a
subquery, and in that representation the necessary details aren't fully
fleshed out (not to mention that we don't really want to do this work for
Paths that end up getting discarded). Hence, refactor so that we can
compute the param ID sets at the end of planning, just before
set_plan_references is run.
The main change necessary to make this work is that we need to capture
the set of outer-level Param IDs available to the current query level
before exiting subquery_planner, since the outer levels' plan_params lists
are transient. (That's not going to pose a problem for returning Paths,
since all the work involved in producing that data is part of expression
preprocessing, which will continue to happen before Paths are produced.)
On the plus side, this change gets rid of several existing kluges.
Eventually I'd like to get rid of SS_finalize_plan altogether in favor of
doing this work during set_plan_references, but that will require some
complex rejiggering because SS_finalize_plan needs to visit subplans and
initplans before the main plan. So leave that idea for another day.
When creating a physical slot it's often useful to immediately reserve
the current WAL position instead of only doing after the first feedback
message arrives. That e.g. allows slots to guarantee that all the WAL
for a base backup will be available afterwards.
Logical slots already have to reserve WAL during creation, so generalize
that logic into being usable for both physical and logical slots.
Catversion bump because of the new parameter.
Author: Gurjeet Singh
Reviewed-By: Andres Freund
Discussion: CABwTF4Wh_dBCzTU=49pFXR6coR4NW1ynb+vBqT+Po=7fuq5iCw@mail.gmail.com
pg_dump produced fairly silly GRANT/REVOKE commands when dumping types from
pre-9.2 servers, and when dumping functions or procedural languages from
pre-7.3 servers. Those server versions lack the typacl, proacl, and/or
lanacl columns respectively, and pg_dump substituted default values that
were in fact incorrect. We ended up revoking all the owner's own
privileges for the object while granting all privileges to PUBLIC.
Of course the owner would then have those privileges again via PUBLIC, so
long as she did not try to revoke PUBLIC's privileges; which may explain
the lack of field reports. Nonetheless this is pretty silly behavior.
The stakes were raised by my recent patch to make pg_dump dump shell types,
because 9.2 and up pg_dump would proceed to emit bogus GRANT/REVOKE
commands for a shell type if dumping from a pre-9.2 server; and the server
will not accept GRANT/REVOKE commands for a shell type. (Perhaps it
should, but that's a topic for another day.) So the resulting dump script
wouldn't load without errors.
The right thing to do is to act as though these objects have default
privileges (null ACL entries), which causes pg_dump to print no
GRANT/REVOKE commands at all for them. That fixes the silly results
and also dodges the problem with shell types.
In passing, modify getProcLangs() to be less creatively different about
how to handle missing columns when dumping from older server versions.
Every other data-acquisition function in pg_dump does that by substituting
appropriate default values in the version-specific SQL commands, and I see
no reason why this one should march to its own drummer. Its use of
"SELECT *" was likewise not conformant with anyplace else, not to mention
it's not considered good SQL style for production queries.
Back-patch to all supported versions. Although 9.0 and 9.1 pg_dump don't
have the issue with typacl, they are more likely than newer versions to be
used to dump from ancient servers, so we ought to fix the proacl/lanacl
issues all the way back.
Commit 85e5e222b1 turns out not to have taken
care of all cases of the partially-evaluatable-PlaceHolderVar problem found
by Andreas Seltenreich's fuzz testing. I had set it up to check for risky
PHVs only in the event that we were making a star-schema-based exception to
the param_source_rels join ordering heuristic. However, it turns out that
the problem can occur even in joins that satisfy the param_source_rels
heuristic, in which case allow_star_schema_join() isn't consulted.
Refactor so that we check for risky PHVs whenever the proposed join has
any remaining parameterization.
Back-patch to 9.2, like the previous patch (except for the regression test
case, which only works back to 9.3 because it uses LATERAL).
Note that this discovery implies that problems of this sort could've
occurred in 9.2 and up even before the star-schema patch; though I've not
tried to prove that experimentally.
Several versions of the perl that comes with the Msys DTK have been
found to have a bug that fails to recognize a ' before a multiline $ in
some circumstances. To work around the problem, use a character class
for the '. Another solution would have been to use \n instead of $, but
that would have changed the test semantics very slightly.
Commit 2834855cb added a not-very-carefully-thought-out isolation test
to check a BRIN index bug fix. The test depended on the availability
of the pageinspect contrib module, which meant it did not work in
several common testing scenarios such as "make check-world". It's not
clear whether we want a core test depending on a contrib module like
that, but in any case, failing to deal with the possibility that the
module isn't present in the installation-under-test is not acceptable.
Remove that test pending some better solution.
There's no reason not to expose both restart_lsn and confirmed_flush
since they have rather distinct meanings. The former is the oldest WAL
still required and valid for both physical and logical slots, whereas
the latter is the location up to which a logical slot's consumer has
confirmed receiving data. Most of the time a slot will require older
WAL (i.e. restart_lsn) than the confirmed
position (i.e. confirmed_flush_lsn).
Author: Marko Tiikkaja, editorialized by me
Discussion: 559D110B.1020109@joh.to
XLogRecPtr was compared with InvalidTransactionId instead of
InvalidXLogRecPtr. As both are defined to the same value this doesn't
cause any actual problems, but it's still wrong.
Backpatch: 9.4-master, bug was introduced in 9.4
Immediately starting to stream after --create-slot is inconvenient in a
number of situations (e.g. when configuring a slot for use in
recovery.conf) and it's easy to just call pg_receivexlog twice in the
rest of the cases.
Author: Michael Paquier
Discussion: CAB7nPqQ9qEtuDiKY3OpNzHcz5iUA+DUX9FcN9K8GUkCZvG7+Ew@mail.gmail.com
Backpatch: 9.5, where the option was introduced
The allowed syntax for OVERLAPS, viz "row OVERLAPS row", is sufficiently
constrained that we don't actually need a precedence declaration for
OVERLAPS; indeed removing this declaration does not change the generated
gram.c file at all. Let's remove it to avoid confusion about whether
OVERLAPS has precedence or not. If we ever generalize what we allow for
OVERLAPS, we might need to put back a precedence declaration for it,
but we might want some other level than what it has today --- and leaving
the declaration there would just risk confusion about whether that would
be an incompatible change.
Likewise, remove OVERLAPS from the documentation's precedence table.
Per discussion with Noah Misch. Back-patch to 9.5 where we hacked up some
nearby precedence decisions.
commit 9043Fe390f4f0b4586cfe59cbd22314b9c3e2957 broke multibyte
regression tests because the commit removes the warning message when
temporary hash indexes is created, which has been added by commit
07af523870.
Back patched to 9.5 stable tree.
In de6fd1c8 I moved the the work around from 53f73879 into the aix
template. The previous location was removed in the former commit, and I
thought that it would be nice to emit a warning when running configure.
That didn't turn out to work because at the point the template is
included we don't know whether we're compiling a 32/64 bit binary and
it's possible to install compilers for both on a 64 bit kernel/OS.
So go back to a less ambitious approach and define
PG_FORCE_DISABLE_INLINE in port/aix.h, without emitting a warning. We
could try a more fancy approach, but it doesn't seem worth it.
This requires moving the check for PG_FORCE_DISABLE_INLINE in c.h to
after including the system headers included from therein which isn't
perfect, as it seems slightly more robust to include all system headers
in a similar environment. Oh well.
Discussion: 20150807132000.GC13310@awork2.anarazel.de
A removed check in ba3deeefb made all threads but the main one busy-loop
when -P was used. All threads computed the time to the next time the
progress report should be printed, but only the main thread did so and
re-scheduled it only for the future.
Reported-By: Jesper Pedersen
Discussion: 55C4E190.3050104@redhat.com
A new test case from Andreas Seltenreich showed that we were still a bit
confused about removing PlaceHolderVars during join removal. Specifically,
remove_rel_from_query would remove a PHV that was used only underneath
the removable join, even if the place where it's used was the join partner
relation and not the join clause being deleted. This would lead to a
"too late to create a new PlaceHolderInfo" error later on. We can defend
against that by checking ph_eval_at to see if the PHV could possibly be
getting used at some partner rel.
Also improve some nearby LATERAL-related logic. I decided that the check
on ph_lateral needed to take precedence over the check on ph_needed, in
case there's a lateral reference underneath the join being considered.
(That may be impossible, but I'm not convinced of it, and it's easy enough
to defend against the case.) Also, I realized that remove_rel_from_query's
logic for updating LateralJoinInfos is dead code, because we don't build
those at all until after join removal.
Back-patch to 9.3. Previous versions didn't have the LATERAL issues, of
course, and they also didn't attempt to remove PlaceHolderInfos during join
removal. (I'm starting to wonder if changing that was really such a great
idea.)
Some frontend code like e.g. pg_xlogdump or pg_resetxlog, has to use
backend headers. Unfortunately until now that code includes most of the
locking code. It's generally not nice to expose such low level details,
but de6fd1c898 made that a hard problem. We fall back to defining
'inline' away if the compiler doesn't support it - that can cause linker
errors like on buildfarm animal pademelon if a inline function
references backend only code.
To fix that problem separate definitions from lock.h that are required
from frontend code into lockdefs.h and use it in the relevant
places. I've only removed the minimal amount of necessary definitions
for now - it might turn out that we want more for other reasons.
To avoid such details being exposed again put some checks against being
included from frontend code into atomics.h, lock.h, lwlock.h and
s_lock.h. It's otherwise fairly easy to indirectly include these
headers.
Discussion: 20150806070902.GE12214@awork2.anarazel.de
Amit reviewed the replication origins patch and made some good
points. Address them. This fixes typos in error messages, docs and
comments and adds a missing error check (although in a
should-never-happen scenario).
Discussion: CAA4eK1JqUBVeWWKwUmBPryFaje4190ug0y-OAUHWQ6tD83V4xg@mail.gmail.com
Backpatch: 9.5, where replication origins were introduced.
Commit 9e7e29c75a introduced an Assert that
join removal didn't reduce the eval_at set of any PlaceHolderVar to empty.
At first glance it looks like join_is_removable ensures that's true --- but
actually, the loop in join_is_removable skips PlaceHolderVars that are not
referenced above the join due to be removed. So, if we don't want any
empty eval_at sets, the right thing to do is to delete any now-unreferenced
PlaceHolderVars from the data structure entirely.
Per fuzz testing by Andreas Seltenreich. Back-patch to 9.3 where the
aforesaid Assert was added.
Formerly, this function would always return "true" for an appendrel child
relation, because it would think that the appendrel parent was a potential
join target for the child. In principle that should only lead to some
inefficiency in planning, but fuzz testing by Andreas Seltenreich disclosed
that it could lead to "could not find pathkey item to sort" planner errors
in odd corner cases. Specifically, we would think that all columns of a
child table's multicolumn index were interesting pathkeys, causing us to
generate a MergeAppend path that sorts by all the columns. However, if any
of those columns weren't actually used above the level of the appendrel,
they would not get added to that rel's targetlist, which would result in
being unable to resolve the MergeAppend's sort keys against its targetlist
during createplan.c.
Backpatch to 9.3. In older versions, columns of an appendrel get added
to its targetlist even if they're not mentioned above the scan level,
so that the failure doesn't occur. It might be worth back-patching this
fix to older versions anyway, but I'll refrain for the moment.
Further testing revealed that commit f69b4b9495 was still a few
bricks shy of a load: minor tweaking of the previous test cases resulted
in the same wrong-outer-join-order problem coming back. After study
I concluded that my previous changes in make_outerjoininfo() were just
accidentally masking the problem, and should be reverted in favor of
forcing syntactic join order whenever an upper outer join's predicate
doesn't mention a lower outer join's LHS. This still allows the
chained-outer-joins style that is the normally optimizable case.
I also tightened things up some more in join_is_legal(). It seems to me
on review that what's really happening in the exception case where we
ignore a mismatched special join is that we're allowing the proposed join
to associate into the RHS of the outer join we're comparing it to. As
such, we should *always* insist that the proposed join be a left join,
which eliminates a bunch of rather dubious argumentation. The case where
we weren't enforcing that was the one that was already known buggy anyway
(it had a violatable Assert before the aforesaid commit) so it hardly
deserves a lot of deference.
Back-patch to all active branches, like the previous patch. The added
regression test case failed in all branches back to 9.1, and I think it's
only an unrelated change in costing calculations that kept 9.0 from
choosing a broken plan.
If some, but not all, of the length word has already been read, and the
next attempt to read sees exactly the number of bytes needed to complete
the length word, or fewer, then we'll incorrectly read less than all of
the available data.
Antonin Houska
When a write transaction commits, it must clear its XID advertised via
the ProcArray, which requires that we hold ProcArrayLock in exclusive
mode in order to prevent concurrent processes running GetSnapshotData
from seeing inconsistent results. When many processes try to commit
at once, ProcArrayLock must change hands repeatedly, with each
concurrent process trying to commit waking up to acquire the lock in
turn. To make things more efficient, when more than one backend is
trying to commit a write transaction at the same time, have just one
of them acquire ProcArrayLock in exclusive mode and clear the XIDs of
all processes in the group. Benchmarking reveals that this is much
more efficient at very high client counts.
Amit Kapila, heavily revised by me, with some review also from Pavan
Deolasee.
Commit e5550d5fec added some new
tests for ALTER TABLE which involved table scans. When
default_transaction_isolation = 'serializable' these acquire
relation-level SIReadLocks. The test results didn't cope with
that. Add SIReadLock as the minimum lock level for purposes of
these tests.
This could also be fixed by excluding this type of lock from the
my_locks view, but it would be a bug for SIReadLock to show up for
a relation which was not otherwise locked, so do it this way to
allow that sort of condition to cause a regression test failure.
There is some question whether we could avoid taking SIReadLocks
during these operations, but confirming the safety of that and
figuring out how to avoid the locks is not trivial, and would be
a separate patch.
Backpatch to 9.4 where the new tests were added.
pg_resetxlog.h contained two superfluous includes, origin.h superfluously
depended on logical.h, and pg_xlogdump's rmgrdesc.h only indirectly
included origin.h.
Backpatch: 9.5, where replication origins were introduced.
A few of the discrepancies had semantic significance, but I did not
track down the resulting user-visible bugs, if any. Back-patch to 9.5,
where all but one discrepancy appeared. The _equalCreateEventTrigStmt()
situation dates to 9.3 but does not affect semantics.
catversion bump due to readfuncs.c field order changes.
Commit 0ffc201a51 included this object
unconditionally. Being unprepared for that, most external, single-file
modules failed to build. This better aligns the GNU make build system
with the heuristic in the MSVC build's Project::AddDirResourceFile().
In-tree, installed modules set PGFILEDESC, so they will see no change.
Also, under PGXS, omit the nonfunctioning rule to build win32ver.rc.
Back-patch to 9.5, where the aforementioned commit first appeared.
Older versions have rmtree but not remove_tree. The one-argument forms
of these are equivalent, so replace remove_tree with rmtree. This allows
the tests to be run on oldish Msys systems.
For correctness of summarization results, it is critical that the
snapshot used during the summarization scan is able to see all tuples
that are live to all transactions -- including tuples inserted or
deleted by in-progress transactions. Otherwise, it would be possible
for a transaction to insert a tuple, then idle for a long time while a
concurrent transaction executes summarization of the range: this would
result in the inserted value not being considered in the summary.
Previously we were trying to use a MVCC snapshot in conjunction with
adding a "placeholder" tuple in the index: the snapshot would see all
committed tuples, and the placeholder tuple would catch insertions by
any new inserters. The hole is that prior insertions by transactions
that are still in progress by the time the MVCC snapshot was taken were
ignored.
Kevin Grittner reported this as a bogus error message during vacuum with
default transaction isolation mode set to repeatable read (because the
error report mentioned a function name not being invoked during), but
the problem is larger than that.
To fix, tweak IndexBuildHeapRangeScan to have a new mode that behaves
the way we need using SnapshotAny visibility rules. This change
simplifies the BRIN code a bit, mainly by removing large comments that
were mistaken. Instead, rely on the SnapshotAny semantics to provide
what it needs. (The business about a placeholder tuple needs to remain:
that covers the case that a transaction inserts a a tuple in a page that
summarization already scanned.)
Discussion: https://www.postgresql.org/message-id/20150731175700.GX2441@postgresql.org
In passing, remove a couple of unused declarations from brin.h and
reword a comment to be proper English. This part submitted by Kevin
Grittner.
Backpatch to 9.5, where BRIN was introduced.
Per the discussion in optimizer/README, it's unsafe to reassociate anything
into or out of the RHS of a SEMI or ANTI join. An example from Piotr
Stefaniak showed that join_is_legal() wasn't sufficiently enforcing this
rule, so lock it down a little harder.
I couldn't find a reasonably simple example of the optimizer trying to
do this, so no new regression test. (Piotr's example involved the random
search in GEQO accidentally trying an invalid case and triggering a sanity
check way downstream in clause selectivity estimation, which did not seem
like a sequence of events that would be useful to memorialize in a
regression test as-is.)
Back-patch to all active branches.
So far we have worked around the fact that some very old compilers do
not support 'inline' functions by only using inline functions
conditionally (or not at all). Since such compilers are very rare by
now, we have decided to rely on inline functions from 9.6 onwards.
To avoid breaking these old compilers inline is defined away when not
supported. That'll cause "function x defined but not used" type of
warnings, but since nobody develops on such compilers anymore that's
ok.
This change in policy will allow us to more easily employ inline
functions.
I chose to remove code previously conditional on PG_USE_INLINE as it
seemed confusing to have code dependent on a define that's always
defined.
Blacklisting of compilers, like in c53f73879f, now has to be done
differently. A platform template can define PG_FORCE_DISABLE_INLINE to
force inline to be defined empty.
Discussion: 20150701161447.GB30708@awork2.anarazel.de
Previously the message erroneously printed the same LSN twice as the
assignment to the start_lsn variable was before the message. Correct
that.
Reported-By: Marko Tiikkaja
Author: Marko Tiikkaja
Backpatch: 9.5, where logical decoding was introduced
I appear to accidentally have switched the comments for
pg_atomic_write_u32 and pg_atomic_read_u32 around. Also fix some minor
typos I found while fixing.
Noticed-By: Amit Kapila
Backpatch: 9.5
Per discussion, it really ought to do this. The original choice to
exclude shell types was probably made in the dark ages before we made
it harder to accidentally create shell types; but that was in 7.3.
Also, cause the standard regression tests to leave a shell type behind,
for convenience in testing the case in pg_dump and pg_upgrade.
Back-patch to all supported branches.
The tuplesort/tuplestore memory management logic assumed that the chunk
allocation overhead for its memtuples array could not increase when
increasing the array size. This is and always was true for tuplesort,
but we (I, I think) blindly copied that logic into tuplestore.c without
noticing that the assumption failed to hold for the much smaller array
elements used by tuplestore. Given rather small work_mem, this could
result in an improper complaint about "unexpected out-of-memory situation",
as reported by Brent DeSpain in bug #13530.
The easiest way to fix this is just to increase tuplestore's initial
array size so that the assumption holds. Rather than relying on magic
constants, though, let's export a #define from aset.c that represents
the safe allocation threshold, and make tuplestore's calculation depend
on that.
Do the same in tuplesort.c to keep the logic looking parallel, even though
tuplesort.c isn't actually at risk at present. This will keep us from
breaking it if we ever muck with the allocation parameters in aset.c.
Back-patch to all supported versions. The error message doesn't occur
pre-9.3, not so much because the problem can't happen as because the
pre-9.3 tuplestore code neglected to check for it. (The chance of
trouble is a great deal larger as of 9.3, though, due to changes in the
array-size-increasing strategy.) However, allowing LACKMEM() to become
true unexpectedly could still result in less-than-desirable behavior,
so let's patch it all the way back.
In commit b514a7460d, I changed the planner
so that it would allow nestloop paths to remain partially parameterized,
ie the inner relation might need parameters from both the current outer
relation and some upper-level outer relation. That's fine so long as we're
talking about distinct parameters; but the patch also allowed creation of
nestloop paths for cases where the inner relation's parameter was a
PlaceHolderVar whose eval_at set included the current outer relation and
some upper-level one. That does *not* work.
In principle we could allow such a PlaceHolderVar to be evaluated at the
lower join node using values passed down from the upper relation along with
values from the join's own outer relation. However, nodeNestloop.c only
supports simple Vars not arbitrary expressions as nestloop parameters.
createplan.c is also a few bricks shy of being able to handle such cases;
it misplaces the PlaceHolderVar parameters in the plan tree, which is why
the visible symptoms of this bug are "plan should not reference subplan's
variable" and "failed to assign all NestLoopParams to plan nodes" planner
errors.
Adding the necessary complexity to make this work doesn't seem like it
would be repaid in significantly better plans, because in cases where such
a PHV exists, there is probably a corresponding join order constraint that
would allow a good plan to be found without using the star-schema exception.
Furthermore, adding complexity to nodeNestloop.c would create a run-time
penalty even for plans where this whole consideration is irrelevant.
So let's just reject such paths instead.
Per fuzz testing by Andreas Seltenreich; the added regression test is based
on his example query. Back-patch to 9.2, like the previous patch.
It must be possible to multiply wal_buffers by XLOG_BLCKSZ without
overflowing int, or calculations in StartupXLOG will go badly wrong
and crash the server. Avoid that by imposing a maximum value on
wal_buffers. This will be just under 2GB, assuming the usual value
for XLOG_BLCKSZ.
Josh Berkus, per an analysis by Andrew Gierth.
If there are two different aggregates in the query with same inputs, and
the aggregates have the same initial condition and transition function,
only calculate the state value once, and only call the final functions
separately. For example, AVG(x) and SUM(x) aggregates have the same
transition function, which accumulates the sum and number of input tuples.
For a query like "SELECT AVG(x), SUM(x) FROM x", we can therefore
accumulate the state function only once, which gives a nice speedup.
David Rowley, reviewed and edited by me.
Only remove the default deny policy when a permissive policy exists
(either from the hook or defined by the user). If only restrictive
policies exist then no rows will be visible, as restrictive policies
shouldn't make rows visible. To address this requirement, a single
"USING (true)" permissive policy can be created.
Update the test_rls_hooks regression tests to create the necessary
"USING (true)" permissive policy.
Back-patch to 9.5 where RLS was added.
Per discussion with Dean.
If tablespace_map file is present without backup_label file, there is
no use of such file. There is no harm in retaining it, but it is better
to get rid of the map file so that we don't have any redundant file
in data directory and it will avoid any sort of confusion. It seems
prudent though to just rename the file out of the way rather than
delete it completely, also we ignore any error that occurs in rename
operation as even if map file is present without backup_label file,
it is harmless.
Back-patch to 9.5 where tablespace_map file was introduced.
Amit Kapila, reviewed by Robert Haas, Alvaro Herrera and me.
pg_xlog is often a symlink, typically to a different filesystem. Don't
get confused and comlain about by that, and just always pretend that it's a
normal directory, even if it's really a symlink.
Also add a test case for this.
Backpatch to 9.5.
Since commit 01f6bb4b2, TestLib.pm has exported path to tmp_check directory,
so let's use that also for the pg_rewind test clusters etc.
Also, in master, the $tempdir_short variable has not been used since commit
13d856e17, which moved the initdb-running code to TestLib.pm.
Backpatch to 9.5.
It's against project policy to use elog() for user-facing errors, or to
omit an errcode() selection for errors that aren't supposed to be "can't
happen" cases. Fix all the violations of this policy that result in
ERRCODE_INTERNAL_ERROR log entries during the standard regression tests,
as errors that can reliably be triggered from SQL surely should be
considered user-facing.
I also looked through all the files touched by this commit and fixed
other nearby problems of the same ilk. I do not claim to have fixed
all violations of the policy, just the ones in these files.
In a few places I also changed existing ERRCODE choices that didn't
seem particularly appropriate; mainly replacing ERRCODE_SYNTAX_ERROR
by something more specific.
Back-patch to 9.5, but no further; changing ERRCODE assignments in
stable branches doesn't seem like a good idea.
The Msys DTK perl, which is required to run TAP tests under Msys as a
native perl won't recognize the correct virtual paths, has its osname
recorded in the Config module as 'msys' instead of 'MSWin32'. To avoid
having to repeat the test a variable is created that is true iff the
osname is either of these values, and is then used everywhere that
matters.
As in commit 0a52d378b0, avoid doing something that has undefined
results according to the C standard, even though in practice there does
not seem to be any problem with it.
This fixes two places in numeric.c that demonstrably could call memcpy()
with such arguments. I looked through that file and didn't see any other
places with similar hazards; this is not to claim that there are not such
places in other files.
Per report from Piotr Stefaniak. Back-patch to 9.5 which is where the
previous commit was added. We're more or less setting a precedent that
we will not worry about this type of issue in pre-9.5 branches unless
someone demonstrates a problem in the field.
Commit c9b0cbe98b accidentally broke the
order of operations during postmaster shutdown: it resulted in removing
the per-socket lockfiles after, not before, postmaster.pid. This creates
a race-condition hazard for a new postmaster that's started immediately
after observing that postmaster.pid has disappeared; if it sees the
socket lockfile still present, it will quite properly refuse to start.
This error appears to be the explanation for at least some of the
intermittent buildfarm failures we've seen in the pg_upgrade test.
Another problem, which has been there all along, is that the postmaster
has never bothered to close() its listen sockets, but has just allowed them
to close at process death. This creates a different race condition for an
incoming postmaster: it might be unable to bind to the desired listen
address because the old postmaster is still incumbent. This might explain
some odd failures we've seen in the past, too. (Note: this is not related
to the fact that individual backends don't close their client communication
sockets. That behavior is intentional and is not changed by this patch.)
Fix by adding an on_proc_exit function that closes the postmaster's ports
explicitly, and (in 9.3 and up) reshuffling the responsibility for where
to unlink the Unix socket files. Lock file unlinking can stay where it
is, but teach it to unlink the lock files in reverse order of creation.
If a call to WaitForXLogInsertionsToFinish() returned a value in the middle
of a page, and another backend then started to insert a record to the same
page, and then you called WaitXLogInsertionsToFinish() again, the second
call might return a smaller value than the first call. The problem was in
GetXLogBuffer(), which always updated the insertingAt value to the
beginning of the requested page, not the actual requested location. Because
of that, the second call might return a xlog pointer to the beginning of
the page, while the first one returned a later position on the same page.
XLogFlush() performs two calls to WaitXLogInsertionsToFinish() in
succession, and holds WALWriteLock on the second call, which can deadlock
if the second call to WaitXLogInsertionsToFinish() blocks.
Reported by Spiros Ioannou. Backpatch to 9.4, where the more scalable
WALInsertLock mechanism, and this bug, was introduced.
LWLockAttemptLock pointlessly read the lock's state in every loop
iteration, even though pg_atomic_compare_exchange_u32() returns the old
value. Instead do that only once before the loop iteration.
Additionally there's no need to have the expected_state variable,
old_state mostly had the same value anyway.
Noticed-By: Heikki Linnakangas
Backpatch: 9.5, no reason to let the branches diverge at this point
The lwlock scalability work introduced two race conditions into the
lwlock variable support provided for xlog.c. First, and harmlessly on
most platforms, it set/read the variable without the spinlock in some
places. Secondly, due to the removal of the spinlock, it was possible
that a backend missed changes to the variable's state if it changed in
the wrong moment because checking the lock's state, the variable's state
and the queuing are not protected by a single spinlock acquisition
anymore.
To fix first move resetting the variable's from LWLockAcquireWithVar to
WALInsertLockRelease, via a new function LWLockReleaseClearVar. That
prevents issues around waiting for a variable's value to change when a
new locker has acquired the lock, but not yet set the value. Secondly
re-check that the variable hasn't changed after enqueing, that prevents
the issue that the lock has been released and already re-acquired by the
time the woken up backend checks for the lock's state.
Reported-By: Jeff Janes
Analyzed-By: Heikki Linnakangas
Reviewed-By: Heikki Linnakangas
Discussion: 5592DB35.2060401@iki.fi
Backpatch: 9.5, where the lwlock scalability went in
An outer join clause that didn't actually reference the RHS (perhaps only
after constant-folding) could confuse the join order enforcement logic,
leading to wrong query results. Also, nested occurrences of such things
could trigger an Assertion that on reflection seems incorrect.
Per fuzz testing by Andreas Seltenreich. The practical use of such cases
seems thin enough that it's not too surprising we've not heard field
reports about it.
This has been broken for a long time, so back-patch to all active branches.
Per complaint from Peter Holzer. It's useful to cover this special case,
since for a boolean variable "foo", earlier parts of the planner will have
reduced variants like "foo = true" to just "foo", and thus we may fail
to recognize the applicability of a partial index with predicate
"foo IS NOT NULL".
Back-patch to 9.5, but not further; given the lack of previous complaints
this doesn't seem like behavior to change in stable branches.
In many cases, we can implement a semijoin as a plain innerjoin by first
passing the righthand-side relation through a unique-ification step.
However, one of the cases where this does NOT work is where the RHS has
a LATERAL reference to the LHS; that makes the RHS dependent on the LHS
so that unique-ification is meaningless. joinpath.c understood this,
and so would not generate any join paths of this kind ... but join_is_legal
neglected to check for the case, so it would think that we could do it.
The upshot would be a "could not devise a query plan for the given query"
failure once we had failed to generate any join paths at all for the bogus
join pair.
Back-patch to 9.3 where LATERAL was added.
The PGXS-case directory does not exist in the non-PGXS case, and vice
versa. Add one or the other, not both. This is essentially cosmetic.
It makes Makefile.win32 more like the similar Makefile.global code.
Responsibility was formerly split between Makefile.global and pgxs.mk.
As a result of commit b58233c71b, in the
PGXS case, these variables were unset while parsing Makefile.global and
callees. Inclusion of Makefile.custom did not work from PGXS, and the
subtle difference seemed like a recipe for future bugs. Back-patch to
9.4, where that commit first appeared.
They are marked stable, but since they act on instantaneous state and it
is possible to consult state of transactions as they commit, the results
could change mid-query. They need to be marked volatile, and this
commit does so.
There would normally be a catversion bump here, but this is so much a
niche feature and I don't believe there's real damage from the incorrect
marking, that I refrained.
Backpatch to 9.5, where commit timestamps where introduced.
Per note from Fujii Masao.
The code was assuming that any NULL value in scan keys was due to IS
NULL or IS NOT NULL, but it turns out to be possible to get them with
other operators too, if they are used in contrived-enough ways. Easiest
way out of the problem seems to check explicitely for the IS NOT NULL
flag, instead of assuming it must be set if the IS NULL flag is not set,
when a null scan key is found; if neither flag is set, follow the lead
of other index AMs and assume that all indexable operators must be
strict, and thus the query is never satisfiable.
Also, add a comment to try and lure some future hacker into improving
analysis of scan keys in brin.
Per report from Andreas Seltenreich; diagnosis by Tom Lane.
Backpatch to 9.5.
Discussion: http://www.postgresql.org/message-id/20646.1437919632@sss.pgh.pa.us
When retrieving policies, if not working on the root target relation,
we actually want the relation's SELECT policies, regardless of
the top level query command type. For example in UPDATE t1...FROM t2
we need to apply t1's UPDATE policies and t2's SELECT policies.
Previously top level query command type was applied to all relations,
which was wrong. Add some regression coverage to ensure we don't
violate this principle in the future.
Report and patch by Dean Rasheed. Cherry picked from larger refactoring
patch and tweaked by me. Back-patched to 9.5 where RLS was introduced.
Although I think on all modern machines floating division by zero
results in Infinity not SIGFPE, we still don't want infinities
running around in the planner's costing estimates; too much risk
of that leading to insane behavior.
grouping_planner() failed to consider the possibility that final_rel
might be known dummy and hence have zero rowcount. (I wonder if it
would be better to set a rows estimate of 1 for dummy relations?
But at least in the back branches, changing this convention seems
like a bad idea, so I'll leave that for another day.)
Make certain that get_variable_numdistinct() produces a nonzero result.
The case that can be shown to be broken is with stadistinct < 0.0 and
small ntuples; we did not prevent the result from rounding to zero.
For good luck I applied clamp_row_est() to all the nonconstant return
values.
In ExecChooseHashTableSize(), Assert that we compute positive nbuckets
and nbatch. I know of no reason to think this isn't the case, but it
seems like a good safety check.
Per reports from Piotr Stefaniak. Back-patch to all active branches.
When we loop back to the top of doCustom after processing a backslash
command, we must reset the "now" timestamp, because that's used to
calculate the time spent executing the previous command.
Report and fix by Fabien Coelho. Backpatch to 9.5, where this was broken.
The reverted changes did not narrow the semantic gap between the MSVC
build system and the GNU make build system. For targets old and new
that run multiple suites (contribcheck, modulescheck, tapcheck), restore
vcregress.pl to mimicking "make -k" rather than the "make -S" default.
Lack of "-k" would be more burdensome than lack of "-S". Keep changes
reflecting contemporary changes to the GNU make build system, and keep
updates to Makefile parsing. Keep the loss of --psqldir in "check" and
"ecpgcheck" targets; it had been a no-op when used alongside
--temp-install. No log message mentioned any of the reverted changes.
Based on a germ by Michael Paquier. Back-patch to 9.5.
This code relied on knowing exactly where in the source tree temporary
installations might appear. A reasonable hacker may not think to update
this code when adding use of a temporary installation, making it
fragile. Observe that commit 9fa8b0ee90
broke it unnoticed, and commit dcae5facca
fixed it unnoticed. Back-patch to 9.5 only; use of temporary
installations is unlikely to change in released versions.
Policy USING and WITH CHECK expressions were using EXPR_KIND_WHERE for
parse analysis, which results in inappropriate ERROR messages when
the expression contains unsupported constructs such as aggregates.
Create a new ParseExprKind called EXPR_KIND_POLICY and tailor the
related messages to fit.
Reported by Noah Misch. Reviewed by Dean Rasheed, Alvaro Herrera,
and Robert Haas. Back-patch to 9.5 where RLS was introduced.
A Salesforce colleague of mine griped that the regression tests don't
exercise EvalPlanQualFetchRowMarks() and allied routines. Which is
a fair complaint. Add test cases that go through the REFERENCE and COPY
code paths. Unfortunately we don't have sufficient infrastructure right
now to exercise the FDW code path in the isolation tests, but this is
surely better than before.
AlterPolicy() and CreatePolicy() lacked their respective hook invocations.
Noted by Noah Misch, review by Dean Rasheed. Back-patch to 9.5 where
RLS was introduced.
On Windows, use listen_address=127.0.0.1 to allow TCP connections. We were
already using "pg_regress --config-auth" to set up HBA appropriately. The
standard_initdb helper function now sets up the server's
unix_socket_directories or listen_addresses in the config file, so that
they don't need to be specified in the pg_ctl command line anymore. That
way, the pg_ctl invocations in test programs don't need to differ between
Windows and Unix.
Add another helper function to configure the server's pg_hba.conf to allow
replication connections. The configuration is done similarly to "pg_regress
--config-auth": trust on domain sockets on Unix, and SSPI authentication on
Windows.
Replace calls to "cat" and "touch" programs with built-in perl code, as
those programs don't normally exist on Windows.
Add instructions in the docs on how to install IPC::Run on Windows. Adjust
vcregress.pl to not replace PERL5LIB completely in vcregress.pl, because
otherwise cannot install IPC::Run in a non-standard location easily.
Michael Paquier, reviewed by Noah Misch, some additional tweaking by me.
This option specifies a replication slot for WAL streaming (-X stream),
so that there can be continuous replication slot use between WAL
streaming during the base backup and the start of regular streaming
replication.
Reviewed-by: Michael Paquier <michael.paquier@gmail.com>
When DefineQueryRewrite() is about to convert a table to a view, it checks
the table for features unavailable to views. For example, it rejects tables
having triggers. It omits to reject tables having relrowsecurity or a
pg_policy record. Fix that. To faciliate the repair, invent
relation_has_policies() which indicates the presence of policies on a
relation even when row security is disabled for that relation.
Reported by Noah Misch. Patch by me, review by Stephen Frost. Back-patch
to 9.5 where RLS was introduced.
CreatePolicy() and AlterPolicy() omit to create a pg_shdepend entry for
each role in the TO clause. Fix this by creating a new shared dependency
type called SHARED_DEPENDENCY_POLICY and assigning it to each role.
Reported by Noah Misch. Patch by me, reviewed by Alvaro Herrera.
Back-patch to 9.5 where RLS was introduced.
The previous code resulted in memory access beyond the path bounds. The
cure is to move it into a code branch that checks the value of lex_level
is within the correct bounds.
Bug reported and diagnosed by Piotr Stefaniak.
Don't print a WARNING if we get ESRCH from a kill() that's attempting
to cancel an autovacuum worker. It's possible (and has been seen in the
buildfarm) that the worker is already gone by the time we are able to
execute the kill, in which case the failure is harmless. About the only
plausible reason for reporting such cases would be to help debug corrupted
lock table contents, but this is hardly likely to be the most important
symptom if that happens. Moreover issuing a WARNING might scare users
more than is warranted.
Also, since sending a signal to an autovacuum worker is now entirely a
routine thing, and the worker will log the query cancel on its end anyway,
reduce the message saying we're doing that from LOG to DEBUG1 level.
Very minor cosmetic cleanup as well.
Since the main practical reason for doing this is to avoid unnecessary
buildfarm failures, back-patch to all active branches.
As pointed out by Tom, since HEAD has progressed beyond 9.5 in terms of
its catalog, we need to be sure catversion of HEAD is advanced beyond
that of 9.5. Corrects my mistake in the pg_stats view commit cfa928ff.
The pg_stats view is supposed to be restricted to only show rows
about tables the user can read. However, it sometimes can leak
information which could not otherwise be seen when row level security
is enabled. Fix that by not showing pg_stats rows to users that would
be subject to RLS on the table the row is related to. This is done
by creating/using the newly introduced SQL visible function,
row_security_active().
Along the way, clean up three call sites of check_enable_rls(). The second
argument of that function should only be specified as other than
InvalidOid when we are checking as a different user than the current one,
as in when querying through a view. These sites were passing GetUserId()
instead of InvalidOid, which can cause the function to return incorrect
results if the current user has the BYPASSRLS privilege and row_security
has been set to OFF.
Additionally fix a bug causing RI Trigger error messages to unintentionally
leak information when RLS is enabled, and other minor cleanup and
improvements. Also add WITH (security_barrier) to the definition of pg_stats.
Bumped CATVERSION due to new SQL functions and pg_stats view definition.
Back-patch to 9.5 where RLS was introduced. Reported by Yaroslav.
Patch by Joe Conway and Dean Rasheed with review and input by
Michael Paquier and Stephen Frost.
While postgres' use of SSL renegotiation is a good idea in theory, it
turned out to not work well in practice. The specification and openssl's
implementation of it have lead to several security issues. Postgres' use
of renegotiation also had its share of bugs.
Additionally OpenSSL has a bunch of bugs around renegotiation, reported
and open for years, that regularly lead to connections breaking with
obscure error messages. We tried increasingly complex workarounds to get
around these bugs, but we didn't find anything complete.
Since these connection breakages often lead to hard to debug problems,
e.g. spuriously failing base backups and significant latency spikes when
synchronous replication is used, we have decided to change the default
setting for ssl renegotiation to 0 (disabled) in the released
backbranches and remove it entirely in 9.5 and master.
Author: Andres Freund
Discussion: 20150624144148.GQ4797@alap3.anarazel.de
Backpatch: 9.5 and master, 9.0-9.4 get a different patch
This code was originally written as part of parallel query effort, but
it seems to have independent value, because if we make one decision
about where to get a PGPROC when we allocate and then put it back on a
different list at backend-exit time, bad things happen. This isn't
just a theoretical risk; we fixed an actual problem of this type in
commit e280c630a8.
join_clause_is_movable_into() is approximate, in the sense that it might
sometimes return "false" when actually it would be valid to push the given
join clause down to the specified level. This is okay ... but there was
an Assert in get_joinrel_parampathinfo() that's only safe if the answers
are always exact. Comment out the Assert, and add a bunch of commentary
to clarify what's going on.
Per fuzz testing by Andreas Seltenreich. The added regression test is
a pretty silly query, but it's based on his crasher example.
Back-patch to 9.2 where the faulty logic was introduced.