The page splitting code would go into infinite recursion if you try to
insert an index tuple that doesn't fit even on an empty page.
Per analysis and suggested fix by Andrew Gierth. Fixes bug #11555, reported
by Bryan Seitz (analysis happened over IRC). Backpatch to all supported
versions.
Testing by Amit Kapila, Andres Freund, and myself, with and without
other patches that also aim to improve scalability, seems to indicate
that this change is a significant win over the current value and over
smaller values such as 64. It's not clear how high we can push this
value before it starts to have negative side-effects elsewhere, but
going this far looks OK.
As of commit a87c72915 (which later got backpatched as far as 9.1),
we're explicitly supporting the notion that append relations can be
nested; this can occur when UNION ALL constructs are nested, or when
a UNION ALL contains a table with inheritance children.
Bug #11457 from Nelson Page, as well as an earlier report from Elvis
Pranskevichus, showed that there were still nasty bugs associated with such
cases: in particular the EquivalenceClass mechanism could try to generate
"join" clauses connecting an appendrel child to some grandparent appendrel,
which would result in assertion failures or bogus plans.
Upon investigation I concluded that all current callers of
find_childrel_appendrelinfo() need to be fixed to explicitly consider
multiple levels of parent appendrels. The most complex fix was in
processing of "broken" EquivalenceClasses, which are ECs for which we have
been unable to generate all the derived equality clauses we would like to
because of missing cross-type equality operators in the underlying btree
operator family. That code path is more or less entirely untested by
the regression tests to date, because no standard opfamilies have such
holes in them. So I wrote a new regression test script to try to exercise
it a bit, which turned out to be quite a worthwhile activity as it exposed
existing bugs in all supported branches.
The present patch is essentially the same as far back as 9.2, which is
where parameterized paths were introduced. In 9.0 and 9.1, we only need
to back-patch a small fragment of commit 5b7b5518d, which fixes failure to
propagate out the original WHERE clauses when a broken EC contains constant
members. (The regression test case results show that these older branches
are noticeably stupider than 9.2+ in terms of the quality of the plans
generated; but we don't really care about plan quality in such cases,
only that the plan not be outright wrong. A more invasive fix in the
older branches would not be a good idea anyway from a plan-stability
standpoint.)
I left the GUC in place for the beta period, so that people could experiment
with different values. No-one's come up with any data that a different value
would be better under some circumstances, so rather than try to document to
users what the GUC, let's just hard-code the current value, 8.
DetermineSleepTime() was previously called without blocked
signals. That's not good, because it allows signal handlers to
interrupt its workings.
DetermineSleepTime() was added in 9.3 with the addition of background
workers (da07a1e856), where it only read from
BackgroundWorkerList.
Since 9.4, where dynamic background workers were added (7f7485a0cd),
the list is also manipulated in DetermineSleepTime(). That's bad
because the list now can be persistently corrupted if modified by both
a signal handler and DetermineSleepTime().
This was discovered during the investigation of hangs on buildfarm
member anole. It's unclear whether this bug is the source of these
hangs or not, but it's worth fixing either way. I have confirmed that
it can cause crashes.
It luckily looks like this only can cause problems when bgworkers are
actively used.
Discussion: 20140929193733.GB14400@awork2.anarazel.de
Backpatch to 9.3 where background workers were introduced.
As noted in http://bugs.debian.org/763098 there is a conflict between
postgres' definition of CACHE_LINE_SIZE and the definition by various
*bsd platforms. It's debatable who has the right to define such a
name, but postgres' use was only introduced in 375d8526f2 (9.4), so
it seems like a good idea to rename it.
Discussion: 20140930195756.GC27407@msg.df7cb.de
Per complaint of Christoph Berg in the above email, although he's not
the original bug reporter.
Backpatch to 9.4 where the define was introduced.
Per discussion, revert the commit which added 'ignore_nulls' to
row_to_json. This capability would be better added as an independent
function rather than being bolted on to row_to_json. Additionally,
the implementation didn't address complex JSON objects, and so was
incomplete anyway.
Pointed out by Tom and discussed with Andrew and Robert.
The original design used an array of offsets into the variable-length
portion of a JSONB container. However, such an array is basically
uncompressible by simple compression techniques such as TOAST's LZ
compressor. That's bad enough, but because the offset array is at the
front, it tended to trigger the give-up-after-1KB heuristic in the TOAST
code, so that the entire JSONB object was stored uncompressed; which was
the root cause of bug #11109 from Larry White.
To fix without losing the ability to extract a random array element in O(1)
time, change this scheme so that most of the JEntry array elements hold
lengths rather than offsets. With data that's compressible at all, there
tend to be fewer distinct element lengths, so that there is scope for
compression of the JEntry array. Every N'th entry is still an offset.
To determine the length or offset of any specific element, we might have
to examine up to N preceding JEntrys, but that's still O(1) so far as the
total container size is concerned. Testing shows that this cost is
negligible compared to other costs of accessing a JSONB field, and that
the method does largely fix the incompressible-data problem.
While at it, rearrange the order of elements in a JSONB object so that
it's "all the keys, then all the values" not alternating keys and values.
This doesn't really make much difference right at the moment, but it will
allow providing a fast path for extracting individual object fields from
large JSONB values stored EXTERNAL (ie, uncompressed), analogously to the
existing optimization for substring extraction from large EXTERNAL text
values.
Bump catversion to denote the incompatibility in on-disk format.
We will need to fix pg_upgrade to disallow upgrading jsonb data stored
with 9.4 betas 1 and 2.
Heikki Linnakangas and Tom Lane
Andres pointed out that there was an extra ';' in equalPolicies, which
made me realize that my prior testing with CLOBBER_CACHE_ALWAYS was
insufficient (it didn't always catch the issue, just most of the time).
Thanks to that, a different issue was discovered, specifically in
equalRSDescs. This change corrects eqaulRSDescs to return 'true' once
all policies have been confirmed logically identical. After stepping
through both functions to ensure correct behavior, I ran this for
about 12 hours of CLOBBER_CACHE_ALWAYS runs of the regression tests
with no failures.
In addition, correct a few typos in the documentation which were pointed
out by Thom Brown (thanks!) and improve the policy documentation further
by adding a flushed out usage example based on a unix passwd file.
Lastly, clean up a few comments in the regression tests and pg_dump.h.
Several upcoming performance/scalability improvements require atomic
operations. This new API avoids the need to splatter compiler and
architecture dependent code over all the locations employing atomic
ops.
For several of the potential usages it'd be problematic to maintain
both, a atomics using implementation and one using spinlocks or
similar. In all likelihood one of the implementations would not get
tested regularly under concurrency. To avoid that scenario the new API
provides a automatic fallback of atomic operations to spinlocks. All
properties of atomic operations are maintained. This fallback -
obviously - isn't as fast as just using atomic ops, but it's not bad
either. For one of the future users the atomics ontop spinlocks
implementation was actually slightly faster than the old purely
spinlock using implementation. That's important because it reduces the
fear of regressing older platforms when improving the scalability for
new ones.
The API, loosely modeled after the C11 atomics support, currently
provides 'atomic flags' and 32 bit unsigned integers. If the platform
efficiently supports atomic 64 bit unsigned integers those are also
provided.
To implement atomics support for a platform/architecture/compiler for
a type of atomics 32bit compare and exchange needs to be
implemented. If available and more efficient native support for flags,
32 bit atomic addition, and corresponding 64 bit operations may also
be provided. Additional useful atomic operations are implemented
generically ontop of these.
The implementation for various versions of gcc, msvc and sun studio have
been tested. Additional existing stub implementations for
* Intel icc
* HUPX acc
* IBM xlc
are included but have never been tested. These will likely require
fixes based on buildfarm and user feedback.
As atomic operations also require barriers for some operations the
existing barrier support has been moved into the atomics code.
Author: Andres Freund with contributions from Oskari Saarenmaa
Reviewed-By: Amit Kapila, Robert Haas, Heikki Linnakangas and Álvaro Herrera
Discussion: CA+TgmoYBW+ux5-8Ja=Mcyuy8=VXAnVRHp3Kess6Pn3DMXAPAEA@mail.gmail.com,
20131015123303.GH5300@awork2.anarazel.de,
20131028205522.GI20248@awork2.anarazel.de
We removed a similar ban on this in json_object recently, but the ban in
datum_to_json was left, which generate4d sprutious errors in othee json
generators, notable json_build_object.
Along the way, add an assertion that datum_to_json is not passed a null
key. All current callers comply with this rule, but the assertion will
catch any possible future misbehaviour.
Previously, we used an lwlock that was held from the time we began
seeking a candidate buffer until the time when we found and pinned
one, which is disastrous for concurrency. Instead, use a spinlock
which is held just long enough to pop the freelist or advance the
clock sweep hand, and then released. If we need to advance the clock
sweep further, we reacquire the spinlock once per buffer.
This represents a significant increase in atomic operations around
buffer eviction, but it still wins on many workloads. On others, it
may result in no gain, or even cause a regression, unless the number
of buffer mapping locks is also increased. However, that seems like
material for a separate commit. We may also need to consider other
methods of mitigating contention on this spinlock, such as splitting
it into multiple locks or jumping the clock sweep hand more than one
buffer at a time, but those, too, seem like separate improvements.
Patch by me, inspired by a much larger patch from Amit Kapila.
Reviewed by Andres Freund.
Some compilers don't automatically search the current directory for
included files. 9cc2c182fc fixed that for builds from tarballs by
adding an include to the source directory. But that doesn't work when
the scanner is generated in the VPATH directory. Use the same search
path as the other parsers in the tree.
One compiler that definitely was affected is solaris' sun cc.
Backpatch to 9.1 which introduced using an actual parser for
replication commands.
Address a few typos in the row security update, pointed out
off-list by Adam Brightwell. Also include 'ALL' in the list
of commands supported, for completeness.
Buildfarm member tick identified an issue where the policies in the
relcache for a relation were were being replaced underneath a running
query, leading to segfaults while processing the policies to be added
to a query. Similar to how TupleDesc RuleLocks are handled, add in a
equalRSDesc() function to check if the policies have actually changed
and, if not, swap back the rsdesc field (using the original instead of
the temporairly built one; the whole structure is swapped and then
specific fields swapped back). This now passes a CLOBBER_CACHE_ALWAYS
for me and should resolve the buildfarm error.
In addition to addressing this, add a new chapter in Data Definition
under Privileges which explains row security and provides examples of
its usage, change \d to always list policies (even if row security is
disabled- but note that it is disabled, or enabled with no policies),
rework check_role_for_policy (it really didn't need the entire policy,
but it did need to be using has_privs_of_role()), and change the field
in pg_class to relrowsecurity from relhasrowsecurity, based on
Heikki's suggestion. Also from Heikki, only issue SET ROW_SECURITY in
pg_restore when talking to a 9.5+ server, list Bypass RLS in \du, and
document --enable-row-security options for pg_dump and pg_restore.
Lastly, fix a number of minor whitespace and typo issues from Heikki,
Dimitri, add a missing #include, per Peter E, fix a few minor
variable-assigned-but-not-used and resource leak issues from Coverity
and add tab completion for role attribute bypassrls as well.
This function created new Vars with varno different from varnoold, which
is a condition that should never prevail before setrefs.c does the final
variable-renumbering pass. The created Vars could not be seen as equal()
to normal Vars, which among other things broke equivalence-class processing
for them. The consequences of this were indeed visible in the regression
tests, in the form of failure to propagate constants as one would expect.
I stumbled across it while poking at bug #11457 --- after intentionally
disabling join equivalence processing, the security-barrier regression
tests started falling over with fun errors like "could not find pathkey
item to sort", because of failure to match the corrupted Vars to normal
ones.
When the number of allowed iterations is limited (either a "?" quantifier
or a bound expression), the last sub-match has to reach to the end of the
target string. The previous coding here first tried the shortest possible
match (one character, usually) and then gave up and back-tracked if that
didn't work, typically leading to failure to match overall, as shown in
bug #11478 from Christoph Berg. The minimum change to fix that would be to
not decrement k before "goto backtrack"; but that would be a pretty stupid
solution, because we'd laboriously try each possible sub-match length
before finally discovering that only ending at the end can work. Instead,
force the sub-match endpoint limit up to the end for even the first
shortest() call if we cannot have any more sub-matches after this one.
Bug introduced in my rewrite that added the iterdissect logic, commit
173e29aa5d. The shortest-first search code
was too closely modeled on the longest-first code, which hasn't got this
issue since it tries a match reaching to the end to start with anyway.
Back-patch to all affected branches.
Per discussion in bug #11350, log ALTER SYSTEM commands at the
log_statement=ddl level, rather than at the log_statement=all level.
Pointed out by Tomonari Katsumata.
Back-patch to 9.4 where ALTER SYSTEM was introduced.
While withCheckOption exprs had been handled in many cases by
happenstance, they need to be handled during set_plan_references and
more specifically down in set_plan_refs for ModifyTable plan nodes.
This is to ensure that the opfuncid's are set for operators referenced
in the withCheckOption exprs.
Identified as an issue by Thom Brown
Patch by Dean Rasheed
Back-patch to 9.4, where withCheckOption was introduced.
For the reason outlined in df4077cda2 also remove volatile qualifiers
from xlog.c. Some of these uses of volatile have been added after
noticing problems back when spinlocks didn't imply compiler
barriers. So they are a good test - in fact removing the volatiles
breaks when done without the barriers in spinlocks present.
Several uses of volatile remain where they are explicitly used to
access shared memory without locks. These locations are ok with
slightly out of date data, but removing the volatile might lead to the
variables never being reread from memory. These uses could also be
replaced by barriers, but that's a separate change of doubtful value.
Now that spinlocks (hopefully!) act as compiler barriers, as of commit
0709b7ee72, this should be safe. This
serves as a demonstration of the new coding style, and may be optimized
better on some machines as well.
There are four weaknesses in728f152e07f998d2cb4fe5f24ec8da2c3bda98f2:
* append_init() in heapdesc.c was ugly and required that rm_identify
return values are only valid till the next call. Instead just add a
couple more switch() cases for the INIT_PAGE cases. Now the returned
value will always be valid.
* a couple rm_identify() callbacks missed masking xl_info with
~XLR_INFO_MASK.
* pg_xlogdump didn't map a NULL rm_identify to UNKNOWN or a similar
string.
* append_init() was called when id=NULL - which should never actually
happen. But it's better to be careful.
Testing reveals that that doing a memcmp() before the strcoll() costs
practically nothing, at least on the systems we tested, and it speeds
up sorts containing many equal strings significatly.
Peter Geoghegan. Review by myself and Heikki Linnakangas. Comments
rewritten by me.
Building on the updatable security-barrier views work, add the
ability to define policies on tables to limit the set of rows
which are returned from a query and which are allowed to be added
to a table. Expressions defined by the policy for filtering are
added to the security barrier quals of the query, while expressions
defined to check records being added to a table are added to the
with-check options of the query.
New top-level commands are CREATE/ALTER/DROP POLICY and are
controlled by the table owner. Row Security is able to be enabled
and disabled by the owner on a per-table basis using
ALTER TABLE .. ENABLE/DISABLE ROW SECURITY.
Per discussion, ROW SECURITY is disabled on tables by default and
must be enabled for policies on the table to be used. If no
policies exist on a table with ROW SECURITY enabled, a default-deny
policy is used and no records will be visible.
By default, row security is applied at all times except for the
table owner and the superuser. A new GUC, row_security, is added
which can be set to ON, OFF, or FORCE. When set to FORCE, row
security will be applied even for the table owner and superusers.
When set to OFF, row security will be disabled when allowed and an
error will be thrown if the user does not have rights to bypass row
security.
Per discussion, pg_dump sets row_security = OFF by default to ensure
that exports and backups will have all data in the table or will
error if there are insufficient privileges to bypass row security.
A new option has been added to pg_dump, --enable-row-security, to
ask pg_dump to export with row security enabled.
A new role capability, BYPASSRLS, which can only be set by the
superuser, is added to allow other users to be able to bypass row
security using row_security = OFF.
Many thanks to the various individuals who have helped with the
design, particularly Robert Haas for his feedback.
Authors include Craig Ringer, KaiGai Kohei, Adam Brightwell, Dean
Rasheed, with additional changes and rework by me.
Reviewers have included all of the above, Greg Smith,
Jeff McCormick, and Robert Haas.
This is primarily useful for the upcoming pg_xlogdump --stats feature,
but also allows to remove some duplicated code in the rmgr_desc
routines.
Due to the separation and harmonization, the output of dipsplayed
records changes somewhat. But since this isn't enduser oriented
content that's ok.
It's potentially desirable to further change pg_xlogdump's display of
records. It previously wasn't possible to show the record type
separately from the description forcing it to be in the last
column. But that's better done in a separate commit.
Author: Abhijit Menon-Sen, slightly editorialized by me
Reviewed-By: Álvaro Herrera, Andres Freund, and Heikki Linnakangas
Discussion: 20140604104716.GA3989@toroid.org
This new GUC context option allows GUC parameters to have the combined
properties of PGC_BACKEND and PGC_SUSET, ie, they don't change after
session start and non-superusers can't change them. This is a more
appropriate choice for log_connections and log_disconnections than their
previous context of PGC_BACKEND, because we don't want non-superusers
to be able to affect whether their sessions get logged.
Note: the behavior for log_connections is still a bit odd, in that when
a superuser attempts to set it from PGOPTIONS, the setting takes effect
but it's too late to enable or suppress connection startup logging.
It's debatable whether that's worth fixing, and in any case there is
a reasonable argument for PGC_SU_BACKEND to exist.
In passing, re-pgindent the files touched by this commit.
Fujii Masao, reviewed by Joe Conway and Amit Kapila
Since this makes the bucket headers use ~10x as much memory, properly
account for that memory when we figure out whether everything fits
in work_mem. This might result in some cases that previously used
only a single batch getting split into multiple batches, but it's
unclear as yet whether we need defenses against that case, and if so,
what the shape of those defenses should be.
It's worth noting that even in these edge cases, users should still be
no worse off than they would have been last week, because commit
45f6240a8f saved a big pile of memory
on exactly the same workloads.
Tomas Vondra, reviewed and somewhat revised by me.
Previously replication commands like IDENTIFY_COMMAND were not logged
even when log_statements is set to all. Some users who want to audit
all types of statements were not satisfied with this situation. To
address the problem, this commit adds new GUC log_replication_commands.
If it's enabled, all replication commands are logged in the server log.
There are many ways to allow us to enable that logging. For example,
we can extend log_statement so that replication commands are logged
when it's set to all. But per discussion in the community, we reached
the consensus to add separate GUC for that.
Reviewed by Ian Barwick, Robert Haas and Heikki Linnakangas.
The code that tried to split a page at 75/25 ratio, when appending to the
end of an index, was buggy in two ways. First, there was a silly typo that
caused it to just fill the left page as full as possible. But the logic as
it was intended wasn't correct either, and would actually have given a ratio
closer to 60/40 than 75/25.
Gaetano Mendola spotted the typo. Backpatch to 9.4, where this code was added.
The code for raising a NUMERIC value to an integer power wasn't very
careful about large powers. It got an outright wrong answer for an
exponent of INT_MIN, due to failure to consider overflow of the Abs(exp)
operation; which is fixable by using an unsigned rather than signed
exponent value after that point. Also, even though the number of
iterations of the power-computation loop is pretty limited, it's easy for
the repeated squarings to result in ridiculously enormous intermediate
values, which can take unreasonable amounts of time/memory to process,
or even overflow the internal "weight" field and so produce a wrong answer.
We can forestall misbehaviors of that sort by bailing out as soon as the
weight value exceeds what will fit in int16, since then the final answer
must overflow (if exp > 0) or underflow (if exp < 0) the packed numeric
format.
Per off-list report from Pavel Stehule. Back-patch to all supported
branches.
Provide an option to skip NULL values in a row when generating a JSON
object from that row with row_to_json. This can reduce the size of the
JSON object in cases where columns are NULL without really reducing the
information in the JSON object.
This also makes row_to_json into a single function with default values,
rather than having multiple functions. In passing, change array_to_json
to also be a single function with default values (we don't add an
'ignore_nulls' option yet- it's not clear that there is a sensible
use-case there, and it hasn't been asked for in any case).
Pavel Stehule
Instead of palloc'ing each HashJoinTuple individually, allocate 32kB chunks
and pack the tuples densely in the chunks. This avoids the AllocChunk
header overhead, and the space wasted by standard allocator's habit of
rounding sizes up to the nearest power of two.
This doesn't contain any planner changes, because the planner's estimate of
memory usage ignores the palloc overhead. Now that the overhead is smaller,
the planner's estimates are in fact more accurate.
Tomas Vondra, reviewed by Robert Haas.
The code I added in commit f343a880d5 was
careless about preserving AND/OR flatness: it could create a structure with
an OR node directly underneath another one. That breaks an assumption
that's fairly important for planning efficiency, not to mention triggering
various Asserts (as reported by Benjamin Smith). Add a trifle more logic
to handle the case properly.
Previously, they functioned as barriers against CPU reordering but not
compiler reordering, an odd API that required extensive use of volatile
everywhere that spinlocks are used. That's error-prone and has negative
implications for performance, so change it.
In theory, this makes it safe to remove many of the uses of volatile
that we currently have in our code base, but we may find that there are
some bugs in this effort when we do. In the long run, though, this
should make for much more maintainable code.
Patch by me. Review by Andres Freund.
This provides a convenient method of classifying input values into buckets
that are not necessarily equal-width. It works on any sortable data type.
The choice of function name is a bit debatable, perhaps, but showing that
there's a relationship to the SQL standard's width_bucket() function seems
more attractive than the other proposals.
Petr Jelinek, reviewed by Pavel Stehule
The xml type previously rejected "content" that is empty or consists
only of spaces. But the SQL/XML standard allows that, so change that.
The accepted values for XML "documents" are not changed.
Reviewed-by: Ali Akbar <the.apaan@gmail.com>
Now that ALTER TABLE .. ALL IN TABLESPACE has replaced the previous
ALTER TABLESPACE approach, it makes sense to move the calls down in
to ProcessUtilitySlow where the rest of ALTER TABLE is handled.
This also means that event triggers will support ALTER TABLE .. ALL
(which was the impetus for the original change, though it has other
good qualities also).
Álvaro Herrera
Back-patch to 9.4 as the original rework was.
Some Sparc CPUs can be run in various coherence models, ranging from
RMO (relaxed) over PSO (partial) to TSO (total). Solaris has always
run CPUs in TSO mode while in userland, but linux didn't use to and
the various *BSDs still don't. Unfortunately the sparc TAS/S_UNLOCK
were only correct under TSO. Fix that by adding the necessary memory
barrier instructions. On sparcv8+, which should be all relevant CPUs,
these are treated as NOPs if the current consistency model doesn't
require the barriers.
Discussion: 20140630222854.GW26930@awork2.anarazel.de
Will be backpatched to all released branches once a few buildfarm
cycles haven't shown up problems. As I've no access to sparc, this is
blindly written.
Every redo routine uses the same idiom to determine what to do to a page:
check if there's a backup block for it, and if not read, the buffer if the
block exists, and check its LSN. Refactor that into a common function,
XLogReadBufferForRedo, making all the redo routines shorter and more
readable.
This has no user-visible effect, and makes no changes to the WAL format.
Reviewed by Andres Freund, Alvaro Herrera, Michael Paquier.
This patch allows us to execute ALTER SYSTEM RESET command to
remove the configuration entry from postgresql.auto.conf.
Vik Fearing, reviewed by Amit Kapila and me.
68a2e52bba has introduced LWLockAcquireCommon() containing the
previous contents of LWLockAcquire() plus added functionality. The
latter then calls it, just like LWLockAcquireWithVar(). Because the
majority of callers don't need the added functionality, declare the
common code as inline. The compiler then can optimize away the unused
code. Doing so is also useful when looking at profiles, to
differentiate the users.
Backpatch to 9.4, the first branch to contain LWLockAcquireCommon().
Since the dawn of time (aka Postgres95) multiple pins of the same
buffer by one backend have been optimized not to modify the shared
refcount more than once. This optimization has always used a NBuffer
sized array in each backend keeping track of a backend's pins.
That array (PrivateRefCount) was one of the biggest per-backend memory
allocations, depending on the shared_buffers setting. Besides the
waste of memory it also has proven to be a performance bottleneck when
assertions are enabled as we make sure that there's no remaining pins
left at the end of transactions. Also, on servers with lots of memory
and a correspondingly high shared_buffers setting the amount of random
memory accesses can also lead to poor cpu cache efficiency.
Because of these reasons a backend's buffers pins are now kept track
of in a small statically sized array that overflows into a hash table
when necessary. Benchmarks have shown neutral to positive performance
results with considerably lower memory usage.
Patch by me, review by Robert Haas.
Discussion: 20140321182231.GA17111@alap3.anarazel.de
The list of posting lists it's dealing with can contain placeholders for
deleted posting lists. The placeholders are kept around so that they can
be WAL-logged, but we must be careful to not try to access them.
This fixes bug #11280, reported by Mårten Svantesson. Backpatch to 9.4,
where the compressed data leaf page code was added.
This reverts commit e23014f3d4.
As the side effect of the reverted commit, when the unit is
specified, the reloption was stored in the catalog with the unit.
This broke pg_dump (specifically, it prevented pg_dump from
outputting restorable backup regarding the reloption) and
turned the buildfarm red. Revert the commit until the fixed
version is ready.
This is useful to allow to set GUCs to values that include spaces;
something that wasn't previously possible. The primary case motivating
this is the desire to set default_transaction_isolation to 'repeatable
read' on a per connection basis, but other usecases like seach_path do
also exist.
This introduces a slight backward incompatibility: Previously a \ in
an option value would have been passed on literally, now it'll be
taken as an escape.
The relevant mailing list discussion starts with
20140204125823.GJ12016@awork2.anarazel.de.
This introduces an infrastructure which allows us to specify the units
like ms (milliseconds) in integer relation option, like GUC parameter.
Currently only autovacuum_vacuum_cost_delay reloption can accept
the units.
Reviewed by Michael Paquier
Previously, only a single-byte character was allowed as an
escape. This patch allows it to be a multi-byte character, though it
still must be a single character.
Reviewed by Heikki Linnakangas and Tom Lane.
If SELECT FOR UPDATE NOWAIT tries to lock a tuple that is concurrently
being updated, it might fail to honor its NOWAIT specification and block
instead of raising an error.
Fix by adding a no-wait flag to EvalPlanQualFetch which it can pass down
to heap_lock_tuple; also use it in EvalPlanQualFetch itself to avoid
blocking while waiting for a concurrent transaction.
Authors: Craig Ringer and Thomas Munro, tweaked by Álvaro
http://www.postgresql.org/message-id/51FB6703.9090801@2ndquadrant.com
Per Thomas Munro in the course of his SKIP LOCKED feature submission,
who also provided one of the isolation test specs.
Backpatch to 9.4, because that's as far back as it applies without
conflicts (although the bug goes all the way back). To that branch also
backpatch Thomas Munro's new NOWAIT test cases, committed in master by
Heikki as commit 9ee16b49f0 .
In some cases, not all Vars were being correctly marked as having been
modified for updatable security barrier views, which resulted in invalid
plans (eg: when security barrier views were created over top of
inheiritance structures).
In passing, be sure to update both varattno and varonattno, as _equalVar
won't consider the Vars identical otherwise. This isn't known to cause
any issues with updatable security barrier views, but was noticed as
missing while working on RLS and makes sense to get fixed.
Back-patch to 9.4 where updatable security barrier views were
introduced.
Use SECURITY_LOCAL_USERID_CHANGE while building temporary tables;
only escalate to SECURITY_RESTRICTED_OPERATION while potentially
running user-supplied code. The more secure mode was preventing
temp table creation. Add regression tests to cover this problem.
This fixes Bug #11208 reported by Bruno Emanuel de Andrade Silva.
Backpatch to 9.4, where the bug was introduced.
Prevent automatic oid assignment when in binary upgrade mode. Also
throw an error when contrib/pg_upgrade_support functions are called when
not in binary upgrade mode.
This prevent automatically-assigned oids from conflicting with later
pre-assigned oids coming from the old cluster. It also makes sure oids
are preserved in call important cases.
There's no point in setting up a context error callback when doing
conditional lock acquisition, because we never actually wait and so the
user wouldn't be able to see the context message anywhere. In fact,
this is more in line with what ConditionalXactLockTableWait is doing.
Backpatch to 9.4, where this was added.
The symbol was added by 71901ab6d; the original code was introduced by
6868ed749. Development of both overlapped which is why we apparently
failed to notice.
This is a (very slight) behavior change, so I'm not backpatching this to
9.4 for now, even though the symbol does exist there.
Event triggers want to know the OID of the interesting object created,
which is the main type. The array created as part of the operation is
just a subsidiary object which is not of much interest.
Other DDL commands are already returning the OID, which is required for
future additional event trigger work. This is merely making these
commands in line with the rest of utility command support.
Add a succint comment explaining why it's correct to change the
persistence in this way. Also s/loggedness/persistence/ because native
speakers didn't like the latter term.
Fabrízio and Álvaro
CheckConstraintFetch() leaked a cstring in the caller's context for each
CHECK constraint expression it copied into the relcache. Ordinarily that
isn't problematic, but it can be during CLOBBER_CACHE testing because so
many reloads can happen during a single query; so complicate the code
slightly to allow freeing the cstring after use. Per testing on buildfarm
member barnacle.
This is exactly like the leak fixed in AttrDefaultFetch() by commit
078b2ed291. (Yes, this time I did look for
other instances of the same coding pattern :-(.) Like that patch, no
back-patch, since it seems unlikely that there's any problem except under
very artificial test conditions.
BTW, it strikes me that both of these places would require further work
comparable to commit ab8c84db2f, if we ever
supported defaults or check constraints on system catalogs: they both
assume they are copying into an empty relcache data structure, and that
conceivably wouldn't be the case during recursive reloading of a system
catalog. This does not seem worth worrying about for the moment, since
there is no near-term prospect of supporting any such thing. So I'll
just note the possibility for the archives' sake.
This enables changing permanent (logged) tables to unlogged and
vice-versa.
(Docs for ALTER TABLE / SET TABLESPACE got shuffled in an order that
hopefully makes more sense than the original.)
Author: Fabrízio de Royes Mello
Reviewed by: Christoph Berg, Andres Freund, Thom Brown
Some tweaking by Álvaro Herrera
Cause the path extraction operators to return their lefthand input,
not NULL, if the path array has no elements. This seems more consistent
since the case ought to correspond to applying the simple extraction
operator (->) zero times.
Cause other corner cases in field/element/path extraction to return NULL
rather than failing. This behavior is arguably more useful than throwing
an error, since it allows an expression index using these operators to be
built even when not all values in the column are suitable for the
extraction being indexed. Moreover, we already had multiple
inconsistencies between the path extraction operators and the simple
extraction operators, as well as inconsistencies between the JSON and
JSONB code paths. Adopt a uniform rule of returning NULL rather than
throwing an error when the JSON input does not have a structure that
permits the request to be satisfied.
Back-patch to 9.4. Update the release notes to list this as a behavior
change since 9.3.
As 'ALTER TABLESPACE .. MOVE ALL' really didn't change the tablespace
but instead changed objects inside tablespaces, it made sense to
rework the syntax and supporting functions to operate under the
'ALTER (TABLE|INDEX|MATERIALIZED VIEW)' syntax and to be in
tablecmds.c.
Pointed out by Alvaro, who also suggested the new syntax.
Back-patch to 9.4.
jsonb's #> operator segfaulted (dereferencing a null pointer) if the RHS
was a zero-length array, as reported in bug #11207 from Justin Van Winkle.
json's #> operator returns NULL in such cases, so for the moment let's
make jsonb act likewise.
Also add a bunch of regression test queries memorializing the -> and #>
operators' behavior for this and other corner cases.
There is a good argument for changing some of these behaviors, as they
are not very consistent with each other, and throwing an error isn't
necessarily a desirable behavior for operators that are likely to be
used in indexes. However, everybody can agree that a core dump is the
Wrong Thing, and we need test cases even if we decide to change their
expected output later.
While the space is optional, it seems nicer to be consistent with what
you get if you do "SET search_path=...". SET always normalizes the
separator to be comma+space.
Christoph Martin
This reverts commit 083d29c65b.
The commit changed the code so that it causes an errors when
IDENTIFY_SYSTEM returns three columns. But which prevents us
from using the replication-related utilities against the server
with older version. This is not what we want. For that
compatibility, we allow the utilities to receive three columns
as the result of IDENTIFY_SYSTEM eventhough it actually returns
four columns in 9.4 or later.
Pointed out by Andres Freund.
5a991ef869 added new column into
the result of IDENTIFY_SYSTEM command. But it was not reflected into
several codes checking that result. Specifically though the number of
columns in the result was increased to 4, it was still compared with 3
in some replication codes.
Back-patch to 9.4 where the number of columns in IDENTIFY_SYSTEM
result was increased.
Report from Michael Paquier
In support of this, have the MSVC build follow GNU make in preferring
GNUmakefile over Makefile when a directory contains both.
Michael Paquier, reviewed by MauMau.