The original coding was
var->value = (Datum) state;
which is bogus, and then in commit 2f0f7b4bce
it was "corrected" to
var->value = PointerGetDatum(state);
which is a faithful translation but still wrong.
This seems purely cosmetic, though, so no need for a back-patch.
Pavel Stehule
distro version of perl.
David Wheeler and Alex Hunsaker.
Backpatch to 9.1 where it applies cleanly. A simple workaround is available for earlier
branches, and further effort doesn't seem warranted.
In the cases where the result of the called proc is negated, we should
explicitly test both inputs for empty, to ensure we'll never return "true"
for an unsatisfiable query. In other cases we can rely on the called proc
to say the right thing.
Same bug as reported by Thom Brown for check constraints on tables: the
constraint must be dumped separately from the domain, otherwise it is
restored before the data and thus prevents potentially-violating data
from being loaded in the first place.
Per Dean Rasheed
This adds some I/O stats to the logging of autovacuum (when the
operation takes long enough that log_autovacuum_min_duration causes it
to be logged), so that it is easier to tune. Notably, it adds buffer
I/O counts (hits, misses, dirtied) and read and write rate.
Authors: Greg Smith and Noah Misch
A simple thinko in ginRedoUpdateMetapage, namely failing to increment a
loop counter, led to inserting records into the last pending-list page in
the wrong order (the opposite of that intended). So far as I can tell,
this would not upset the code that eventually flushes pending items into
the main part of the GIN index. But it did break the code that searched
the pending list for matches, resulting in transient failure to find
matching entries during index lookups, as illustrated in bug #6307 from
Maksym Boguk.
Back-patch to 8.4 where the incorrect code was introduced.
This speeds up snapshot-taking and reduces ProcArrayLock contention.
Also, the PGPROC (and PGXACT) structures used by two-phase commit are
now allocated as part of the main array, rather than in a separate
array, and we keep ProcArray sorted in pointer order. These changes
are intended to minimize the number of cache lines that must be pulled
in to take a snapshot, and testing shows a substantial increase in
performance on both read and write workloads at high concurrencies.
Pavan Deolasee, Heikki Linnakangas, Robert Haas
The WITH [NO] DATA option was not supported, nor the ability to specify
replacement column names; the former limitation wasn't even documented, as
per recent complaint from Naoya Anzai. Fix by moving the responsibility
for supporting these options into the executor. It actually takes less
code this way ...
catversion bump due to change in representation of IntoClause, which might
affect stored rules.
exception handler. This was a regression in 9.1, when the capability
to catch specific SPI errors was added, so backpatch to 9.1.
Mika Eloranta, with some editing by Jan Urbański.
The original coding would not work for discrete ranges in which the
canonicalization rule is to produce symmetric boundaries (either [] or ()
style), as noted by Jeff Davis. Florian Pflug pointed out that we could
fix that by invoking the canonicalization function to see if the range
"between" the two given ranges normalizes to empty. This implementation
of Florian's idea is a tad slower than the original code, but only in the
case where there actually is a canonicalization function --- if not, it's
essentially the same logic as before.
Since range types can be created by non-superusers, we need to consider
their permissions. Ideally we'd check this when the type is used, not
when it's created, but that seems like much more trouble than it's worth.
The existing restriction that the support functions be immutable already
prevents most cases where an unauthorized call to a function might be
thought a security issue, and the fact that the user has no access to
the results of the system's calls to subtype_diff closes off the other
plausible reason for concern. So this check is basically pro-forma,
but let's make it anyway.
It's not clear that a per-datatype typanalyze function would be any more
useful than a generic typanalyze for ranges. What *is* clear is that
letting unprivileged users select typanalyze functions is a crash risk or
worse. So remove the option from CREATE TYPE AS RANGE, and instead put in
a generic typanalyze function for ranges. The generic function does
nothing as yet, but hopefully we'll improve that before 9.2 release.
Per discussion, the zero-argument forms aren't really worth the catalog
space (just write 'empty' instead). The one-argument forms have some use,
but they also have a serious problem with looking too much like functional
cast notation; to the point where in many real use-cases, the parser would
misinterpret what was wanted.
Committing this as a separate patch, with the thought that we might want
to revert part or all of it if we can think of some way around the cast
ambiguity.
Implement these tests directly instead of constructing a singleton range
and then applying range-contains. This saves a range serialize/deserialize
cycle as well as a couple of redundant bound-comparison steps, and adds
very little code on net.
Remove elem_contained_by_range from the GiST opclass: it doesn't belong
there because there is no way to use it in an index clause (where the
indexed column would have to be on the left). Its commutator is in the
opclass, and that's what counts.
In the normal course of events, this matters only if ALTER DEFAULT
PRIVILEGES has been used to revoke default INSERT permission. Whether
or not the new behavior is more or less likely to be what the user wants
when dealing only with the built-in privilege facilities is arguable,
but it's clearly better when using a loadable module such as sepgsql
that may use the hook in ExecCheckRTPerms to enforce additional
permissions checks.
KaiGai Kohei, reviewed by Albe Laurenz
Per discussion, relax the range input/construction rules so that the
only hard error is lower bound > upper bound. Cases where the lower
bound is <= upper bound, but the range nonetheless normalizes to empty,
are now permitted.
Fix core dump in range_adjacent when bounds are infinite. Marginal
cleanup of regression test cases, some more code commenting.
Fix up some infelicitous coding in DefineRange, and add some missing error
checks. Rearrange operator strategy number assignments for GiST anyrange
opclass so that they don't make such a mess of opr_sanity's table of
operator names associated with different strategy numbers. Assign
hopefully-temporary selectivity estimators to range operators that didn't
have one --- poor as the estimates are, they're still a lot better than the
default 0.5 estimate, and they'll shut up the opr_sanity test that wants to
see selectivity estimators on all built-in operators.
When the system is idle for awhile after activity, the "smoothed_alloc"
state variable in BgBufferSync converges slowly to zero. With standard
IEEE float arithmetic this results in several iterations with denormalized
values, which causes kernel traps and annoying log messages on some
poorly-designed platforms. There's no real need to track such small values
of smoothed_alloc, so we can prevent the kernel traps by forcing it to zero
as soon as it's too small to be interesting for our purposes. This issue
is purely cosmetic, since the iterations don't happen fast enough for the
kernel traps to pose any meaningful performance problem, but still it seems
worth shutting up the log messages.
The kernel log messages were previously reported by a number of people,
but kudos to Greg Matthews for tracking down exactly where they were coming
from.
When wal_level = 'hot_standby' we touched the last page of the
relation during a VACUUM, even if nothing else had happened.
That would alter the LSN of the last block and set the mtime
of the relation file unnecessarily. Noted by Thom Brown.
This gets rid of an impressive amount of duplicative code, with only
minimal behavior changes. DROP FOREIGN DATA WRAPPER now requires object
ownership rather than superuser privileges, matching the documentation
we already have. We also eliminate the historical warning about dropping
a built-in function as unuseful. All operations are now performed in the
same order for all object types handled by dropcmds.c.
KaiGai Kohei, with minor revisions by me
Use of anynonarray was a crude hack to get around ambiguity versus the
array inclusion operators of the same names. My previous patch to extend
the parser's type resolution heuristics makes that unnecessary, so use
the more general declaration instead. This eliminates a wart that these
operators couldn't be used with ranges over arrays, which are otherwise
supported just fine.
Also, mark range_before and range_after as commutator operators,
per discussion with Jeff Davis.
For a very long time, one of the parser's heuristics for resolving
ambiguous operator calls has been to assume that unknown-type literals are
of the same type as the other input (if it's known). However, this was
only used in the first step of quickly checking for an exact-types match,
and thus did not help in resolving matches that require coercion, such as
matches to polymorphic operators. As we add more polymorphic operators,
this becomes more of a problem. This patch adds another use of the same
heuristic as a last-ditch check before failing to resolve an ambiguous
operator or function call. In particular this will let us define the range
inclusion operator in a less limited way (to come in a follow-on patch).
A very long time ago, language names were specified as literals rather
than identifiers, so this code was added to do case-folding. But that
style has ben deprecated for many years so this isn't needed any more.
Language names will still be downcased when specified as unquoted
identifiers, but quoted identifiers or the old style using string
literals will be left as-is.
This gives a much better error message when the object of interest is
concurrently dropped and avoids needlessly failing when the object of
interest is concurrently dropped and recreated. It also improves the
behavior of two concurrent DROP IF EXISTS operations targeted at the
same object; as before, one will drop the object, but now the other
will emit the usual NOTICE indicating that the object does not exist,
instead of rolling back. As a fringe benefit, it's also slightly
less code.
Fix assorted infelicities, such as dependency on OIDs that aren't
hardwired, as well as outright misdeclaration of daterange_canonical(),
which resulted in crashes if you invoked it directly. Add some more
regression tests to try to catch similar mistakes in future.
This can change the meaning of queries, if the blank line happens to
occur in the middle of a quoted literal, as per complaint from Tomas Vondra.
Back-patch to all supported branches.
Move the responsibility for caching specialized information about range
types into the type cache, so that the catalog lookups only have to occur
once per session. Rearrange APIs a bit so that fn_extra caching is
actually effective in the GiST support code. (Use of OidFunctionCallN is
bad enough for performance in itself, but it also prevents the function
from exploiting fn_extra caching.)
The range I/O functions are still not very bright about caching repeated
lookups, but that seems like material for a separate patch.
Also, avoid unnecessary use of memcpy to fetch/store the range type OID and
flags, and don't use the full range_deserialize machinery when all we need
to see is the flags value.
Also fix API error in range_gist_penalty --- it was failing to set *penalty
for any case involving an empty range.
A range type whose element type has 'd' alignment must have 'd' alignment
itself, else there is no guarantee that the element value can be used
in-place. (Because range_deserialize uses att_align_pointer which forcibly
aligns the given pointer, violations of this rule did not lead to SIGBUS
but rather to garbage data being extracted, as in one of the added
regression test cases.)
Also, you can't put a toast pointer inside a range datum, since the
referenced value could disappear with the range datum still present.
For consistency with the handling of arrays and records, I also forced
decompression of in-line-compressed bound values. It would work to store
them as-is, but our policy is to avoid situations that might result in
double compression.
Add assorted regression tests for this, and bump catversion because of
fixes to built-in pg_type entries.
Also some marginal cleanup of inconsistent/unnecessary error checks.
Change range_lower and range_upper to return NULL rather than throwing an
error when the input range is empty or the relevant bound is infinite. Per
discussion, throwing an error seems likely to be unduly hard to work with.
Also, this is more consistent with the behavior of the constructors, which
treat NULL as meaning an infinite bound.
Change range_before, range_after, range_adjacent to return false rather
than throwing an error when one or both input ranges are empty.
The original definition is unnecessarily difficult to use, and also can
result in undesirable planner failures since the planner could try to
compare an empty range to something else while deriving statistical
estimates. (This was, in fact, the cause of repeatable regression test
failures on buildfarm member jaguar, as well as intermittent failures
elsewhere.)
Also tweak rangetypes regression test to not drop all the objects it
creates, so that the final state of the regression database contains
some rangetype objects for pg_dump testing.
No functional changes in this commit (except I could not resist the
temptation to re-word a couple of error messages). This is just manual
cleanup after pgindent to make the code look reasonably like other PG
code, in preparation for more detailed code review to come.
Previously we waited for wal_writer_delay before flushing WAL. Now
we also wake WALWriter as soon as a WAL buffer page has filled.
Significant effect observed on performance of asynchronous commits
by Robert Haas, attributed to the ability to set hint bits on tuples
earlier and so reducing contention caused by clog lookups.
This seems to have been just an oversight in previous foreign-table work.
A quick grep didn't turn up any other places where RELKIND_FOREIGN_TABLE
was obviously omitted.
One change noted by Alexander Soudakov, the other by me.
Back-patch to 9.1.
This adds the "auto" option to the \x command, which switches to the
expanded mode when the normal output would be wider than the screen.
reviewed by Noah Misch
If it turns out we've locked the wrong OID, release the old lock. In
most cases, it's pretty harmless to retain the extra lock, but this
seems tidier and avoids using lock table slots unnecessarily.
Per discussion with Tom Lane.
Previously, you'd get "function pg_catalog.pg_get_functiondef(integer) does
not exist", which is at best rather unprofessional-looking. Back-patch
to 8.4 where \ef was introduced.
Josh Kupershmidt
This reverts commit 0180bd6180.
contrib/userlock is gone, but user-level locking still exists,
and is exposed via the pg_advisory* family of functions.
If malloc(0) returns NULL, the binary search in findSecLabels() will
probably go into an infinite loop when there are no security labels,
because NULL-1 is greater than NULL after wraparound.
(We've seen this pathology before ... I wonder whether there's a way to
detect the class of bugs automatically?)
Diagnosis and patch by Steve Singer, cosmetic adjustments by me
Forgot to call RestoreBkpBlocks() in the redo-function, as pointed out by
Simon Riggs. In redo of a regular heap insert, it's taken care of in
heap_redo(), but this new record type uses the heap2 RM, and heap2_redo()
does not take care of that for you.
Also, failed to reset the vmbuffer and all_visibile_cleared local variables
after switching to a new buffer.
It used to be cleaned in maintainer-clean, but that is inconsistent
with other cleaning of NLS files in nls-global.mk, and it's also wrong
overall, because it's not part of the distribution tarball, which is
the base definition of the maintainer-clean target.
This greatly reduces the WAL volume, especially when the table is narrow.
The overhead of locking the heap page is also reduced. Reduced WAL traffic
also makes it scale a lot better, if you run multiple COPY processes at
the same time.
In particular, my previous patch expected the create_index test to run
before the inherit test; but this was only true in the serial schedule.
Rearrange this portion of the schedules to be more consistent.
Per buildfarm results.
Add PlaceHolderVar wrappers as needed to make UNION ALL sub-select output
expressions appear non-constant and distinct from each other. This makes
the world safe for add_child_rel_equivalences to do what it does. Before,
it was possible for that function to add identical expressions to different
EquivalenceClasses, which logically should imply merging such ECs, which
would be wrong; or to improperly add a constant to an EquivalenceClass,
drastically changing its behavior. Per report from Teodor Sigaev.
The only currently known consequence of this bug is "MergeAppend child's
targetlist doesn't match MergeAppend" planner failures in 9.1 and later.
I am suspicious that there may be other failure modes that could affect
older release branches; but in the absence of any hard evidence, I'll
refrain from back-patching further than 9.1.
a new macro, DatumGetInetPP(), that does not. This brings these macros
in line with other DatumGet*P() macros.
Backpatch to 8.3, where 1-byte header varlenas were introduced.
In a regular VACUUM, it's OK to skip pages for which a cleanup lock
isn't immediately available; the next VACUUM will deal with them. If
we're scanning the entire relation to advance relfrozenxid, we might
need to wait, but only if there are tuples on the page that actually
require freezing. These changes should greatly reduce the incidence
of of vacuum processes getting "stuck".
Simon Riggs and Robert Haas
Further experimentation reveals that my previous change didn't fix the
issue entirely: these tests would still fail at the spring-forward DST
transition. There doesn't seem to be any great value in testing this
specific issue for both timestamp and timestamptz, so just lose the
latter tests.
It was inadvertently changed to 201111111, which is a wrong date. Change it
to current date, and remove the comment that was supposed to remind me to
fix it before committing.
This assumption can be wrong when the toaster is passed a raw on-disk
tuple, because the tuple might pre-date an ALTER TABLE ADD COLUMN operation
that added columns without rewriting the table. In such a case the tuple's
natts value is smaller than what we expect from the tuple descriptor, and
so its t_hoff value could be smaller too. In fact, the tuple might not
have a null bitmap at all, and yet our current opinion of it is that it
contains some trailing nulls.
In such a situation, toast_insert_or_update did the wrong thing, because
to save a few lines of code it would use the old t_hoff value as the offset
where heap_fill_tuple should start filling data. This did not leave enough
room for the new nulls bitmap, with the result that the first few bytes of
data could be overwritten with null flag bits, as in a recent report from
Hubert Depesz Lubaczewski.
The particular case reported requires ALTER TABLE ADD COLUMN followed by
CREATE TABLE AS SELECT * FROM ... or INSERT ... SELECT * FROM ..., and
further requires that there be some out-of-line toasted fields in one of
the tuples to be copied; else we'll not reach the troublesome code.
The problem can only manifest in this form in 8.4 and later, because
before commit a77eaa6a95, CREATE TABLE AS or
INSERT/SELECT wouldn't result in raw disk tuples getting passed directly
to heap_insert --- there would always have been at least a junkfilter in
between, and that would reconstitute the tuple header with an up-to-date
t_natts and hence t_hoff. But I'm backpatching the tuptoaster change all
the way anyway, because I'm not convinced there are no older code paths
that present a similar risk.
I broke it in a previous commit because I neglected to install the
necessary incantations to have getopt() work on Windows.
Per red blots in buildfarm.
inline_set_returning_function failed to distinguish functions returning
generic RECORD (which require a column list in the RTE, as well as run-time
type checking) from those with multiple OUT parameters (which do not).
This prevented inlining from happening. Per complaint from Jay Levitt.
Back-patch to 8.4 where this capability was introduced.
This mode prints out the permutations that would be run by the given
spec file, in the same format used by the permutation lines in spec
files. This helps in building new spec files.
Author: Alexander Shulgin, with some tweaks by me
Instead of filling files as they appear, pre-pad the
WAL files received when streaming xlog the same way
that the server does. Data is streamed into a .partial
file which is then renamed()d into palce when it's complete,
but it will always be 16MB.
This also means that the starting position for pg_receivexlog
is now simply right after the last complete segment, and we
never need to deal with partial segments there.
Patch by me, review by Fujii Masao
If we use a PlaceHolderVar from the outer relation in an inner indexscan,
we need to reference the PlaceHolderVar as such as the value to be passed
in from the outer relation. The previous code effectively tried to
reconstruct the PHV from its component expression, which doesn't work since
(a) the Vars therein aren't necessarily bubbled up far enough, and (b) it
would be the wrong semantics anyway because of the possibility that the PHV
is supposed to have gone to null at some point before the current join.
Point (a) led to "variable not found in subplan target list" planner
errors, but point (b) would have led to silently wrong answers.
Per report from Roger Niederland.
If we have an inequality key that constrains the other end of the index,
it doesn't directly help us in doing the initial positioning ... but it
does imply a NOT NULL constraint on the index column. If the index stores
nulls at this end, we can use the implied NOT NULL condition for initial
positioning, just as if it had been stated explicitly. This avoids wasting
time when there are a lot of nulls in the column. This is the reverse of
the examples given in bugs #6278 and #6283, which were about failing to
stop early when we encounter nulls at the end of the indexscan.
As pointed out by Naoya Anzai, my previous try at this was a few bricks
shy of a load, because I had forgotten that the initial-positioning logic
might not try to skip over nulls at the end of the index the scan will
start from. We ought to fix that, because it represents an unnecessary
inefficiency, but first let's get the scan-stop logic back to a safe
state. With this patch, we preserve the performance benefit requested
in bug #6278 for the case of scanning forward into NULLs (in a NULLS
LAST index), but the reverse case of scanning backward across NULLs
when there's no suitable initial-positioning qual is still inefficient.
Previously, we skipped a checkpoint if no WAL had been written since
last checkpoint, though this does not appear in user documentation.
As of now, we skip a checkpoint until we have written at least one
enough WAL to switch the next WAL file. This greatly reduces the
level of activity and number of WAL messages generated by a very
low activity server. This is safe because the purpose of a checkpoint
is to act as a starting place for a recovery, in case of crash.
This patch maintains minimal WAL volume for replay in case of crash,
thus maintaining very low crash recovery time.
There was a timing window between when oldestActiveXid was derived
and when it should have been derived that only shows itself under
heavy load. Move code around to ensure correct timing of derivation.
No change to StartupSUBTRANS() code, which is where this failed.
Bug report by Chris Redekop
If the initial snapshot had overflowed then we can start whenever
the latest snapshot is empty, not overflowed or as we did already,
start when the xmin on primary was higher than xmax of our starting
snapshot, which proves we have full snapshot data.
Bug report by Chris Redekop
In assert-enabled builds, we assert during the shutdown sequence that
the queues have been properly emptied, and during process startup that
we are inheriting empty queues. In non-assert enabled builds, we just
save a few cycles.
This allows us to give correct syntax error pointers when complaining
about ungrouped variables in a join query with aggregates or GROUP BY.
It's pretty much irrelevant for the planner's use of the function, though
perhaps it might aid debugging sometimes.
If a tuple in a syscache contains an out-of-line toasted field, and we
try to fetch that field shortly after some other transaction has committed
an update or deletion of the tuple, there is a race condition: vacuum
could come along and remove the toast tuples before we can fetch them.
This leads to transient failures like "missing chunk number 0 for toast
value NNNNN in pg_toast_2619", as seen in recent reports from Andrew
Hammond and Tim Uckun.
The design idea of syscache is that access to stale syscache entries
should be prevented by relation-level locks, but that fails for at least
two cases where toasted fields are possible: ANALYZE updates pg_statistic
rows without locking out sessions that might want to plan queries on the
same table, and CREATE OR REPLACE FUNCTION updates pg_proc rows without
any meaningful lock at all.
The least risky fix seems to be an idea that Heikki suggested when we
were dealing with a related problem back in August: forcibly detoast any
out-of-line fields before putting a tuple into syscache in the first place.
This avoids the problem because at the time we fetch the parent tuple from
the catalog, we should be holding an MVCC snapshot that will prevent
removal of the toast tuples, even if the parent tuple is outdated
immediately after we fetch it. (Note: I'm not convinced that this
statement holds true at every instant where we could be fetching a syscache
entry at all, but it does appear to hold true at the times where we could
fetch an entry that could have a toasted field. We will need to be a bit
wary of adding toast tables to low-level catalogs that don't have them
already.) An additional benefit is that subsequent uses of the syscache
entry should be faster, since they won't have to detoast the field.
Back-patch to all supported versions. The problem is significantly harder
to reproduce in pre-9.0 releases, because of their willingness to flush
every entry in a syscache whenever the underlying catalog is vacuumed
(cf CatalogCacheFlushRelation); but there is still a window for trouble.
bgwriter is now a much less important process, responsible for page
cleaning duties only. checkpointer is now responsible for checkpoints
and so has a key role in shutdown. Later patches will correct doc
references to the now old idea that bgwriter performs checkpoints.
Has beneficial effect on performance at high write rates, but mainly
refactoring to more easily allow changes for power reduction by
simplifying previously tortuous code around required to allow page
cleaning and checkpointing to time slice in the same process.
Patch by me, Review by Dickson Guedes
The existing scan-direction-sensitive tests were overly complex, and
failed to stop the scan in cases where it's perfectly legitimate to do so.
Per bug #6278 from Maksym Boguk.
Back-patch to 8.3, which is as far back as the patch applies easily.
Doesn't seem worth sweating over a relatively minor performance issue in
8.2 at this late date. (But note that this was a performance regression
from 8.1 and before, so 8.2 is being left as an outlier.)
The POSIX spec defines locale fields for controlling the ordering of the
value, sign, and currency symbol in monetary output, but cash_out only
supported a small subset of these options. Fully implement p/n_sign_posn,
p/n_cs_precedes, and p/n_sep_by_space per spec. Fix up cash_in so that
it will accept all these format variants.
Also, make sure that thousands_sep is only inserted to the left of the
decimal point, as required by spec.
Per bug #6144 from Eduard Kracmar and discussion of bug #6277. This patch
includes some ideas from Alexander Lakhin's proposed patch, though it is
very different in detail.
Make sure that it considers all the possibilities that the old code did,
instead of trying only one possibility per character position. To keep the
runtime in bounds, instead tweak the character incrementers to not try
every possible multibyte character code. Remove unnecessary logic to
restore the old character value on failure. Additional comment and
formatting cleanup.
cash_out failed to handle multiple-byte thousands separators, as per bug
#6277 from Alexander Law. In addition, cash_in didn't handle that either,
nor could it handle multiple-byte positive_sign. Both routines failed to
support multiple-byte mon_decimal_point, which I did not think was worth
changing, but at least now they check for the possibility and fall back to
using '.' rather than emitting invalid output. Also, make cash_in handle
trailing negative signs, which formerly it would reject. Since cash_out
generates trailing negative signs whenever the locale tells it to, this
last omission represents a fail-to-reload-dumped-data bug. IMO that
justifies patching this all the way back.
This infrastructure doesn't in any way guarantee that the character
we produce will sort before the one we incremented; but it does at least
make it much more likely that we'll end up with something that is a valid
character, which improves our chances.
Kyotaro Horiguchi, with various adjustments by me.
We need not wait until the commit record is durably on disk, because
in the event of a crash the page we're updating with hint bits will
be gone anyway. Per off-list report from Heikki Linnakangas, this
can significantly degrade the performance of unlogged tables; I was
able to show a 2x speedup from this patch on a pgbench run with scale
factor 15. In practice, this will mostly help small, heavily updated
tables, because on larger tables you're unlikely to run into the same
row again before the commit record makes it out to disk.
Make sure ecpg/include/ is rebuilt before the other subdirectories,
so that ecpg_config.h is up to date. This is not likely to matter
during production builds, only development, so no back-patch.
one lock per backend or auxiliary process - the need for a lock for each
aux processes was not accounted for in NumLWLocks(). No-one noticed,
because the three locks needed for the three aux processes fit into the
few extra lwlocks we allocate for 3rd party modules that don't call
RequestAddinLWLocks() (NUM_USER_DEFINED_LWLOCKS, 4 by default).
The original implementation of ELSIF in plpgsql converted the construct
into nested simple IF statements. This was prone to stack overflow with
long ELSIF lists, in two different ways. First, it's difficult to generate
the parsetree without using right-recursion in the bison grammar, and
that's prone to parser stack overflow since nothing can be reduced until
the whole list has been read. Second, we'd recurse during execution, thus
creating an unnecessary risk of execution-time stack overflow. Rewrite
so that the ELSIF list is represented as a flat list, scanned via iteration
not recursion, and generated through left-recursion in the grammar.
Per a gripe from Håvard Kongsgård.
We should generally use left-recursion not right-recursion to parse lists.
Bison hasn't got any built-in way to check for this type of inefficiency,
and I didn't find anything on the net in a quick search, so I wrote a
little Perl script to do it. Add to src/tools/ so we don't have to
re-invent this wheel next time we wonder if we're doing anything stupid.
Currently, the only place that seems to need fixing is plpgsql's stmt_else
production, so the problem doesn't appear to be common enough to warrant
trying to include such a test in our standard build process. If we did
want to do that, we'd need a way to ignore some false positives, such as
a_expr := '-' a_expr
If the right-hand side of a semijoin is unique, then we can treat it like a
normal join (or another way to say that is: we don't need to explicitly
unique-ify the data before doing it as a normal join). We were recognizing
such cases when the RHS was a sub-query with appropriate DISTINCT or GROUP
BY decoration, but there's another way: if the RHS is a plain relation with
unique indexes, we can check if any of the indexes prove the output is
unique. Most of the infrastructure for that was there already in the join
removal code, though I had to rearrange it a bit. Per reflection about a
recent example in pgsql-performance.
Add option for parallel streaming of the transaction log while a
base backup is running, to get the logfiles before the server has
removed them.
Also add a tool called pg_receivexlog, which streams the transaction
log into files, creating a log archive without having to wait for
segments to complete, thus decreasing the window of data loss without
having to waste space using archive_timeout. This works best in
combination with archive_command - suggested usage docs etc coming later.
Use names like "RI_ConstraintTrigger_a_NNNN" for FK action triggers and
"RI_ConstraintTrigger_c_NNNN" for FK check triggers. This ensures the
action trigger fires first in self-referential cases where the very same
row update fires both an action and a check trigger. This change provides
a non-probabilistic solution for bug #6268, at the risk that it could break
client code that is making assumptions about the exact names assigned to
auto-generated FK triggers. Hence, change this in HEAD only. No need for
forced initdb since old triggers continue to work fine.
When a foreign-key constraint references another column of the same table,
row updates will queue both the PK's ON UPDATE action and the FK's CHECK
action in the same event. The ON UPDATE action must execute first, else
the CHECK will check a non-final state of the row and possibly throw an
inappropriate error, as seen in bug #6268 from Roman Lytovchenko.
Now, the firing order of multiple triggers for the same event is determined
by the sort order of their pg_trigger.tgnames, and the auto-generated names
we use for FK triggers are "RI_ConstraintTrigger_NNNN" where NNNN is the
trigger OID. So most of the time the firing order is the same as creation
order, and so rearranging the creation order fixes it.
This patch will fail to fix the problem if the OID counter wraps around or
adds a decimal digit (eg, from 99999 to 100000) while we are creating the
triggers for an FK constraint. Given the small odds of that, and the low
usage of self-referential FKs, we'll live with that solution in the back
branches. A better fix is to change the auto-generated names for FK
triggers, but it seems unwise to do that in stable branches because there
may be client code that depends on the naming convention. We'll fix it
that way in HEAD in a separate patch.
Back-patch to all supported branches, since this bug has existed for a long
time.
This allows different instances to use the eventlog with different
identifiers, by setting the event_source GUC, similar to how
syslog_ident works.
Original patch by MauMau, heavily modified by Magnus Hagander
Use the CommitDate not the AuthorDate, as the former is representative of
the order in which things went into the main repository, and the latter
isn't very; we now have instances where the AuthorDate is as much as a
month before the patch really went in. Also, get rid of the "commit order
inversions" heuristic, which turns out not to do anything very desirable.
Instead we just print commits in strict timestamp order, interpreting the
"timestamp" of a merged commit as its timestamp on the newest branch it
appears in. This fixes some cases where very ancient commits were being
printed relatively early in the report.
The uniqueness condition might fail to hold intra-transaction, and assuming
it does can give incorrect query results. Per report from Marti Raudsepp,
though this is not his proposed patch.
Back-patch to 9.0, where both these features were introduced. In the
released branches, add the new IndexOptInfo field to the end of the struct,
to try to minimize ABI breakage for third-party code that may be examining
that struct.
A transaction can export a snapshot with pg_export_snapshot(), and then
others can import it with SET TRANSACTION SNAPSHOT. The data does not
leave the server so there are not security issues. A snapshot can only
be imported while the exporting transaction is still running, and there
are some other restrictions.
I'm not totally convinced that we've covered all the bases for SSI (true
serializable) mode, but it works fine for lesser isolation modes.
Joachim Wieland, reviewed by Marko Tiikkaja, and rather heavily modified
by Tom Lane
No need to do "errcode(errcode_for_file_access())", just
"errcode_for_file_access()" is enough. The extra errcode() call is useless
but harmless, so there's no user-visible bug here. Nevertheless, backpatch
to 9.1 where this code were added.
Avoid possibly dumping core when pgstat_track_activity_query_size has a
less-than-default value; avoid uselessly searching for the query string
of a successfully-exited backend; don't bother putting out an ERRDETAIL if
we don't have a query to show; some other minor stylistic improvements.
Turns out that use of ShareUpdateExclusiveLock or ShareRowExclusiveLock
to protect DDL changes had gotten copied into several places that were
not touched by either of Simon's original patches for the feature, and
thus neither he nor I thought to revert them. (Indeed, it appears that
two of these uses were committed *after* the reversion, which just goes
to show that git merging is no panacea.) Change these places to use
AccessExclusiveLock again. If we ever manage to resurrect that feature,
we're going to have to think a bit harder about how to keep lock level
usage in sync for DDL operations that aren't within the AlterTable
infrastructure.
Two of these bugs are only in HEAD, but one is in the 9.1 branch too.
Alvaro found one of them, I found the other two.
To avoid minimize risk inside the postmaster, we subject this feature
to a number of significant limitations. We very much wish to avoid
doing any complex processing inside the postmaster, due to the
posssibility that the crashed backend has completely corrupted shared
memory. To that end, no encoding conversion is done; instead, we just
replace anything that doesn't look like an ASCII character with a
question mark. We limit the amount of data copied to 1024 characters,
and carefully sanity check the source of that data. While these
restrictions would doubtless be unacceptable in a general-purpose
logging facility, even this limited facility seems like an improvement
over the status quo ante.
Marti Raudsepp, reviewed by PDXPUG and myself
Essentially, the "IF EXISTS" portion was being ignored, and an error
thrown anyway if the opfamily did not exist.
I broke this in commit fd1843ff8979c0461fb3f1a9eab61140c977e32d; so
backpatch to 9.1.X.
Report and diagnosis by KaiGai Kohei.
There's no need to clamp the standby's xmin to be greater than
GetOldestXmin's result; if there were any such need this logic would be
hopelessly inadequate anyway, because it fails to account for
within-database versus cluster-wide values of GetOldestXmin. So get rid of
that, and just rely on sanity-checking that the xmin is not wrapped around
relative to the nextXid counter. Also, don't reset the walsender's xmin if
the current feedback xmin is indeed out of range; that just creates more
problems than we already had. Lastly, don't bother to take the
ProcArrayLock; there's no need to do that to set xmin.
Also improve the comments about this in GetOldestXmin itself.
Make it return empty strings when there are no more words to the left of
the current position, instead of sometimes returning NULL and other times
returning copies of the leftmost word. Also, fetch the words in one scan,
rather than the previous wasteful approach of starting from scratch for
each word. Make the code a bit harder to break when someone decides we
need more words of context, too. (There was actually a memory leak here,
because whoever added prev6_wd neglected to free it.)
extnamespace means something altogether different in this context.
Mostly by accident, this coding error (introduced in my commit
82a4a777d9) broke the buildfarm instead
of just silently doing the wrong thing.
This gets rid of a significant amount of duplicative code.
KaiGai Kohei, reviewed in earlier versions by Dimitri Fontaine, with
further review and cleanup by me.