- Really prepare statements
- Added more regression tests
- Added auto-prepare mode
- Use '$n' for positional variables, '?' is still possible via ecpg option
- Cleaned up the sources a little bit
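To illustrate the difference between the two placeholder styles, here is a minimal sketch that renumbers bare '?' markers as '$1', '$2', and so on. It is not ecpg's code; the function name number_placeholders is made up for this example, and the quote handling is deliberately simplistic.

```python
def number_placeholders(sql: str) -> str:
    """Rewrite each bare '?' as '$1', '$2', ... leaving quoted text alone.

    Hypothetical helper for illustration only; not taken from ecpg.
    """
    out = []
    n = 0
    quote = None  # the quote character we are currently inside, if any
    for ch in sql:
        if quote:
            # Inside a literal or quoted identifier: copy verbatim.
            if ch == quote:
                quote = None
            out.append(ch)
        elif ch in ("'", '"'):
            quote = ch
            out.append(ch)
        elif ch == "?":
            n += 1
            out.append(f"${n}")
        else:
            out.append(ch)
    return "".join(out)


print(number_placeholders("SELECT * FROM t WHERE a = ? AND b = '?' AND c = ?"))
```

A doubled quote inside a literal simply closes and reopens the quoted state, so it needs no special case here.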
First, we cannot assume that XLogAsyncCommitFlush guarantees hint bits will be
settable, because clog.c's inexact LSN bookkeeping results in windows where a
previously flushed transaction is considered unhintable because it shares an
LSN slot with a later unflushed transaction. But repair_frag requires
XMIN_COMMITTED to be correct so that it can distinguish tuples moved by the
current vacuum. Since not being able to set the bit is an uncommon corner
case, the most practical way of dealing with it seems to be to abandon
shrinking (i.e., don't invoke repair_frag) when we find a non-dead tuple whose
XMIN_COMMITTED bit couldn't be set.
Second, it is possible for the same reason that a RECENTLY_DEAD tuple does not
get its XMAX_COMMITTED bit set during scan_heap. But by the time repair_frag
examines the tuple it might be possible to set the bit. We therefore must
take buffer content lock when calling HeapTupleSatisfiesVacuum a second time,
else we can get an Assert failure in SetBufferCommitInfoNeedsSave. This
latter bug is latent in existing releases, but I think it cannot actually
occur without async commit, since the first HeapTupleSatisfiesVacuum call
should always have set the bit. So I'm not going to back-patch it.
In passing, reduce the existing "cannot shrink relation" messages from NOTICE
to LOG level. The new message must be no higher than LOG if we don't want
unpredictable regression test failures, and consistency seems like a good
idea. Also arrange that only one such message is reported per VACUUM FULL;
in typical scenarios you could get spammed with many such messages, which
seems a bit useless.
enlarge the memory chunk in-place when it was feasible to do so. This turns
out to not work well at all for scenarios involving repeated cycles of
palloc/repalloc/pfree: the eventually freed chunks go into the wrong freelist
for the next initial palloc request, and so we consume memory indefinitely.
While that could be defended against, the number of cases where the
optimization can still be applied drops significantly, and adjusting the
initial sizes of StringInfo buffers makes it drop to almost nothing.
Seems better to just remove the extra complexity.
Per recent discussion and testing.
likewise increase the initial size of the scanner's literal buffer to 1024
(from 128). Instrumentation of the regression tests suggests that this
saves a useful amount of repalloc() traffic --- the number of calls occurring
during one set of tests drops from about 6900 to about 3900. The old sizes
were chosen in the late 90's with an eye to machines much smaller than
are common today.
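A rough way to see why a larger initial size saves repalloc() calls, assuming the buffer doubles each time it is enlarged (an assumption of this sketch, not something stated above). The function name is hypothetical.

```python
def doublings_needed(initial_size: int, final_size: int) -> int:
    """Count the enlargements a doubling buffer needs to reach final_size."""
    count = 0
    size = initial_size
    while size < final_size:
        size *= 2  # each doubling stands in for one repalloc() call
        count += 1
    return count


# A 5000-byte literal, starting from the old and the new initial sizes.
print(doublings_needed(128, 5000), doublings_needed(1024, 5000))
```

Anything that fits in the initial allocation needs no enlargement at all, which is where most of the saving would come from.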
regexp_split_to_table() within a single query. This is only a partial
solution, as it turns out that with enough matches per string these
functions can also tickle a repalloc() misbehavior. But fixing that
is a topic for a separate patch.
that cached compiled patterns will still be there when the function is next
called. Clean up looping logic, thereby fixing bug identified by Pavel
Stehule. Share setup code between the two functions, add some comments, and
avoid risky mixing of int and size_t variables. Clean up the documentation a
tad, and accept all the flag characters mentioned in table 9-19 rather than
just a subset.
constant flow of new connection requests could prevent the postmaster from
completing a shutdown or crash restart. This is done by labeling child
processes that are "dead ends", that is, we know that they were launched only
to tell a client that it can't connect. These processes are managed
separately so that they don't confuse us into thinking that we can't advance
to the next stage of a shutdown or restart sequence, until the very end
where we must wait for them to drain out so we can delete the shmem segment.
Per discussion of a misbehavior reported by Keaton Adams.
Since this code was baroque already, and my first attempt at fixing the
problem made it entirely impenetrable, I took the opportunity to rewrite it
in a state-machine style. That eliminates some duplicated code sections and
hopefully makes everything a bit clearer.
hash table is allocated in a child context of the agg node's memory
context, MemoryContextReset() will reset but *not* delete the child
context. Since ExecReScanAgg() proceeds to build a new hash table
from scratch (in a new sub-context), this results in leaking the
header for the previous memory context. Therefore, use
MemoryContextResetAndDeleteChildren() instead.
Credit: My colleague Sailesh Krishnamurthy at Truviso for isolating
the cause of the leak.
child memory contexts is indented two spaces to the right of its
parent context. This should make it easier to deduce the memory
context hierarchy from the output of MemoryContextStats().
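The indentation rule can be sketched as follows. This only models the layout (two spaces per nesting level); the tuple-based tree and the function name are inventions of this example, and the real output carries size statistics that are omitted here.

```python
def format_context_tree(name, children, depth=0):
    """Return one line per context, indented two spaces per nesting level.

    children is a list of (name, children) tuples; illustrative only.
    """
    lines = ["  " * depth + name]
    for child_name, grandchildren in children:
        lines.extend(format_context_tree(child_name, grandchildren, depth + 1))
    return lines


tree = ("TopMemoryContext", [
    ("MessageContext", []),
    ("CacheMemoryContext", [("index info", [])]),
])
print("\n".join(format_context_tree(*tree)))
```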
between the setting of log_line_prefix and the setting of log_timezone. We
can't realistically set log_timezone any earlier than we do now, so the best
behavior seems to be to use the GMT zone if any timestamps are to be logged during
early startup. Create a dummy zone variable with a minimal definition of GMT
(in particular it will never know about leap seconds), so that we can set it
up without reference to any external files.
as well as regular backends: if no regular backend launches before the autovac
launcher tries to start an autovac worker, the postmaster would get an Assert
fault due to calling PostmasterRandom before random_seed was initialized.
Cleanest solution seems to be to take the initialization of random_seed out
of ServerLoop and let PostmasterRandom do it for itself.
displayed in the postmaster log. This avoids Windows-specific problems with
localized time zone names that are in the wrong encoding, and generally seems
like a good idea to forestall other potential platform-dependent issues.
To preserve the existing behavior that all backends will log in the same time
zone, create a new GUC variable log_timezone that can only be changed on a
system-wide basis, and reference log-related calculations to that zone instead
of the TimeZone variable.
This fixes the issue reported by Hiroshi Saito that timestamps printed by
xlog.c startup could be improperly localized on Windows. We still need a
simpler patch for that problem in the back branches, however.
not bothering to initialize is_autovacuum for regular backends, meaning there
was a significant chance of the postmaster prematurely sending them SIGTERM
during database shutdown. Also, leaving the cancel key unset for an autovac
worker meant that any client could send it SIGINT, which doesn't sound
especially good either.
so that we will be able to create a cookie for all processes for CSVlogs.
It is set wherever MyProcPid is set. Take the opportunity to remove the now
unnecessary session-only restriction on the %s and %c escapes in log_line_prefix.
before reporting a transaction committed. Data consistency is still
guaranteed (unlike setting fsync = off), but a crash may lose the effects
of the last few transactions. Patch by Simon, some editorialization by Tom.
clauses in which one side or the other references both sides of the join
cannot be removed as redundant, because that expression won't have been
constrained below the join. Per report from Sergey Burladyan.
CVS HEAD does not contain this bug due to EquivalenceClass rewrite, but it
seems wise to include the regression test for it anyway.
with the recent patch to log temp file sizes at removal time. Doesn't seem
worth fixing since it's unused.
In passing, make a few elog messages conform to the message style guide.
named pg_toast_temp_nnn, alongside the pg_temp_nnn schemas used for the temp
tables themselves. This allows low-level code such as the relcache to
recognize that these tables are indeed temporary, which enables various
optimizations such as not WAL-logging changes and using local rather than
shared buffers for access. Aside from obvious performance benefits, this
provides a solution to bug #3483, in which other backends unexpectedly held
open file references to temporary tables. The scheme preserves the property
that TOAST tables are not in any schema that's normally in the search path,
so they don't conflict with user table names.
initdb forced because of changes in system view definitions.
sugar for PL/PgSQL set-returning functions that want to return the result
of evaluating a query; it should also be more efficient than repeated
RETURN NEXT statements. Based on an earlier patch from Pavel Stehule.
checking whether an IS NULL/IS NOT NULL clause is implied or refuted by
a strict function. Per example from Dawid Kuroczko.
Backpatch to 8.2 since this is arguably a performance bug.
and fsync WAL at convenient intervals. For the moment it just tries to
offload this work from backends, but soon it will be responsible for
guaranteeing a maximum delay before asynchronously-committed transactions
will be flushed to disk.
This is a portion of Simon Riggs' async-commit patch, committed to CVS
separately because a background WAL writer seems like it might be a good idea
independently of the async-commit feature. I rebased walwriter.c on
bgwriter.c because it seemed like a more appropriate way of handling signals;
while the startup/shutdown logic in postmaster.c is more like autovac because
we want walwriter to quit before we start the shutdown checkpoint.
I/O utilization, per discussion.
While at it, lower the autovacuum vacuum and analyze threshold values to 50
tuples. It is a bit higher (i.e. more conservative) than what I originally
proposed but much better than the old values for small tables.
against a Unix server, and Windows-specific server-side authentication
using SSPI "negotiate" method (Kerberos or NTLM).
Only builds properly with MSVC for now.
log_min_error_statement is active and there is some problem in logging the
current query string; for example, that it's too long to include in the log
message without running out of memory. This problem has existed since the
log_min_error_statement feature was introduced. No doubt the reason it
wasn't detected long ago is that 8.2 is the first release that defaults
log_min_error_statement to less than PANIC level.
Per report from Bill Moran.
truncated relation was deleted later in the WAL sequence. Since replay
normally auto-creates a relation upon its first reference by a WAL log entry,
failure is seen only if the truncate entry happens to be the first reference
after the checkpoint we're restarting from; which is a pretty unusual case but
of course not impossible. Fix by making truncate entries auto-create like
the other ones do. Per report and test case from Dharmendra Goyal.
when handed an invalidly-encoded pattern. The previous coding could get
into an infinite loop if pg_mb2wchar_with_len() returned a zero-length
string after we'd tested for nonempty pattern; which is exactly what it
will do if the string consists only of an incomplete multibyte character.
This led to either an out-of-memory error or a backend crash depending
on platform. Per report from Wiktor Wodecki.
a MIN or MAX aggregate call into an indexscan: the initplan is being made at
the current query nesting level and so we shouldn't increment query_level.
Though usually harmless, this mistake could lead to bogus "plan should not
reference subplan's variable" failures on complex queries. Per bug report
from David Sanchez i Gregori.
referencing table does not change the tuple's FK column(s), we don't bother
to check the PK table since the constraint was presumably already valid.
However, the check is still necessary if the tuple was inserted by our own
transaction, since in that case the INSERT trigger will conclude it need not
make the check (since its version of the tuple has been deleted). We got this
right for simple cases, but not when the insert and update are in different
subtransactions of the current top-level transaction; in such cases the FK
check would never be made at all. (Hence, problem dates back to 8.0 when
subtransactions were added --- it's actually the subtransaction version of a
bug fixed in 7.3.5.) Fix, and add regression test cases. Report and fix by
Affan Salman.
been broken since forever, but was not noticed because people seldom look
at raw parse trees. AFAIK, no impact on users except that debug_print_parse
might fail; but patch it all the way back anyway. Per report from Jeff Ross.
name. With this patch, it is always possible for the user to qualify a
plpgsql variable name if needed to avoid ambiguity. While there is much more
work to be done in this area, this simple change removes one unnecessary
incompatibility with Oracle. Per discussion.
theoretically vary depending on what the compile-time locale setting is.
Hence, force it to see LC_CTYPE=C to ensure consistent build results.
(It's likely that this makes no difference in practice, since our
specification for "identifier" surely includes both ends of any possible
uppercase/lowercase pair anyway. But it should silence warnings about
ambiguous character classes that are reported by some buildfarm members.)
sanely if the loop value overflows int32 on the way to the end value.
Avoid useless computation of "SELECT 1" when BY is omitted. Avoid some
type-punning between Datum and int4 that dates from the original coding.
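One way to get sane behavior near the int32 limit is to test for overflow before stepping the loop variable. The sketch below models only a forward loop with a positive BY value; the names are hypothetical and this is not the plpgsql executor code.

```python
INT32_MAX = 2**31 - 1


def for_loop_values(start, end, step=1):
    """Yield start, start+step, ... up to end, stopping before int32 overflow."""
    if step <= 0:
        raise ValueError("step must be positive in this sketch")
    value = start
    while value <= end:
        yield value
        if value > INT32_MAX - step:
            break  # the next increment would not fit in int32
        value += step


print(list(for_loop_values(INT32_MAX - 2, INT32_MAX)))
```

Without the pre-increment test, a C implementation would wrap around to a negative value and keep looping.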
from old versions of gcc. It's not clear to me that this is really
necessary for correctness, but fewer warnings are always good.
Per buildfarm results and local testing.
define pg_dlsym() as returning a PGFunction pointer, not just any
pointer-to-function. But many are not. Suppress compiler warnings
on platforms that aren't careful by inserting explicit casts at the
two call sites that didn't have a cast already. Per Stefan.
literally, whether quoted or not. Since we allow $ as a character within
identifiers, this behavior is useful, whereas the previous behavior of
treating it as the regexp ending anchor was nearly useless given that the
pattern is automatically anchored anyway. This affects the arguments of
psql's \d commands as well as pg_dump's -n and -t switches. Per discussion.
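A simplified model of the translation, assuming '*' and '?' are the only wildcards and ignoring quoting, case folding, and schema-qualified names. The function name is hypothetical; the point is only that '$' is escaped while the result is anchored at both ends anyway.

```python
import re


def name_pattern_to_regex(pattern: str) -> str:
    """Translate a shell-style name pattern to an anchored regex (sketch)."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        elif ch == "$":
            parts.append("\\$")  # literal dollar sign, not an end anchor
        else:
            parts.append(ch)
    return "^(" + "".join(parts) + ")$"


print(name_pattern_to_regex("foo$*"))
```

With this rule a pattern such as foo$bar matches the identifier foo$bar, whereas treating '$' as an anchor could never match it.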
SIGQUIT) will be recognized and processed while waiting for input,
rather than only after something has been typed. Also make SIGQUIT
do the same thing as SIGTERM in single-user mode, i.e., do a normal
shutdown and exit. Since it's relatively easy to provoke SIGQUIT
from the keyboard, people may try that instead of control-D, and we'd
rather this leads to orderly shutdown. Per report from Leon Mergen
and subsequent discussion.
we don't know at that point which relation OID to tell pgstat to forget.
The code was passing the relfilenode, which is incorrect, and could possibly
cause some other relation's stats to be zeroed out. While we could try to
clean this up, it seems much simpler and more reliable to let the next
invocation of pgstat_vacuum_tabstat() fix things; which indeed is how it
worked before I introduced the buggy code into 8.1.3 and later :-(.
Problem noticed by Itagaki Takahiro, fix is per subsequent discussion.
error message, by using PQconnectionUsedPassword() instead. Someday
we might be able to localize that error message, but not until this
coding technique has disappeared everywhere.
PGconn. Invent a new libpq connection-status function,
PQconnectionUsedPassword() that returns true if the server
demanded a password during authentication, false otherwise.
This may be useful to clients in general, but is immediately
useful to help plug a privilege escalation path in dblink.
Per list discussion and design proposed by Tom Lane.
ORDER BY <constant> as redundant. One is that this means query_planner()
has to canonicalize pathkeys even when the query jointree is empty;
the canonicalization was always a no-op in such cases before, but no more.
Also, we have to guard against thinking that a set-returning function is
"constant" for this purpose. Add a couple of regression tests for these
evidently under-tested cases. Per report from Greg Stark and subsequent
experimentation.
unwarranted liberties with int8 vs float8 values for these types.
Specifically, be sure to apply either hashint8 or hashfloat8 depending
on HAVE_INT64_TIMESTAMP. Per my gripe of even date.
checkpoint. The comment claimed that we could do this anytime after
setting the checkpoint REDO point, but actually BufferSync is relying
on the assumption that buffers dumped by other backends will be fsync'd
too. So we really could not do it any sooner than we are doing it.
Sequences and views could previously be renamed using ALTER TABLE, but
this was a repeated source of confusion for users. Update the docs,
and psql tab completion. Patch from David Fetter; various minor fixes
by myself.
This is a Linux kernel bug that apparently exists in every extant kernel
version: sometimes shmctl() will fail with EIDRM when EINVAL is correct.
We were assuming that EIDRM indicates a possible conflict with pre-existing
backends, and refusing to start the postmaster when this happens. Fortunately,
there does not seem to be any case where Linux can legitimately return EIDRM
(it doesn't track shmem segments in a way that would allow that), so we can
get away with just assuming that EIDRM means EINVAL on this platform.
Per reports from Michael Fuhr and Jon Lapham --- it's a bit surprising
we have not seen more reports, actually.
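The decision described above can be modeled like this. It is a sketch of the logic only, with a hypothetical function name and an explicit on_linux flag standing in for whatever platform test the real code uses.

```python
import errno


def segment_may_be_in_use(shmctl_errno: int, on_linux: bool) -> bool:
    """Interpret a failed shmctl() call on a pre-existing segment ID (sketch)."""
    if shmctl_errno == errno.EINVAL:
        return False  # the segment does not exist
    if shmctl_errno == errno.EIDRM and on_linux:
        return False  # assumed to be a misreported EINVAL on this platform
    return True  # otherwise stay conservative and assume a conflict


print(segment_may_be_in_use(errno.EIDRM, on_linux=True))
```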
so that it responds to SIGQUIT reasonably promptly even on machines where
SA_RESTART signals restart a sleep from scratch. (This whole area could
stand some rethinking, but for now make it work like the other processes
do.) Also some marginal stylistic cleanups.
for it to die before telling the bgwriter to initiate shutdown checkpoint.
Since it's connected to shared memory, this seems more prudent than the
alternative of letting it quit asynchronously. Resolves my complaint
of yesterday about repeated shutdown checkpoints in CVS HEAD.
that are fired at end-of-statement (as is the normal case for foreign keys,
for example). In this situation the per-subxact deferred trigger context
is always empty when subtransaction exit is reached; so we could free it,
but were not doing so, leading to an intratransaction leak of 8K or more
per subtransaction. Per off-list example from Viatcheslav Kalinin
subsequent to bug #3418 (his original bug report omitted a foreign key
constraint needed to cause this leak).
Back-patch to 8.2; prior versions were not using per-subxact contexts
for deferred triggers, so did not have this leak.
memory context pointing at a context not long lived enough.
Also, create a fake PortalContext in which to store the vac_context, if only
to avoid having it be a top-level memory context.
continue with the schedule. Change current uses of SIGINT to abort a worker
into SIGTERM, which keeps the old behaviour of terminating the process.
Patch from ITAGAKI Takahiro, with some editorializing of my own.
overruns (neither of which seem likely to be exploitable as security holes,
fortunately, since the provoker can't control the data written). One of
these is due to choosing to stomp on the output of a called function, which
is bad news in any case; make it treat the called functions' results as
read-only. Avoid some unnecessary palloc/pfree traffic too; it's not
really helpful to free small temporary objects, and again this is presuming
more than it ought to about the nature of the results of called functions.
Per report from Patrick Welche and additional code-reading by Imad.
The correct test for defined-ness is SvOK(sv), not anything involving
SvTYPE. Per bug #3415 from Matt Taylor.
Back-patch as far as 8.0; no apparent problem in 7.x.
over a fairly long period of time, rather than being spat out in a burst.
This happens only for background checkpoints carried out by the bgwriter;
other cases, such as a shutdown checkpoint, are still done at full speed.
Remove the "all buffers" scan in the bgwriter, and associated stats
infrastructure, since this seems no longer very useful when the checkpoint
itself is properly throttled.
Original patch by Itagaki Takahiro, reworked by Heikki Linnakangas,
and some minor API editorialization by me.
installations whose pg_config program does not appear first in the PATH.
Per gripe from Eddie Stanley and subsequent discussions with Fabien Coelho
and others.
by having the postmaster signal it when certain failures occur. This requires
the postmaster setting a flag in shared memory, but should be as safe as the
pmsignal.c code is.
Also make sure the launcher honors a postgresql.conf change turning it off
on SIGHUP.
(which now deals only in optimizable statements), and put that code
into a new file parser/parse_utilcmd.c. This helps clarify and enforce
the design rule that utility statements shouldn't be processed during
the regular parse analysis phase; all interpretation of their meaning
should happen after they are given to ProcessUtility to execute.
(We need this because we don't retain any locks for a utility statement
that's in a plan cache, nor have any way to detect that it's stale.)
We are also able to simplify the API for parse_analyze() and related
routines, because they will now always return exactly one Query structure.
In passing, fix bug #3403 concerning trying to add a serial column to
an existing temp table (this is largely Heikki's work, but we needed
all that restructuring to make it safe).
output after each FETCH. This ensures that incremental results are
available to clients that are executing long-running SELECT queries
via the FETCH_COUNT feature.
parse_int() and with itself (strtod allows leading whitespace, so it
seems odd not to allow trailing whitespace). parse_bool remains
not-whitespace-friendly, but this is generically true for non-numeric
GUC variables, so I'll desist from changing it.
contain a wrong unit specification, per discussion.
In passing, fix the code to avoid unnecessary integer overflows when
converting units, and to detect overflows when they do occur.
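As an illustration of rejecting bad units and detecting overflow before it happens, here is a sketch that converts a memory setting to kilobytes. The unit table, the function name, and the choice of kB as the base unit are assumptions of this example, not the GUC code itself.

```python
INT_MAX = 2**31 - 1

# Multipliers for converting to kilobytes; illustrative table only.
KB_PER_UNIT = {"kB": 1, "MB": 1024, "GB": 1024 * 1024}


def parse_memory_kb(text: str) -> int:
    """Parse strings like '64MB' into kB, rejecting bad units and overflow."""
    text = text.strip()
    i = 0
    while i < len(text) and text[i] in "0123456789":
        i += 1
    if i == 0:
        raise ValueError("no leading digits")
    value = int(text[:i])
    unit = text[i:].strip() or "kB"  # bare numbers are taken as kB here
    if unit not in KB_PER_UNIT:
        raise ValueError(f"invalid unit {unit!r}")
    multiplier = KB_PER_UNIT[unit]
    # Compare before multiplying, so a C version would never overflow.
    if value > INT_MAX // multiplier:
        raise OverflowError("value out of range")
    return value * multiplier


print(parse_memory_kb("64MB"))
```

Dividing the limit by the multiplier, rather than multiplying first and checking afterwards, is what keeps the test itself from overflowing in a fixed-width integer type.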
actually works sanely, viz not 0 and not more than INT_MAX/1000
(else TimestampTzPlusMilliseconds can overflow). Per discussion with
Greg Stark. Since this is a superuser-only setting and there was not
previously any big reason to change it, not worth back-patching.
create table foo (bar int default null default 3);
due to not thinking about the special-case handling of DEFAULT NULL.
Problem noticed while investigating bug #3396.
test seems inessential right now since the only control path for not
getting the lock is via CHECK_FOR_INTERRUPTS which won't return control
to ProcSleep, but it would be important if we ever allow the deadlock
code to kill someone else's transaction instead of our own.
within a signal handler (this might be safe given the relatively narrow code
range in which the interrupt is enabled, but it seems awfully risky); do issue
more informative log messages that tell what is being waited for and the exact
length of the wait; minor other code cleanup. Greg Stark and Tom Lane
unreserved according to the grammar. The list of unreserved words has gotten
extensive enough that the unnecessary quoting is becoming a bit of an eyesore.
To do this, add knowledge of the keyword category to keywords.c's table.
(Someday we might be able to generate keywords.c's table and the keyword lists
in gram.y from a common source.) For the moment, lie about WITH's status in
the table so it will still get quoted --- this is because of the expectation
that WITH will become reserved when the SQL recursive-queries patch gets done.
I didn't force initdb because this affects nothing on-disk; but note that a
few regression tests have changed expected output.
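The quoting rule can be sketched with a small keyword table that records a category per word. The table contents and the function name below are invented for illustration and cover only a handful of words; note the deliberate "lie" about WITH.

```python
import re

# Tiny illustrative keyword table; the real list is far longer.
KEYWORD_CATEGORY = {
    "select": "reserved",
    "table": "reserved",
    "abort": "unreserved",
    "with": "reserved",  # deliberately not listed as unreserved
}


def quote_if_needed(name: str) -> str:
    """Quote an identifier unless it is safe to print bare (sketch)."""
    safe = re.fullmatch(r"[a-z_][a-z0-9_]*", name) is not None
    if safe:
        category = KEYWORD_CATEGORY.get(name)
        if category is not None and category != "unreserved":
            safe = False  # only unreserved keywords may go unquoted
    if safe:
        return name
    return '"' + name.replace('"', '""') + '"'


print(quote_if_needed("abort"), quote_if_needed("select"))
```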
profiling that CopyAttributeOutText was taking an unreasonable fraction of
the backend run time (like 66%!) on the following trivial test case:
$ time psql -c "copy (select repeat('xyzzy',50) from generate_series(1,10000000)) to stdout" regression >/dev/null
The time is all being spent on scanning the string for characters to be
escaped, and most of the time there are none. Some tweaking to take
as many tests as possible out of the inner loop reduced the runtime of this
example by more than 10%. In a real-world case it wouldn't be as useful
a speedup, but it still seems worth adding a few lines here.
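One general way to keep the per-character work small is to remember where the current run of ordinary characters began and flush the whole run at once when an escapable character turns up. The sketch below shows that idea only; it is not the CopyAttributeOutText code, and the function name and escape table are assumptions of this example.

```python
def escape_copy_text(value: str, delimiter: str = "\t") -> str:
    """Backslash-escape special characters, copying ordinary runs in bulk."""
    special = {"\\": "\\\\", "\n": "\\n", "\r": "\\r", "\t": "\\t"}
    out = []
    start = 0  # start of the current run of ordinary characters
    for i, ch in enumerate(value):
        if ch in special or ch == delimiter:
            out.append(value[start:i])  # flush the run in one piece
            out.append(special.get(ch, "\\" + ch))
            start = i + 1
    out.append(value[start:])  # trailing run; the whole string if nothing matched
    return "".join(out)


print(escape_copy_text("xyzzy" * 3))
```

In the common case where nothing needs escaping, the string is emitted with a single append.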
few lines in sql_exec_error_callback() by using the function source string
field that the patch added to SQL function cache entries. This doesn't work
because the fn_extra field isn't filled in yet during init_sql_fcache().
Probably it could be made to work, but it doesn't seem appropriate to contort
the main code paths to make an error-reporting path a tad faster. Per report
from Pavel Stehule.
an array of strings rather than an array of integers, and allow any simple
constant or identifier to be used in typmods; for example
create table foo (f1 widget(42,'23skidoo',point));
Of course the typmodin function has still got to pack this info into a
non-negative int32 for storage, but it's still a useful improvement in
flexibility, especially considering that you can do nearly anything if you
are willing to keep the info in a side table. We can get away with this
change since we have not yet released a version providing user-definable
typmods. Per discussion.
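For a hypothetical type that takes two integer modifiers, a typmodin-style function might pack the incoming strings as shown below. The bit layout, limits, and function names are assumptions made for this sketch.

```python
def pack_typmod(mods):
    """Pack two modifier strings into one non-negative int32 (sketch)."""
    if len(mods) != 2:
        raise ValueError("expected exactly two modifiers")
    first, second = int(mods[0]), int(mods[1])  # the input arrives as strings
    if not (0 <= first <= 0x7FFF and 0 <= second <= 0xFFFF):
        raise ValueError("modifier out of range")
    # 15 bits for the first value keeps the sign bit clear.
    return (first << 16) | second


def unpack_typmod(typmod):
    """Recover the two modifiers from a packed value."""
    return typmod >> 16, typmod & 0xFFFF


print(pack_typmod(["42", "23"]), unpack_typmod(pack_typmod(["42", "23"])))
```

A modifier such as '23skidoo' would fail in int() here; a type that wanted to accept it would have to map it to a number some other way, for example through a side table.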
reassembled in the syslogger before writing to the log file. This prevents
partial messages from being written, which mucks up log rotation, and
messages from different backends being interleaved, which causes garbled
logs. Backport as far as 8.0, where the syslogger was introduced.
Tom Lane and Andrew Dunstan
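The reassembly idea can be sketched as a small per-sender buffer. This assumes each chunk carries the sender's PID and a flag marking the final chunk; that layout, the class name, and the method name are assumptions of this example rather than a description of the actual pipe protocol.

```python
from collections import defaultdict


class ChunkAssembler:
    """Collect message chunks per sending process until the last one arrives."""

    def __init__(self):
        self._pending = defaultdict(list)

    def feed(self, pid, data, is_last):
        """Return the complete message on the final chunk, else None."""
        self._pending[pid].append(data)
        if not is_last:
            return None
        return "".join(self._pending.pop(pid))


asm = ChunkAssembler()
asm.feed(101, "ERROR: out of ", False)
asm.feed(202, "LOG: check", False)       # interleaved chunk from another backend
print(asm.feed(101, "memory", True))     # complete message from PID 101
```

Because nothing is written until a message is complete, chunks from different backends cannot be interleaved in the log file.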