PGconn. Invent a new libpq connection-status function,
PQconnectionUsedPassword() that returns true if the server
demanded a password during authentication, false otherwise.
This may be useful to clients in general, but is immediately
useful to help plug a privilege escalation path in dblink.
Per list discussion and design proposed by Tom Lane.
ORDER BY <constant> as redundant. One is that this means query_planner()
has to canonicalize pathkeys even when the query jointree is empty;
the canonicalization was always a no-op in such cases before, but no more.
Also, we have to guard against thinking that a set-returning function is
"constant" for this purpose. Add a couple of regression tests for these
evidently under-tested cases. Per report from Greg Stark and subsequent
experimentation.
unwarranted liberties with int8 vs float8 values for these types.
Specifically, be sure to apply either hashint8 or hashfloat8 depending
on HAVE_INT64_TIMESTAMP. Per my gripe of even date.
checkpoint. The comment claimed that we could do this anytime after
setting the checkpoint REDO point, but actually BufferSync is relying
on the assumption that buffers dumped by other backends will be fsync'd
too. So we really could not do it any sooner than we are doing it.
Sequences and views could previously be renamed using ALTER TABLE, but
this was a repeated source of confusion for users. Update the docs,
and psql tab completion. Patch from David Fetter; various minor fixes
by myself.
This is a Linux kernel bug that apparently exists in every extant kernel
version: sometimes shmctl() will fail with EIDRM when EINVAL is correct.
We were assuming that EIDRM indicates a possible conflict with pre-existing
backends, and refusing to start the postmaster when this happens. Fortunately,
there does not seem to be any case where Linux can legitimately return EIDRM
(it doesn't track shmem segments in a way that would allow that), so we can
get away with just assuming that EIDRM means EINVAL on this platform.
Per reports from Michael Fuhr and Jon Lapham --- it's a bit surprising
we have not seen more reports, actually.
so that it responds to SIGQUIT reasonably promptly even on machines where
SA_RESTART signals restart a sleep from scratch. (This whole area could
stand some rethinking, but for now make it work like the other processes
do.) Also some marginal stylistic cleanups.
for it to die before telling the bgwriter to initiate shutdown checkpoint.
Since it's connected to shared memory, this seems more prudent than the
alternative of letting it quit asynchronously. Resolves my complaint
of yesterday about repeated shutdown checkpoints in CVS HEAD.
that are fired at end-of-statement (as is the normal case for foreign keys,
for example). In this situation the per-subxact deferred trigger context
is always empty when subtransaction exit is reached; so we could free it,
but were not doing so, leading to an intratransaction leak of 8K or more
per subtransaction. Per off-list example from Viatcheslav Kalinin
subsequent to bug #3418 (his original bug report omitted a foreign key
constraint needed to cause this leak).
Back-patch to 8.2; prior versions were not using per-subxact contexts
for deferred triggers, so did not have this leak.
memory context pointing at a context not long lived enough.
Also, create a fake PortalContext where to store the vac_context, if only
to avoid having it be a top-level memory context.
continue with the schedule. Change current uses of SIGINT to abort a worker
into SIGTERM, which keeps the old behaviour of terminating the process.
Patch from ITAGAKI Takahiro, with some editorializing of my own.
overruns (neither of which seem likely to be exploitable as security holes,
fortunately, since the provoker can't control the data written). One of
these is due to choosing to stomp on the output of a called function, which
is bad news in any case; make it treat the called functions' results as
read-only. Avoid some unnecessary palloc/pfree traffic too; it's not
really helpful to free small temporary objects, and again this is presuming
more than it ought to about the nature of the results of called functions.
Per report from Patrick Welche and additional code-reading by Imad.
The correct test for defined-ness is SvOK(sv), not anything involving
SvTYPE. Per bug #3415 from Matt Taylor.
Back-patch as far as 8.0; no apparent problem in 7.x.
over a fairly long period of time, rather than being spat out in a burst.
This happens only for background checkpoints carried out by the bgwriter;
other cases, such as a shutdown checkpoint, are still done at full speed.
Remove the "all buffers" scan in the bgwriter, and associated stats
infrastructure, since this seems no longer very useful when the checkpoint
itself is properly throttled.
Original patch by Itagaki Takahiro, reworked by Heikki Linnakangas,
and some minor API editorialization by me.
installations whose pg_config program does not appear first in the PATH.
Per gripe from Eddie Stanley and subsequent discussions with Fabien Coelho
and others.
by having the postmaster signal it when certain failures occur. This requires
the postmaster setting a flag in shared memory, but should be as safe as the
pmsignal.c code is.
Also make sure the launcher honor's a postgresql.conf change turning it off
on SIGHUP.
(which now deals only in optimizable statements), and put that code
into a new file parser/parse_utilcmd.c. This helps clarify and enforce
the design rule that utility statements shouldn't be processed during
the regular parse analysis phase; all interpretation of their meaning
should happen after they are given to ProcessUtility to execute.
(We need this because we don't retain any locks for a utility statement
that's in a plan cache, nor have any way to detect that it's stale.)
We are also able to simplify the API for parse_analyze() and related
routines, because they will now always return exactly one Query structure.
In passing, fix bug #3403 concerning trying to add a serial column to
an existing temp table (this is largely Heikki's work, but we needed
all that restructuring to make it safe).
output after each FETCH. This ensures that incremental results are
available to clients that are executing long-running SELECT queries
via the FETCH_COUNT feature.
parse_int() and with itself (strtod allows leading whitespace, so it
seems odd not to allow trailing whitespace). parse_bool remains
not-whitespace-friendly, but this is generically true for non-numeric
GUC variables, so I'll desist from changing it.
contain a wrong unit specification, per discussion.
In passing, fix the code to avoid unnecessary integer overflows when
converting units, and to detect overflows when they do occur.
actually works sanely, viz not 0 and not more than INT_MAX/1000
(else TimestampTzPlusMilliseconds can overflow). Per discussion with
Greg Stark. Since this is a superuser-only setting and there was not
previously any big reason to change it, not worth back-patching.
create table foo (bar int default null default 3);
due to not thinking about the special-case handling of DEFAULT NULL.
Problem noticed while investigating bug #3396.
test seems inessential right now since the only control path for not
getting the lock is via CHECK_FOR_INTERRUPTS which won't return control
to ProcSleep, but it would be important if we ever allow the deadlock
code to kill someone else's transaction instead of our own.
within a signal handler (this might be safe given the relatively narrow code
range in which the interrupt is enabled, but it seems awfully risky); do issue
more informative log messages that tell what is being waited for and the exact
length of the wait; minor other code cleanup. Greg Stark and Tom Lane
unreserved according to the grammar. The list of unreserved words has gotten
extensive enough that the unnecessary quoting is becoming a bit of an eyesore.
To do this, add knowledge of the keyword category to keywords.c's table.
(Someday we might be able to generate keywords.c's table and the keyword lists
in gram.y from a common source.) For the moment, lie about WITH's status in
the table so it will still get quoted --- this is because of the expectation
that WITH will become reserved when the SQL recursive-queries patch gets done.
I didn't force initdb because this affects nothing on-disk; but note that a
few regression tests have changed expected output.
profiling that CopyAttributeOutText was taking an unreasonable fraction of
the backend run time (like 66%!) on the following trivial test case:
$ time psql -c "copy (select repeat('xyzzy',50) from generate_series(1,10000000)) to stdout" regression >/dev/null
The time is all being spent on scanning the string for characters to be
escaped, which most of the time there aren't any of. Some tweaking to take
as many tests as possible out of the inner loop reduced the runtime of this
example by more than 10%. In a real-world case it wouldn't be as useful
a speedup, but it still seems worth adding a few lines here.
few lines in sql_exec_error_callback() by using the function source string
field that the patch added to SQL function cache entries. This doesn't work
because the fn_extra field isn't filled in yet during init_sql_fcache().
Probably it could be made to work, but it doesn't seem appropriate to contort
the main code paths to make an error-reporting path a tad faster. Per report
from Pavel Stehule.
an array of strings rather than an array of integers, and allow any simple
constant or identifier to be used in typmods; for example
create table foo (f1 widget(42,'23skidoo',point));
Of course the typmodin function has still got to pack this info into a
non-negative int32 for storage, but it's still a useful improvement in
flexibility, especially considering that you can do nearly anything if you
are willing to keep the info in a side table. We can get away with this
change since we have not yet released a version providing user-definable
typmods. Per discussion.
reassembled in the syslogger before writing to the log file. This prevents
partial messages from being written, which mucks up log rotation, and
messages from different backends being interleaved, which causes garbled
logs. Backport as far as 8.0, where the syslogger was introduced.
Tom Lane and Andrew Dunstan
historically worked in some but not all cases, but as of 8.2 it failed for all
timezone formats. Fix, and add regression test cases to catch future
regressions in this area. Per gripe from Adam Witney.
regression driver into two parts and reusing half of it. Required to
run ECPG tests without a shell on MSVC builds.
Fix ECPG thread tests for MSVC build (incl output files).
Joachim Wieland and Magnus Hagander
with a plpgsql-defined cursor. The underlying mechanism for this is that the
main SQL engine will now take "WHERE CURRENT OF $n" where $n is a refcursor
parameter. Not sure if we should document that fact or consider it an
implementation detail. Per discussion with Pavel Stehule.
Along the way, allow FOR UPDATE in non-WITH-HOLD cursors; there may once
have been a reason to disallow that, but it seems to work now, and it's
really rather necessary if you want to select a row via a cursor and then
update it in a concurrent-safe fashion.
Original patch by Arul Shaji, rather heavily editorialized by Tom Lane.
pseudo HeapScanDesc created for a bitmap heap scan. This avoids some useless
overhead during a bitmap scan startup, in particular invoking the syncscan
code. (We might someday want to do that, but right now it's merely useless
contention for shared memory, to say nothing of possibly pushing useful
entries out of syncscan's small LRU list.) This also allows elimination of
ugly pgstat_discount_heap_scan() kluge.
results due to syncscan patch, when shared_buffers is small enough. Per
buildfarm reports and some local testing with shared_buffers set to the
lowest value considered by initdb.
large inputs. Also cause it to error out immediately if the result will
overflow, instead of grinding through a lot of calculation first.
Per gripe from Jim Nasby.
causes a division-by-zero error in the vacuum code. This can happen when there
are more workers than cost limit units.
Per report from Galy Lee in
<200705310914.l4V9E6JA094603@wwwmaster.postgresql.org>.
value for the vacuum code. Instead, make zero signify getting the value from a
higher level configuration facility, just like -1 in the original coding. We
still document that -1 is the value that disables the feature, to avoid
confusing the user unnecessarily.
Reported by Galy Lee in <200705310914.l4V9E6JA094603@wwwmaster.postgresql.org>;
per subsequent discussion.
which is the only state in which it's safe to initiate database queries.
It turns out that all but two of the callers thought that's what it meant;
and the other two were using it as a proxy for "will GetTopTransactionId()
return a nonzero XID"? Since it was in fact an unreliable guide to that,
make those two just invoke GetTopTransactionId() always, then deal with a
zero result if they get one.
for each temp file, rather than once per sort or hashjoin; this allows
spreading the data of a large sort or join across multiple tablespaces.
(I remain dubious that this will make any difference in practice, but certain
people insisted.) Arrange to cache the results of parsing the GUC variable
instead of recomputing from scratch on every demand, and push usage of the
cache down to the bottommost fd.c level.
were accepted by prior Postgres releases. This takes care of the loose end
left by the preceding patch to downgrade implicit casts-to-text. To avoid
breaking desirable behavior for array concatenation, introduce a new
polymorphic pseudo-type "anynonarray" --- the added concatenation operators
are actually text || anynonarray and anynonarray || text.
from the other string-category types; this eliminates a lot of surprising
interpretations that the parser could formerly make when there was no directly
applicable operator.
Create a general mechanism that supports casts to and from the standard string
types (text,varchar,bpchar) for *every* datatype, by invoking the datatype's
I/O functions. These new casts are assignment-only in the to-string direction,
explicit-only in the other, and therefore should create no surprising behavior.
Remove a bunch of thereby-obsoleted datatype-specific casting functions.
The "general mechanism" is a new expression node type CoerceViaIO that can
actually convert between *any* two datatypes if their external text
representations are compatible. This is more general than needed for the
immediate feature, but might be useful in plpgsql or other places in future.
This commit does nothing about the issue that applying the concatenation
operator || to non-text types will now fail, often with strange error messages
due to misinterpreting the operator as array concatenation. Since it often
(not always) worked before, we should either make it succeed or at least give
a more user-friendly error; but details are still under debate.
Peter Eisentraut and Tom Lane
a session regardless of the existence of cached plans. The plancache
only needs to be invalidated so that rules affected by the new setting
will be reflected in the new query plans.
Jan
- Fix possible deadlock between UPDATE and VACUUM queries. Bug never was
observed in 8.2, but it still exist there. HEAD is more sensitive to
bug after recent "ring" of buffer improvements.
- Fix WAL creation: if parent page is stored as is after split then
incomplete split isn't removed during replay. This happens rather rare, only
on large tables with a lot of updates/inserts.
- Fix WAL replay: there was wrong test of XLR_BKP_BLOCK_* for left
page after deletion of page. That causes wrong rightlink field: it pointed
to deleted page.
- add checking of match of clearing incomplete split
- cleanup incomplete split list after proceeding
All of this chages doesn't change on-disk storage, so backpatch...
But second point may be an issue for replaying logs from previous version.
to be cases when at least Windows 2000 can do this even though select
just indicated that the socket is readable.
Per report and analysis from Cyril VELTER.
(Possibly release notes material, lest users be confused.)
The --quiet option is now obsolete and without effect in createdb,
createuser, dropdb, dropuser; kept for compatibility but marked for
removal in 8.4.
Progress messages when acting on all databases now go to stdout instead
of stderr, since they are not in fact errors.
Ordered options in reindexdb reference page alphabetically, like in
other programs' pages.
tablespace(s) in which to store temp tables and temporary files. This is a
list to allow spreading the load across multiple tablespaces (a random list
element is chosen each time a temp object is to be created). Temp files are
not stored in per-database pgsql_tmp/ directories anymore, but per-tablespace
directories.
Jaime Casanova and Albert Cervera, with review by Bernd Helmle and Tom Lane.
and most especially for UTF8. Remove unnecessary special cases for bytea
processing and single-byte charset ILIKE. a ILIKE b is now processed as
lower(a) LIKE lower(b) in all cases. The code is now considerably simpler. All
comparisons are now performed byte-wise, and the text and pattern are also
advanced byte-wise where it is safe to do so - essentially where a wildcard is
not being matched.
Andrew Dunstan, from an original patch by ITAGAKI Takahiro, with ideas from
Tom Lane and Mark Mielke.
wrong data when dumping a bufferload that crosses a component-file boundary.
This probably has not been seen in the wild because (a) component files are
normally 1GB apiece and (b) non-block-aligned buffer usage is relatively
rare. But it's fairly easy to reproduce a problem if one reduces RELSEG_SIZE
in a test build. Kudos to Kurt Harriman for spotting the bug.
type. Also, add explicit casts between boolean and text/varchar. Both
of these changes are for conformance with SQL:2003.
Update the regression tests, bump the catversion.
will exit before failing because of conflicting DB usage. Per discussion,
this seems a good idea to help mask the fact that backend exit takes nonzero
time. Remove a couple of thereby-obsoleted sleeps in contrib and PL
regression test sequences.
selecting power-of-2, rather than prime, numbers of buckets in hash joins.
If the hash functions are doing their jobs properly by making all hash bits
equally random, this is good enough, and it saves expensive integer division
and modulus operations.
delivering a well-randomized hash value. I got religion on this after
observing that performance of multi-batch hash join degrades terribly if the
higher-order bits of hash values aren't random, as indeed was true for say
hashes of small integer values. It's now expected and documented that hash
functions should use hash_any or some comparable method to ensure that all
bits of their output are about equally random.
initdb forced because this change invalidates existing hash indexes. For the
same reason, this isn't back-patchable; the hash join performance problem
will get a band-aid fix in the back branches.
EXPLAIN-only operation was a little too short; it skipped initializing the
node's result tuple type, which may be needed depending on what's above the
indexscan node. Call ExecAssignResultTypeFromTL before exiting. (For good
luck I moved up the ExecAssignScanProjectionInfo call as well, so that
everything except indexscan-specific initialization will still be done.)
Per example from Grant Finnemore.
index key columns always have the type expected by the index's associated
operators, ie, we add RelabelType nodes when dealing with binary-compatible
index opclasses. This is needed to get varchar indexes to play nicely with
the new EquivalenceClass machinery, as per recent gripe from Josh Berkus that
CVS HEAD was failing to match a varchar index column to a constant restriction
in the query.
It seems likely that this change will allow removal of a lot of ugly ad-hoc
RelabelType-stripping that the planner has traditionally done while matching
expressions to other expressions, but I'll worry about that some other day.
fail when used in a deferred trigger. Bug goes back to 8.0; no doubt the
reason it hadn't been noticed is that we've been discouraging use of
user-defined constraint triggers. Per report from Frank van Vugt.
buffers, rather than blowing out the whole shared-buffer arena. Aside from
avoiding cache spoliation, this fixes the problem that VACUUM formerly tended
to cause a WAL flush for every page it modified, because we had it hacked to
use only a single buffer. Those flushes will now occur only once per
ring-ful. The exact ring size, and the threshold for seqscans to switch into
the ring usage pattern, remain under debate; but the infrastructure seems
done. The key bit of infrastructure is a new optional BufferAccessStrategy
object that can be passed to ReadBuffer operations; this replaces the former
StrategyHintVacuum API.
This patch also changes the buffer usage-count methodology a bit: we now
advance usage_count when first pinning a buffer, rather than when last
unpinning it. To preserve the behavior that a buffer's lifetime starts to
decrease when it's released, the clock sweep code is modified to not decrement
usage_count of pinned buffers.
Work not done in this commit: teach GiST and GIN indexes to use the vacuum
BufferAccessStrategy for vacuum-driven fetches.
Original patch by Simon, reworked by Heikki and again by Tom.
"microsecond" and "millisecond" units were not considered valid input
by themselves, which caused inputs like "1 millisecond" to be rejected
erroneously.
Update the docs, add regression tests, and backport to 8.2 and 8.1
or abort within a backend; rearrange InitPostgres processing to make it so.
Revealed by just-added Asserts along with ECPG regression tests (hm, I wonder
why the core regression tests didn't expose it?). This possibly is another
reason for missing stats updates ...
and aborted transactions have different effects; also teach it not to assume
that prepared transactions are always committed.
Along the way, simplify the pgstats API by tying counting directly to
Relations; I cannot detect any redeeming social value in having stats
pointers in HeapScanDesc and IndexScanDesc structures. And fix a few
corner cases in which counts might be missed because the relation's
pgstat_info pointer hadn't been set.
inheritance child of an UPDATE/DELETE target relation can be excluded by
constraints. I had rearranged some code in set_append_rel_pathlist() to
avoid "useless" work when a child is excluded, but overdid it and left
the child with no cheapest_path entry, causing possible failure later
if the appendrel was involved in a join. Also, it seems that the dummy
plan generated by inheritance_planner() when all branches are excluded
has to be a bit less dummy now than was required in 8.2.
Per report from Jan Wieck. Add his test case to the regression tests.
and/or create plans for hypothetical situations; in particular, investigate
plans that would be generated using hypothetical indexes. This is a
heavily-rewritten version of the hooks proposed by Gurjeet Singh for his
Index Advisor project. In this formulation, the index advisor can be
entirely a loadable module instead of requiring a significant part to be
in the core backend, and plans can be generated for hypothetical indexes
without requiring the creation and rolling-back of system catalog entries.
The index advisor patch as-submitted is not compatible with these hooks,
but it needs significant work anyway due to other 8.2-to-8.3 planner
changes. With these hooks in the core backend, development of the advisor
can proceed as a pgfoundry project.
what a Var node refers to. This is no longer necessary because the new
flat-range-table representation of plan trees makes it relatively easy to dig
down through child plan levels to find the original reference; and to keep
doing it that way, we'd have to store joinaliasvars lists in flattened RTEs,
as demonstrated by bug report from Leszek Trenkner. This change makes
varnoold/varoattno truly just debug aids, which wasn't quite the case before.
Perhaps we should drop them, or only have them in assert-enabled builds?
in cases where a sub-SELECT inserts a WHERE clause between two outer joins,
that clause may prevent us from re-ordering the two outer joins. The code
was considering only the joins' own ON-conditions in determining reordering
safety, which is not good enough. Add a "delay_upper_joins" flag to
OuterJoinInfo to flag that we have detected such a clause and higher-level
outer joins shouldn't be permitted to commute with this one. (This might
seem overly coarse, but given the current rules for OJ reordering, it's
sufficient AFAICT.)
The failure case is actually pretty narrow: it needs a WHERE clause within
the RHS of a left join that checks the RHS of a lower left join, but is not
strict for that RHS (else we'd have simplified the lower join to a plain
join). Even then no failure will be manifest unless the planner chooses to
rearrange the join order.
Per bug report from Adam Terrey.
cheapest-startup-cost innerjoin indexscans, and make joinpath.c consider
both of these (when different) as the inside of a nestloop join. The
original design was based on the assumption that indexscan paths always
have negligible startup cost, and so total cost is the only important
figure of merit; an assumption that's obviously broken by bitmap
indexscans. This oversight could lead to choosing poor plans in cases
where fast-start behavior is more important than total cost, such as
LIMIT and IN queries. 8.1-vintage brain fade exposed by an example from
Chuck D.
is using mark/restore but not rewind or backward-scan capability. Insert a
materialize plan node between a mergejoin and its inner child if the inner
child is a sort that is expected to spill to disk. The materialize shields
the sort from the need to do mark/restore and thereby allows it to perform
its final merge pass on-the-fly; while the materialize itself is normally
cheap since it won't spill to disk unless the number of tuples with equal
key values exceeds work_mem.
Greg Stark, with some kibitzing from Tom Lane.
- Function renamed to "xpath".
- Function is now strict, per discussion.
- Return empty array in case when XPath expression detects nothing
(previously, NULL was returned in such case), per discussion.
- (bugfix) Work with fragments with prologue: select xpath('/a',
'<?xml version="1.0"?><a /><b />'); // now XML datum is always wrapped
with dummy <x>...</x>, XML prologue simply goes away (if any).
- Some cleanup.
Nikolay Samokhvalov
Some code cleanup and documentation work by myself.
WAL records that shows whether it is safe to remove full-page images
(ie, whether or not an on-line backup was in progress when the WAL entry
was made). Also make provision for an XLOG_NOOP record type that can be
used to fill in the extra space when decompressing the data for restore.
This is the portion of Koichi Suzuki's "full page writes" patch that
has to go into the core database. The remainder of that work is two
external compression and decompression programs, which for the time being
will undergo separate development on pgfoundry. Per discussion.
Also, twiddle the handling of BTREE_SPLIT records to ensure it'll be
possible to compress them (the previous coding caused essential info
to be omitted). The other commonly-used record types seem OK already,
with the possible exception of GIN and GIST WAL records, which I don't
understand well enough to opine on.
FreezeXid introduced in a recent commit, so there isn't any data loss in this
approach.
Doing it causes ALTER TABLE (or rather, the forms of it that cause a full table
rewrite) to be affected as well. In this case, the frozen point is RecentXmin,
because after the rewrite all the tuples are relabeled with the rewriting
transaction's Xid.
TOAST tables are fixed automatically as well, as fallout of the way they were
already being handled in the respective code paths.
With this patch, there is no longer need to VACUUM tables for Xid wraparound
purposes that have been cleaned up via TRUNCATE or CLUSTER.
had been taught not to do that ages ago, the SSL code was helpfully bleating
anyway. Resolves some recent reports such as bug #3266; however the
underlying cause of the related bug #2829 is still unclear.
and inet_server_addr() fail if the client connected over a "scoped" IPv6
address. In this case getnameinfo() will return a string ending with
a poorly-standardized "%something" zone specifier, which these functions
try to feed to network_in(), which won't take it. So that we don't lose
functionality altogether, suppress the zone specifier before giving the
string to network_in(). Per report from Brian Hirt.
TODO: probably someday the inet type should support scoped IPv6 addresses,
and then this patch should be reverted.
Backpatch to 8.2 ... is it worth going further?
recompute the limit/offset immediately, so that the updated values are
available when the child's ReScan function is invoked. Add a regression
test for this, too. Bug is new in HEAD (due to the bounded-sorting patch)
so no need for back-patch.
I did not do anything about merging this signaling with chgParam processing,
but if we were to do that we'd still need to compute the updated values
at this point rather than during the first ProcNode call.
Per observation and test case from Greg Stark, though I didn't use his patch.
to avoid losing useful Xid information in not-so-old tuples. This makes
CLUSTER behave the same as VACUUM as far a tuple-freezing behavior goes
(though CLUSTER does not yet advance the table's relfrozenxid).
While at it, move the actual freezing operation in rewriteheap.c to a more
appropriate place, and document it thoroughly. This part of the patch from
Tom Lane.
avoid a later needless VACUUM for Xid-wraparound purposes. We can do this
since the table is known to be left empty, so no Xid remains on it.
Per discussion.
applied to live tuples older than a recent Xmin, not to tuples that may be part
of an update chain. Those still keep their original markings.
This patch makes it possible for CLUSTER to advance relfrozenxid, thus avoiding
the need of vacuuming the table for Xid wraparound purposes. That will be
patched separately.
Patch from Heikki Linnakangas.
there's an indirect dependency on the owner via the parent table. We were
already handling indexes that way, but not toast tables for some reason.
Saves a little catalog space and cuts down the verbosity of checkSharedDependencies
reports.
ActiveSnapshot. Having it affect ActiveSnapshot only in the unusual
case of needing to replan seems a bad idea, and there's also the problem
that the created snap might be in a relatively short-lived context, as
noted by Jan Wieck. Also, there's no need to force a new snap at all
unless we are called with no snap currently set, which is an unusual
case in itself.
and only a truncated log of the objects in the current database to the client.
Also, instead of reporting object counts for all databases on which the user
might own objects, report only as many as fit in the predefined line count.
This is to avoid flooding the client when the user owns too many objects,
which could cause problems.
Per report from Ed L. on April 4th and subsequent discussion.
more completely. The motivation for having it understand IS NULL at all was
to allow use of "foo IS NULL" as one of the subsets of a partitioning on
"foo", but as reported by Aleksander Kmetec, it wasn't really getting the job
done. Backpatch to 8.2 since this is arguably a performance bug.
named foo, would work but the other ordering would not. If a user-specified
type or table name collides with an existing auto-generated array name, just
rename the array type out of the way by prepending more underscores. This
should not create any backward-compatibility issues, since the cases in which
this will happen would have failed outright in prior releases.
Also fix an oversight in the arrays-of-composites patch: ALTER TABLE RENAME
renamed the table's rowtype but not its array type.
needs to check the new constraint against columns of derived domains too.
Also, make it error out if the domain to be modified is used within any
composite-type columns. Eventually we should support that case, but it seems
a bit painful, and not suitable for a back-patch. For the moment just let the
user know we can't do it.
Backpatch to 8.2, which is the only released version that allows nested
domains. Possibly the other part should be back-patched further.
and views (but not system catalogs, nor sequences or toast tables). Get rid
of the hardwired convention that a type's array type is named exactly "_type",
instead using a new column pg_type.typarray to provide the linkage. (It still
will be named "_type", though, except in odd corner cases such as
maximum-length type names.)
Along the way, make tracking of owner and schema dependencies for types more
uniform: a type directly created by the user has these dependencies, while a
table rowtype or auto-generated array type does not have them, but depends on
its parent object instead.
David Fetter, Andrew Dunstan, Tom Lane
numerics as "oprcanhash", and make the corresponding system catalog
updates. As a result, hash indexes, hashed aggregation, and hash
joins can now be used with the numeric type. Bump the catversion.
The only tricky aspect to doing this is writing a correct hash
function: it's possible for two Numerics to be equal according to
their equality operator, but have different in-memory bit patterns.
To cope with this, the hash function doesn't consider the Numeric's
"scale" or "sign", and explictly skips any leading or trailing
zeros in the Numeric's digit buffer (the current implementation
should suppress any such zeros, but it seems unwise to rely upon
this). See discussion on pgsql-patches for more details.
and for other compilers, insert a dummy exit() call so that they understand
PG_RE_THROW() doesn't return. Insert fflush(stderr) in ExceptionalCondition,
per recent buildfarm evidence that that might not happen automatically on some
platforms. And const-ify ExceptionalCondition's declaration while at it.
need be returned. We keep a heap of the current best N tuples and sift-up
new tuples into it as we scan the input. For M input tuples this means
only about M*log(N) comparisons instead of M*log(M), not to mention a lot
less workspace when N is small --- avoiding spill-to-disk for large M
is actually the most attractive thing about it. Patch includes planner
and executor support for invoking this facility in ORDER BY ... LIMIT
queries. Greg Stark, with some editorialization by moi.
pages it intends to zero immediately. Just to show there is some use for that
function besides WAL recovery :-).
Along the way, fold _hash_checkpage and _hash_pageinit calls into _hash_getbuf
and friends, instead of expecting callers to do that separately.
for local symbols, that shouldn't be exported. This patch excludes them,
cutting down about 10,000 exported symbols and decreasing the binary size
by 20%.
ReadOrZeroBuffer to fetch pages from beyond physical EOF. This would
usually work, but would cause problems for md.c if writes occurred
beyond a segment boundary when the previous segment file hadn't been
fully extended.
from the WAL data, don't bother to physically read it; just have bufmgr.c
return a zeroed-out buffer instead. This speeds recovery significantly,
and also avoids unnecessary failures when a page-to-be-overwritten has corrupt
page headers on disk. This replaces a former kluge that accomplished the
latter by pretending zero_damaged_pages was always ON during WAL recovery;
which was OK when the kluge was put in, but is unsafe when restoring a WAL
log that was written with full_page_writes off.
Heikki Linnakangas
true at the very end of its processing, the update is broadcast via a
shared-cache-inval message for the index; without this, existing backends that
already have relcache entries for the index might never see it become valid.
Also, force a relcache inval on the index's parent table at the same time,
so that any cached plans for that table are re-planned; this ensures that
the newly valid index will be used if appropriate. Aside from making
C.I.C. behave more reasonably, this is necessary infrastructure for some
aspects of the HOT patch. Pavan Deolasee, with a little further stuff from
me.
and TimestampDifference, to make coding clearer. I think this should also fix
the failure to start workers in platforms with low resolution timers, as
reported by Itagaki Takahiro.
isn't any place to throw the error to. If so, we should treat the error
as FATAL, just as we would have if it'd been thrown outside the PG_TRY
block to begin with.
Although this is clearly a *potential* source of bugs, it is not clear
at the moment whether it is an *actual* source of bugs; there may not
presently be any PG_TRY blocks in code that can be reached with no outer
longjmp catcher. So for the moment I'm going to be conservative and not
back-patch this. The change breaks ABI for users of PG_RE_THROW and hence
might create compatibility problems for loadable modules, so we should not
put it into released branches without proof that it's needed.
wrong thing when inlining polymorphic SQL functions, because it was using the
function's declared return type where it should have used the actual result
type of the current call. In 8.1 and 8.2 this causes obvious failures even if
you don't have assertions turned on; in 8.0 and 7.4 it would only be a problem
if the inlined expression were used as an input to a function that did
run-time type determination on its inputs. Add a regression test, since this
is evidently an under-tested area.
from time_t to TimestampTz representation. This provides full gettimeofday()
resolution of the timestamps, which might be useful when attempting to
do point-in-time recovery --- previously it was not possible to specify
the stop point with sub-second resolution. But mostly this is to get
rid of TimestampTz-to-time_t conversion overhead during commit. Per my
proposal of a day or two back.
messages to the stats collector. This avoids the problem that enabling
stats_row_level for autovacuum has a significant overhead for short
read-only transactions, as noted by Arjen van der Meijden. We can avoid
an extra gettimeofday call by piggybacking on the one done for WAL-logging
xact commit or abort (although that doesn't help read-only transactions,
since they don't WAL-log anything).
In my proposal for this, I noted that we could change the WAL log entries
for commit/abort to record full TimestampTz precision, instead of only
time_t as at present. That's not done in this patch, but will be committed
separately.
We can just palloc, instead of using makeNode, when we are going to
overwrite the whole node anyway in the FLATCOPY macro. Also, use
FLATCOPY instead of copyObject for common node types Var and Const.
look through a freelist for a chunk of adequate size. For a long time
now, all elements of a given freelist have been exactly the same
allocated size, so we don't need a loop. Since the loop never iterated
more than once, you'd think this wouldn't matter much, but it makes a
noticeable savings in a simple test --- perhaps because the compiler
isn't optimizing on a mistaken assumption that the loop would repeat.
AllocSetAlloc is called often enough that saving even a couple of
instructions is worthwhile.
types of unspecified parameters when submitted via extended query protocol.
This worked in 8.2 but I had broken it during plancache changes. DECLARE
CURSOR is now treated almost exactly like a plain SELECT through parse
analysis, rewrite, and planning; only just before sending to the executor
do we divert it away to ProcessUtility. This requires a special-case check
in a number of places, but practically all of them were already special-casing
SELECT INTO, so it's not too ugly. (Maybe it would be a good idea to merge
the two by treating IntoClause as a form of utility statement? Not going to
worry about that now, though.) That approach doesn't work for EXPLAIN,
however, so for that I punted and used a klugy solution of running parse
analysis an extra time if under extended query protocol.
is in progress on the same hashtable. This seems the least invasive way to
fix the recently-recognized problem that a split could cause the scan to
visit entries twice or (with much lower probability) miss them entirely.
The only field-reported problem caused by this is the "failed to re-find
shared lock object" PANIC in COMMIT PREPARED reported by Michel Dorochevsky,
which was caused by multiply visited entries. However, it seems certain
that mdsync() is vulnerable to missing required fsync's due to missed
entries, and I am fearful that RelationCacheInitializePhase2() might be at
risk as well. Because of that and the generalized hazard presented by this
bug, back-patch all the supported branches.
Along the way, fix pg_prepared_statement() and pg_cursor() to not assume
that the hashtables they are examining will stay static between calls.
This is risky regardless of the newly noted dynahash problem, because
hash_seq_search() has never promised to cope with deletion of table entries
other than the just-returned one. There may be no bug here because the only
supported way to call these functions is via ExecMakeTableFunctionResult()
which will cycle them to completion before doing anything very interesting,
but it seems best to get rid of the assumption. This affects 8.2 and HEAD
only, since those functions weren't there earlier.
we can complete "TABLE". The previous coding only looked for "CREATE TEMP".
Note that I didn't add TEMPORARY to the list of suggested completions
after we've seen "CREATE", since TEMP is equivalent and more concise. But
if the user has already manually typed TEMPORARY, we may as well
complete TABLE for them.
RESET SESSION, RESET PLANS, and RESET TEMP are now DISCARD ALL,
DISCARD PLANS, and DISCARD TEMP, respectively. This is to avoid
confusion with the pre-existing RESET variants: the DISCARD
commands are not actually similar to RESET. Patch from Marko
Kreen, with some minor editorialization.
(it's so nice to have a buildfarm member that actively rejects naked
uses of strcasecmp). This coding is still pretty awful, though, since
it's going to be O(N^2) in the number of guc variables. May I direct
your attention to bsearch?
are mostly excluded by constraints: do the CE test a bit earlier to save
some adjust_appendrel_attrs() work on excluded children, and arrange to
use array indexing rather than rt_fetch() to fetch RTEs in the main body
of the planner. The latter is something I'd wanted to do for awhile anyway,
but seeing list_nth_cell() as 35% of the runtime gets one's attention.