"ctid IN (list)" will still work after we convert IN to ScalarArrayOpExpr.
Make some minor efficiency improvements while at it, such as ensuring that
multiple TIDs are fetched in physical heap order. And fix EXPLAIN so that
it shows what's really going on for a TID scan.
when we first read the page, rather than checking them one at a time.
This allows us to take and release the buffer content lock just once
per page, instead of once per tuple. Since it's a shared lock the
contention penalty for holding the lock longer shouldn't be too bad.
We can safely do this only when using an MVCC snapshot; else the
assumption that visibility won't change over time is uncool. Therefore
there are now two code paths depending on the snapshot type. I also
made the same change in nodeBitmapHeapscan.c, where it can be done always
because we only support MVCC snapshots for bitmap scans anyway.
Also make some incidental cleanups in the APIs of these functions.
Per a suggestion from Qingqing Zhou.
qualification when the underlying operator is indexable and useOr is true.
That is, indexkey op ANY (ARRAY[...]) is effectively translated into an
OR combination of one indexscan for each array element. This only works
for bitmap index scans, of course, since regular indexscans no longer
support OR'ing of scans. There are still some loose ends to clean up
before changing 'x IN (list)' to translate as a ScalarArrayOpExpr;
for instance predtest.c ought to be taught about it. But this gets the
basic functionality in place.
a TupleTableSlot: instead of calling ExecClearTuple, inline the needed
operations, so that we can avoid redundant steps. In particular, when
the old and new tuples are both on the same disk page, avoid releasing
and re-acquiring the buffer pin --- this saves work in both the bufmgr
and ResourceOwner modules. To make this improvement actually useful,
partially revert a change I made on 2004-04-21 that caused SeqNext
et al to call ExecClearTuple before ExecStoreTuple. The motivation
for that, to avoid grabbing the BufMgrLock separately for releasing
the old buffer and grabbing the new one, no longer applies. My
profiling says that this saves about 5% of the CPU time for an
all-in-memory seqscan.
generate their output tuple descriptors from their target lists (ie, using
ExecAssignResultTypeFromTL()). We long ago fixed things so that all node
types have minimally valid tlists, so there's no longer any good reason to
have two different ways of doing it. This change is needed to fix bug
reported by Hayden James: the fix of 2005-11-03 to emit the correct column
names after optimizing away a SubqueryScan node didn't work if the new
top-level plan node used ExecAssignResultTypeFromOuterPlan to generate its
tupdesc, since the next plan node down won't have the correct column labels.
a SubLink expression into a rule query. Pre-8.1 we essentially did this
unconditionally; 8.1 tries to do it only when needed, but was missing a
couple of cases. Per report from Kyle Bateman. Add some regression test
cases covering this area.
comment line where output as too long, and update typedefs for /lib
directory. Also fix case where identifiers were used as variable names
in the backend, but as typedefs in ecpg (favor the backend for
indenting).
Backpatch to 8.1.X.
process of dropping roles by dropping objects owned by them and privileges
granted to them, or giving the owned objects to someone else, through the
use of the data stored in the new pg_shdepend catalog.
Some refactoring of the GRANT/REVOKE code was needed, as well as ALTER OWNER
code. Further cleanup of code duplication in the GRANT code seems necessary.
Implemented by me after an idea from Tom Lane, who also provided various kind
of implementation advice.
Regression tests pass. Some tests for the new functionality are also added,
as well as rudimentary documentation.
tuple in-place, but instead passes back an all-new tuple structure if
any changes are needed. This is a much cleaner and more robust solution
for the bug discovered by Alexey Beschiokov; accordingly, revert the
quick hack I installed yesterday.
With this change, HeapTupleData.t_datamcxt is no longer needed; will
remove it in a separate commit in HEAD only.
doing heap_insert or heap_update, wipe out any extracted fields in
the TupleTableSlot containing the tuple, because they might not be valid
anymore if tuptoaster.c changed the tuple. Safe because slot must be
in the materialized state, but mighty ugly --- find a better answer!
the array (for array_push) or higher-dimensional array (for array_cat)
rather than decrementing it as before. This avoids generating lower
bounds other than one for any array operation within the SQL spec. Per
recent discussion.
Interestingly, this seems to have been the original behavior, because
while updating the docs I noticed that a large fraction of relevant
examples were *wrong* for the old behavior and are now right. Is it
worth correcting this in the back-branch docs?
recursed twice on its first argument, leading to exponential time spent
on a deep nest of COALESCEs ... such as a deeply nested FULL JOIN would
produce. Per report from Matt Carter.
functionality, but I still need to make another pass looking at places
that incidentally use arrays (such as ACL manipulation) to make sure they
are null-safe. Contrib needs work too.
I have not changed the behaviors that are still under discussion about
array comparison and what to do with lower bounds.
that was added to localbuf.c in 8.1; therefore, applying it to a temp table
left corrupt lookup state in memory. The only case where this had a
significant chance of causing problems was an ON COMMIT DELETE ROWS temp
table; the other possible paths left bogus state that was unlikely to
be used again. Per report from Csaba Nagy.
names from being added to pgindent's typedef list. The existance of
them caused weird formatting in the date/type files, and in keywords.c.
Backpatch to 8.1.X.
columns, shifting comment to the right when more than 150 'else if'
clauses were used, and update typedefs for 8.1.X.
NetBSD patched updated, with documentation.
sense and rename to "outerjoin_delayed" to more clearly reflect what it
means). I had decided that it was redundant in 8.1, but the folly of this
is exposed by a bug report from Sebastian Böck. The place where it's
needed is to prevent orindxpath.c from cherry-picking arms of an outer-join
OR clause to form a relation restriction that isn't actually legal to push
down to the relation scan level. There may be some legal cases that this
forbids optimizing, but we'd need much closer analysis to determine it.
slot of the topmost plan node when a trigger returns a modified tuple.
These appear to be the only places where a plan node's caller did not
treat the result slot as read-only, which is an assumption that nodeUnique
makes as of 8.1. Fixes trigger-vs-DISTINCT bug reported by Frank van Vugt.
surprising results when it's some other numeric type. This doesn't solve
the generic problem of surprising implicit casts to text, but it's a
low-impact way of making sure this particular case behaves sanely.
Per gripe from Harald Fuchs and subsequent discussion.
anything but transaction-exiting commands (ROLLBACK etc). We already rejected
Parse and Execute in such cases, so there seems little point in allowing Bind.
This prevents at least an Assert failure, and probably worse things, since
there's a lot of infrastructure that doesn't work when not in a live
transaction. We can also simplify the Bind logic a bit by rejecting messages
with a nonzero number of parameters, instead of the former kluge to silently
substitute NULL for each parameter. Per bug #2033 from Joel Stevenson.
on every index page they read; in particular to catch the case of an
all-zero page, which PageHeaderIsValid allows to pass. It turns out
hash already had this idea, but it was just Assert()ing things rather
than doing a straight error check, and the Asserts were partially
redundant with PageHeaderIsValid anyway. Per recent failure example
from Jim Nasby. (gist still needs the same treatment.)
to assume that the string pointer passed to set_ps_display is good forever.
There's no need to anyway since ps_status.c itself saves the string, and
we already had an API (get_ps_display) to return it.
I believe this explains Jim Nasby's report of intermittent crashes in
elog.c when %i format code is in use in log_line_prefix.
While at it, repair a previously unnoticed problem: on some platforms such as
Darwin, the string returned by get_ps_display was blank-padded to the maximum
length, meaning that lock.c's attempt to append " waiting" to it never worked.
create circularity of role memberships. This is a minimum-impact fix
for the problem reported by Florian Pflug. I thought about removing
the superuser_arg test from is_member_of_role() altogether, as it seems
redundant for many of the callers --- but not all, and it's way too late
in the 8.1 cycle to be making large changes. Perhaps reconsider this
later.
from a finished plan tree. We have to copy the output column names
(resname fields) from the SubqueryScan down to its child plan node;
else, if this is the topmost level of the plan, the wrong column names
will be delivered to the client. Per bug #2017 reported by Jolly Chen.
very narrow window in which SimpleLruReadPage or SimpleLruWritePage could
think that I/O was needed when it wasn't (and indeed the buffer had already
been assigned to another page). This would result in an Assert failure if
Asserts were enabled, and probably in silent data corruption if not.
Reported independently by Jim Nasby and Robert Creager.
I intend a more extensive fix when 8.2 development starts, but this is a
reasonably low-impact patch for the existing branches.
setting for the regression makefile, allowing Windows users to force locale
settings since Windows does not get its locale from the environment.
Per Petr Jelinek.
of postgres.imp file into BE_DLLLIBS macro. This makes the AIX build
work more like the Windows and Darwin builds, which have similar requirements
to mention a backend library when linking shared libraries that will be
dynamically loaded into the backend.
multixact's starting offset before the offset has been stored into the
SLRU file. A simple fix would be to hold the MultiXactGenLock until the
offset has been stored, but that looks like a big concurrency hit. Instead
rely on knowledge that unset offsets will be zero, and loop when we see
a zero. This requires a little extra hacking to ensure that zero is never
a valid value for the offset. Problem reported by Matteo Beccati, fix
ideas from Martijn van Oosterhout, Alvaro Herrera, and Tom Lane.
advance its usage_count. This includes writes of dirty buffers triggered
by bgwriter, checkpoint, or FlushRelationBuffers, as well as various
corner cases that really ought not count as accesses to the page.
Should make for some marginal improvement in the quality of our decisions
about when to recycle buffers. Per suggestion from ITAGAKI Takahiro.
inFromCl true, meaning that they will list out as explicit RTEs if they
are in a view or rule. Update comments about inFromCl to reflect the way
it's now actually used. Per recent discussion.
for an outer join; symptom is bogus error "RIGHT JOIN is only supported with
merge-joinable join conditions". Problem was that select_mergejoin_clauses
did its tests in the wrong order. We need to force left join not right join
for a merge join when there are non-mergeable join clauses; but the test for
this only accounted for mergejoinability of the clause operator, and not
whether the left and right Vars were of the proper relations. Per report
from Jean-Pierre Pelletier.
some small stylistic improvements in these functions. Also fix several
places where TMODULO() was being used with wrong-sized quotient argument,
creating a risk of overflow --- interval2tm was actually capable of going
into an infinite loop because of this.
to the main thread. This allows removal of WaitForSingleObjectEx() calls
from the main thread, thereby allowing us to re-enable Qingqing Zhou's
CHECK_FOR_INTERRUPTS performance improvement. Qingqing, Magnus, et al.
PQregisterThreadLock().
I also remove the crypt() mention in the libpq threading section and
added a single sentence in the client-auth manual page under crypt().
Crypt authentication is so old now that a separate paragraph about it
seemed unwise.
I also added a comment about our use of locking around pqGetpwuid().
WaitForSingleObjectEx is always called by CHECK_FOR_INTERRUPTS. This
should be reinstated but the setitimer() emulation will have to be
redesigned first.
a kernel call unless there's some evidence of a pending signal. This should
bring its performance on Windows into line with the Unix version. Problem
diagnosis and patch by Qingqing Zhou. Minor stylistic tweaks by moi ...
if it's broken, it's my fault.
properly advancing the CommandCounter between multiple sub-queries
generated by rules, we forgot to update the snapshot being used, so
that the successive sub-queries didn't actually see each others'
results. This is still not *exactly* like the semantics of normal
execution of the same queries, in that we don't take new transaction
snapshots and hence don't see changes from concurrently committed
commands, but I think that's OK and probably even preferable for
EXPLAIN ANALYZE.
a parameter in binary format. Also, add a TIP explaining how to use casts
in the query text to avoid needing to specify parameter types by OID.
Also fix bogus spacing --- apparently somebody expanded the tabs in the
example programs to 8 spaces instead of 4 when transposing them into SGML.
since it can take a fair amount of time and this can confuse boot scripts
that expect postmaster.pid to appear quickly. Move initialization of SSL
library and preloaded libraries to after that point, too, just for luck.
Per reports from Tony Caduto and others.
module. Don't rely on backend palloc semantics; in fact, best to not
use palloc at all, rather than #define'ing it to malloc, because that
just encourages errors of omission. Bug spotted by Volkan YAZICI,
but I went further than he did to fix it.
generated from subquery outputs: use the type info stored in the Var
itself. To avoid making ExecEvalVar and slot_getattr more complex
and slower, I split out the whole-row case into a separate ExecEval routine.
type ID information even when it's a record type. This is needed to
handle whole-row Vars referencing subquery outputs. Per example from
Richard Huxton.
optimization for subquery and function scan nodes: we can't just do it
unconditionally, we still have to check whether there is any need for
a whole-row Var. I had been thinking that these node types couldn't
have any system columns, which is true, but that loop is also checking
for attno zero, ie, whole-row Var. Fix comment to not be so misleading.
Per test case from Richard Huxton.
fix problems with replacement-string backslashes that aren't followed by
one of the expected characters, avoid giving the impression that
replace_text_regexp() is meant to be called directly as a SQL function,
etc.
the facility has been set, the facility gets set to LOCAL0 and cannot
be changed later. This seems reasonably plausible to happen, particularly
at higher debug log levels, though I am not certain it explains Han Holl's
recent report. Easiest fix is to teach the code how to change the value
on-the-fly, which is nicer anyway. I made the settings PGC_SIGHUP to
conform with log_destination.
regression=# select '23:59:59.9'::time(0);
time
----------
24:00:00
(1 row)
This is bad because:
regression=# select '24:00:00'::time(0);
ERROR: date/time field value out of range: "24:00:00"
The last example now works.
of client_min_messages (fatal + panic) are valid and also fixes a slight
issue with how psql tried to display error messages that aren't sent to
the client.
We often tell people to ignore errors in response to requests for things
like "drop if exists", but there's no good way to completely hide this
without upping client_min_messages past ERROR. When running a file like
SET client_min_messages TO 'FATAL';
DROP TABLE doesntexist;
with "psql -f filename" you get an error prefix of
"psql:/home/username/filename:3" even though there is no error message to
prefix because it isn't sent to the client.
Kris Jurka
This makes the error messages for PREPARE TRANSACTION, COMMIT PREPARED
etc. match the docs, which talk about "transaction identifier" not
"gid" or "global transaction identifier".
Steve Woodcock
make_restrictinfo_from_bitmapqual. The likelihood of finding duplicates
seems much less than in the AND-subclause case, and the cost much higher,
because OR lists with hundreds or even thousands of subclauses are not
uncommon. Per discussion with Ilia Kantor and andrew@supernews.
the wrong buffer dirty when trying to kill a dead index entry that's on
a page after the one it started on. No risk of data corruption, just
inefficiency, but still a bug.
pointers, to ensure that compilers won't rearrange accesses to occur
while we're not holding the buffer header spinlock. It's probably
not necessary to mark volatile in every single place in bufmgr.c,
but better safe than sorry. Per trouble report from Kevin Grittner.
whether we seem to be running in a uniprocessor or multiprocessor.
The adjustment rules could probably still use further tweaking, but
I'm convinced this should be a win overall.
valid type information if they are asked to fetch the values part of a
pg_statistic slot; these arguments are unneeded if fetching only the
numbers part. Use this to save a catcache lookup in btcostestimate,
which is looking like a bit of a hotspot in recent profiling. Not a
big savings, but since it's essentially free, might as well do it.
A RestrictInfo representing an OR clause now contains two versions of
the contained expression, one with sub-RestrictInfos and one without.
clause_selectivity() should descend to the version with sub-RestrictInfos
so that it has a chance of caching its results for the OR's sub-clauses.
Failing to do so resulted in redundant planner effort.
emit when given the --clean option, in favor of individual DROP ROLE
commands. The old technique could not possibly work in 8.1, and was
never a very good idea anyway IMHO. The DROP ROLE approach has the
defect that the DROPs will fail for roles that own objects or have
privileges, but perhaps we can improve that later.
ie removing shared-dependency entries, should happen before non-rollbackable
ones. That way a failure during the rollbackable part doesn't leave us
with inconsistent state.
traceable to grant options. As per my earlier proposal, a GRANT made by
a role member has to be recorded as being granted by the role that actually
holds the grant option, and not the member.
like '23:59:60' because of fractional-second roundoff problems. Trying
to control this upstream of the actual display code was hopeless; the right
way is to explicitly round fractional seconds in the display code and then
refigure the results if the fraction rounds up to 1. Per bug #1927.
to call krb5_sname_to_principal() always. Also, use krb_srvname rather
than the hardwired string 'postgres' as the appl_version string in the
krb5_sendauth/recvauth calls, to avoid breaking compatibility with PG
8.0. Magnus Hagander
testing ownership if the caller isn't interested in any GOPTION bits
(which is the common case). It did not matter in 8.0 where the ownership
test was just a trivial equality test, but it matters now.
level for unrecognized win32 error codes to LOG, and make messages
conform to style guide. Per old suggestion from Qingqing Zhou, which
seems to have gotten lost in the shuffle.
cache lookup in the success case. This won't help much for cases where
the given relation is far down the search path, but it does not hurt in
any cases either; and it requires only a little new code. Per gripe from
Jim Nasby about slowness of \d with many tables.
the parameter's name (if any) as the default column name for SELECT FROM
the function, rather than the function name as previously. I still think
this is a bad idea, but I lost the argument. Force decompilation of
function RTEs to specify full aliases always, to reduce the odds of this
decision breaking dumped views.
predicate_implied_by() to detect redundant filter conditions, but forgot
that predicate_implied_by() assumes its first argument contains only
immutable functions. Add a check to guarantee that. Also, test to see
if filter conditions can be discarded because they are redundant with
the predicate of a partial index.
generated by bitmap index scans. Along the way, simplify and speed up
the code for counting sequential and index scans; it was both confusing
and inefficient to be taking care of that in the per-tuple loops, IMHO.
initdb forced because of internal changes in pg_stat view definitions.
was created on a machine with alignment rules and floating-point format
similar to the current machine. Per recent discussion, this seems like
a good idea with the increasing prevalence of 32/64 bit environments.
argument as a 'regclass' value instead of a text string. The frontend
conversion of text string to pg_class OID is now encapsulated as an
implicitly-invocable coercion from text to regclass. This provides
backwards compatibility to the old behavior when the sequence argument
is explicitly typed as 'text'. When the argument is just an unadorned
literal string, it will be taken as 'regclass', which means that the
stored representation will be an OID. This solves longstanding problems
with renaming sequences that are referenced in default expressions, as
well as new-in-8.1 problems with renaming such sequences' schemas or
moving them to another schema. All per recent discussion.
Along the way, fix some rather serious problems in dbmirror's support
for mirroring sequence operations (int4 vs int8 confusion for instance).
the ProcessUtility case, resulting in an intratransaction memory leak
if a utility command actually did return any tuples, as reported by
Dmitry Karasik. Fix this and also make the behavior more consistent
for cases involving nested SPI operations and multiple query trees,
by ensuring that we store the state locally until it is ready to be
returned to the caller.
only the inner-side relation would be considered as potential equijoin clauses,
which is wrong because the condition doesn't necessarily hold above the point
of the outer join. Per test case from Kevin Grittner (bug#1916).
relocated after installation. We can't trust the installation paths
inserted into Makefile.global by configure, so instead we must get the
paths from pg_config. This requires extending pg_config to support all
the separately-configurable path names, but that was on TODO anyway.
outer relation is empty did not work, per test case from Patrick Welche.
It tried to use nodeHashjoin.c's high-level mechanisms for fetching an
outer-relation tuple, but that code expected the hash table to be filled
already. As patched, the code failed in corner cases such as having no
outer-relation tuples for the first hash batch. Revert and rewrite.
for int8 and related types. However we might be talking to a client
that has working int64; so pq_getmsgint64 really needs to check the
incoming value and throw an overflow error if we can't represent it
accurately.
"optimization". When we find a potentially useful joinclause, we
have to add all its other required_relids to the result, not only the
other clause_relids. They are different in the case of a joinclause
whose applicability has to be postponed due to outer join. We have
to include the extra rels because otherwise, after best_inner_indexscan
masks the join rels with index_outer_relids, it will always fail to
find the joinclause as applicable. Per report from Husam Tomeh.
in which invalid page data could be transiently written to disk by
concurrent bgwriter activity. There doesn't seem any risk of loss of
actual user data, but an empty page could possibly be left corrupt if a
crash occurs before the correct data gets written out. Pointed out by
Alvaro Herrera.
strings. This is consistent with SQL conventions, and since Bruce
already changed initdb in a way that assumed it worked like this, seems
we'd better make it work like this.
sake of brevity and clarity.
Make pg_reload_conf(), pg_rotate_logfile(), and pg_cancel_backend()
return a boolean rather than an integer to indicate success or failure.
Along the way, make some minor cleanups to dbsize.c -- in particular,
use elog() rather than ereport() for "shouldn't happen" error
conditions, and remove some of the more flagrant violations of the
Postgres indentation conventions.
Catalog version bumped.
bytes. This shouldn't make any difference on x86 machines, where the size
happened to be 16 bytes anyway, but on 64-bit machines and machines with
slock_t int or wider, it will speed array indexing and hopefully reduce
SMP cache contention effects. Per recent experimentation.
recovered. I did not see any actual leak while testing this in CVS tip,
but 8.0 definitely has a problem with leaking the space temporarily
palloc'd by BufferSync(). In any case this seems a good idea to forestall
similar problems in future. Per report from Arjen van der Meijden.
to drop connections unceremoniously. Also some other marginal cleanups:
don't query getsockopt() repeatedly if it fails, and avoid having the
apparent definition of struct Port depend on which system headers you
might have included or not. Oliver Jowett and Tom Lane.
> found in a pg_dump archive. It had problems with dollar-quote tags
broken across bufferload boundaries (this may explain bug report from
Rod Taylor), also with dollar-quote literals of the form $a$a$...,
and was also confused about the rules for backslash in double quoted
identifiers (ie, they're not special). Also put in placeholder support
for E'...' literals --- this will need more work later.
really the source or destination of the archive. I think this will
resolve recent complaints that password prompting is broken in pg_restore
on Windows. Note that password prompting and reading from stdin is an
unworkable combination on Windows ... but that was true anyway.
is certainly no longer immutable, but must indeed be marked volatile.
I wonder if it should use the value of now() (that is, transaction
start time) so that it could be marked stable. But it's probably not
important enough to be worth changing the code for ... indeed, I'm not
even going to force an initdb for this catalog change, seeing that we
just did one a few hours ago.
in the zic database or zone names found in the date token table. This
preserves the old ability to do AT TIME ZONE 'PST' along with the new
ability to do AT TIME ZONE 'PST8PDT'. Per gripe from Bricklen Anderson.
Also, fix some inconsistencies in usage of TZ_STRLEN_MAX --- the old
code had the potential for one-byte buffer overruns, though given
alignment considerations it's unlikely there was any real risk.
for procedural languages. This replaces the hard-wired table I had
originally proposed as a stopgap solution. For the moment, the initial
contents only include languages shipped with the core distribution.
as per my recent proposal. For now the template data is hard-wired in
proclang.c --- this should be replaced later by a new shared system
catalog, but we don't want to force initdb during 8.1 beta. This change
lets us cleanly load existing dump files even if they contain outright
wrong information about a PL's support functions, such as a wrong path
to the shared library or a missing validator function. Also, we can
revert the recent kluges to make pg_dump dump PL support functions that
are stored in pg_catalog.
While at it, I removed the code in pg_regress that replaced $libdir
with a hardcoded path for temporary installations. This is no longer
needed given our support for relocatable installations.
when there are extra resjunk columns in the child node. I found some
additional cases involving Append nodes that weren't handled by the
prior patch, and it's not clear how to fix them in the same way without
breaking inheritance cases. So the prudent path seems to be to narrow
the scope of the optimization.
has to recopy the input plan node's targetlist if it removes a
SubqueryScan node just below the non-projecting node. For simplicity
I made it recopy always. Per bug report from Allan Wang and Michael Fuhr.
on a page, as suggested by ITAGAKI Takahiro. Also, change a few places
that were using some other estimates of max-items-per-page to consistently
use MaxOffsetNumber. This is conservatively large --- we could have used
the new MaxHeapTuplesPerPage macro, or a similar one for index tuples ---
but those places are simply declaring a fixed-size buffer and assuming it
will work, rather than actively testing for overrun. It seems safer to
size these buffers in a way that can't overflow even if the page is
corrupt.
saves nearly 700kB in the default shared memory segment size, which seems
worthwhile, and it is a feature that many users won't use anyway. Per
Heikki's argument, there is no point in a compromise value --- those who
are using 2PC at all will probably want it at least equal to max_connections.
But we can't set it to zero by default without breaking the prepared_xacts
regression test.
so that the latter estimates the number of groups that grouping will
produce. This is needed because it is primarily query_planner that
makes the decision between fast-start and fast-finish plans, and in the
original coding it was unable to make more than a crude rule-of-thumb
choice when the query involved grouping. This revision helps us make
saner choices for queries like SELECT ... GROUP BY ... LIMIT, as in a
recent example from Mark Kirkwood. Also move the responsibility for
canonicalizing sort_pathkeys and group_pathkeys into query_planner;
this information has to be available anyway to support the first change,
and doing it this way lets us get rid of compare_noncanonical_pathkeys
entirely.
to copy the whole plan tree before invoking adjust_plan_varnos(); else
if there is any multiply-linked substructure, the latter might increment
some Var's varno twice. Previously there were some retail copyObject
calls inside adjust_plan_varnos, but it seems a lot safer to just dup the
whole tree first. Also, set_inner_join_references was trying to avoid
work by not recursing if a BitmapHeapScan's bitmapqualorig contained no
outer references; which was OK at the time the code was written, I think,
but now that create_bitmap_scan_plan removes duplicate clauses from
bitmapqualorig it is possible for that field to be NULL while outer
references still remain in the qpqual and/or contained indexscan nodes.
For safety, always recurse even if the BitmapHeapScan looks to be outer
reference free. Per reports from Michael Fuhr and Oleg Bartunov.
the parent table, even if the command that creates them is executed by
someone else (such as a superuser or a member of the owning role).
Per gripe from Michael Fuhr.
in interval_mul and interval_div. This avoids an optimization bug
in A Certain Company's compiler (and given their explanation, I wouldn't
be surprised if other compilers blow it too). Besides the code seems
more clear this way --- in the original formulation, you had to mentally
recognize the common subexpression in order to understand what was going
on.
IPv6 is obsoleted by recent Windows patch. Perform the runtime test
whenever HAVE_IPV6 is set. This should be OK since initdb can get
getaddrinfo from libpgport if needed.
use these instead of its previous hack of changing pg_class.reltriggers.
Documentation is lacking, will add that later.
Patch by Satoshi Nagayasu, review and some extra work by Tom Lane.
Windows. The test itself is bypassed in configure as discussed, and
libpq has been updated appropriately to allow it to build in thread-safe
mode.
Dave Page
after the fact. Fix bug with incorrect test for whether we are at end
of logfile segment. Arrange for writes triggered by XLogInsert's
is-cache-more-than-half-full test to synchronize with the cache boundaries,
so that in long transactions we tend to write alternating halves of the
cache rather than randomly chosen portions of it; this saves one more
write syscall per cache load.
erroring out as it has done for the last couple weeks. Document that this
form is now ignored because indexes can't usefully have different owners
from their parent tables. Fix pg_dump to not generate ALTER OWNER commands
for indexes.
discussion of getting around this by relaxing the checks made for regular
users, but I'm disinclined to toy with the security model right now,
so just special-case it for superusers where needed.
not accepting queries).
errmsg("database is not accepting queries to avoid
wraparound data loss in database \"%s\"",
errhint("Stop the postmaster and use a standalone
backend to VACUUM database \"%s\".",
in postgresql.conf.sample, mark custom_variable_classes as SIGHUP not
POSTMASTER to agree with the documentation (I can't see a reason it has
to be POSTMASTER so I think the docs are right).
to 'Size' (that is, size_t), and install overflow detection checks in it.
This allows us to remove the former arbitrary restrictions on NBuffers
etc. It won't make any difference in a 32-bit machine, but in a 64-bit
machine you could theoretically have terabytes of shared buffers.
(How efficiently we could manage 'em remains to be seen.) Similarly,
num_temp_buffers, work_mem, and maintenance_work_mem can be set above
2Gb on a 64-bit machine. Original patch from Koichi Suzuki, additional
work by moi.
insufficient paranoia in code that follows t_ctid links. (We must do both
because even with VACUUM doing it properly, the intermediate state with
a dangling t_ctid link is visible concurrently during lazy VACUUM, and
could be seen afterwards if either type of VACUUM crashes partway through.)
Also try to improve documentation about what's going on. Patch is a bit
bulky because passing the XMAX information around required changing the
APIs of some low-level heapam.c routines, but it's not conceptually very
complicated. Per trouble report from Teodor and subsequent analysis.
This needs to be back-patched, but I'll do that after 8.1 beta is out.
or OFFSET clauses by using estimate_expression_value(). The main advantage
of this is that if the expression is a Param and we have a value for the
Param, we'll use that value rather than defaulting. Also, fix some
thinkos in the logic for combining LIMIT/OFFSET with an externally
supplied tuple fraction (this covers cases like EXISTS(...LIMIT...)).
And make sure the results of all this are shown by EXPLAIN. Per a
gripe from Merlin Moncure.
Fix to_char(interval) to return large year/month/day/hour values that
are larger than possible timestamp values.
Prevent to_char(interval) format specifications that make no sense, like
Month.
Clean up formatting.c code to more logically handle return lengths.
their OID parameter. It was possible to crash the backend with
select array_in('{123}',0,0); because that would bypass the needed step
of initializing the workspace. These seem to be the only two places
with a problem, though (record_in and record_recv don't have the issue,
and the other array functions aren't depending on user-supplied input).
Back-patch as far as 7.4; 7.3 does not have the bug.
(the stats system has always collected this info, but the views were
filtering it out). Modify autovacuum so that over-threshold activity
in a toast table can trigger a VACUUM of the parent table, even if the
parent didn't appear to need vacuuming itself. Per discussion a month
or so back about "short, wide tables".
other stuff; change \du and \dg to be role-aware (Stefan Kaltenbrunner).
Also make tab completion fetch the list of GUC variables from pg_settings
instead of having a hard-wired copy of the list (Tom Lane).
SearchCatCacheList and ReleaseCatCacheList. Previously, we incremented
and decremented the refcounts of list member tuples along with the list
itself, but that's unnecessary, and very expensive when the list is big.
It's cheaper to change only the list refcount. When we are considering
deleting a cache entry, we have to check not only its own refcount but
its parent list's ... but it's easy to arrange the code so that this
check is not made in any commonly-used paths, so the cost is really nil.
The bigger gain though is to refrain from DLMoveToFront'ing each individual
member tuple each time the list is referenced. To keep some semblance
of fair space management, lists are just marked as used or not since the
last cache cleanout search, and we do a MoveToFront pass only when about
to run a cleanout. In combination, these changes reduce the costs of
SearchCatCacheList and ReleaseCatCacheList from about 4.5% of pgbench
runtime to under 1%, according to my gprof results.
remember the output parameter set for himself. It's a bit of a kluge
but fixing array_in to work in bootstrap mode looks worse.
I removed the separate pg_file_length() function, as it no longer has any
real notational advantage --- you can write (pg_stat_file(...)).length.
stderr-in-service or output-from-syslogger-in-service code. Previously
everything was flagged as ERRORs there, which caused all instances to
log "LOG: logger shutting down" as error...
Please apply for 8.1. I'd also like it considered for 8.0 since logging
non-errors as errors can be cause for alarm amongst people who actually
look at their logs...
Magnus Hagander
> >> 3) I restarted the postmaster both times. I got this error
> both times.
> >> :25: ERROR: could not load library "C:/Program
> >> Files/PostgreSQL/8.0/lib/testtrigfuncs.dll": dynamic load error
>
> > Yes. We really need to look at fixing that error message. I had
> > forgotten it completely :-(
>
> > Bruce, you think we can sneak that in after feature freeze? I would
> > call it a bugfix :-)
>
> Me too. That's been on the radar for awhile --- please do
> send in a patch.
Here we go, that wasn't too hard :-)
Apart from adding the error handling, it does one more thing: it changes
the errormode when loading the DLLs. Previously if a DLL was broken, or
referenced other DLLs that couldn't be found, a popup dialog box would
appear on the screen. Which had to be clicked before the backend could
continue. This patch also disables the popup error message for DLL
loads.
I think this is something we should consider doing for the entire
backend - disable those popups, and say we deal with it ourselves. What
do you other win32 hackers thinnk about this?
In the meantime, this patch fixes the error msgs. Please apply for 8.1
and please consider a backpatch to 8.0.
Magnus Hagander
> > I ran across this yesterday on HEAD:
>
> > template1=# grant select on foo, foo to swm;
> > ERROR: tuple already updated by self
>
> Seems to fail similarly in every version back to 7.2; probably further,
> but that's all I have running at the moment.
>
> > We could do away with the error by producing a unique list of object names
> > -- but that would impose an extra cost on the common case.
>
> CommandCounterIncrement in the GRANT loop would be easier, likely.
> I'm having a hard time getting excited about it though...
Yeah, its not that exciting but that error message would throw your
average user.
I've attached a patch which calls CommandCounterIncrement() in each of the
grant loops.
Gavin Sherry
should surely be timestamptz not timestamp; fix some but not all of the
holes in check_and_make_absolute(); other minor cleanup. Also put in
the missed catversion bump.
computation. On modern machines this is as fast if not faster, and we
don't have to clog the CPU's L2 cache with a tens-of-KB pointer array.
If we ever decide to adopt a more dynamic allocation method for shared
buffers, we'll probably have to revert this patch, but in the meantime
we might as well save a few bytes and nanoseconds. Per Qingqing Zhou.
whenever we generate a new OID. This prevents occasional duplicate-OID
errors that can otherwise occur once the OID counter has wrapped around.
Duplicate relfilenode values are also checked for when creating new
physical files. Per my recent proposal.
delay and limit, both as global GUCs and as table-specific entries in
pg_autovacuum. stats_reset_on_server_start is now OFF by default,
but a reset is forced if we did WAL replay. XID-wrap vacuums do not
ANALYZE, but do FREEZE if it's a template database. Alvaro Herrera
CPPFLAGS, CFLAGS, CFLAGS_SL, LDFLAGS, LDFLAGS_SL, and LIBS. Change it
so that invoking pg_config with no arguments reports all available
information, rather than just giving an error message. Per discussion.
against the PGPROC array. Anything in the file that isn't in PGPROC
gets rejected as being a stale entry. This should solve complaints about
stale entries in pg_stat_activity after a BETERM message has been dropped
due to overload.
ResourceOwner mechanism already released all reference counts for the
cache entries; therefore, we do not need to scan the catcache or relcache
at transaction end, unless we want to do it as a debugging crosscheck.
Do the crosscheck only in Assert mode. This is the same logic we had
previously installed in AtEOXact_Buffers to avoid overhead with large
numbers of shared buffers. I thought it'd be a good idea to do it here
too, in view of Kari Lavikka's recent report showing a real-world case
where AtEOXact_CatCache is taking a significant fraction of runtime.
exit, instead of trying to take shortcuts. Introduce some additional
shutdown callback routines to eliminate kluges like having ProcKill
be responsible for shutting down the buffer manager. Ensure that the
order of operations during shutdown is predictable and what you would
expect given the module layering.
max_files_per_process. Going further than that is just a waste of
cycles, and it seems that current Cygwin does not cope gracefully
with deliberately running the system out of FDs. Per Andrew Dunstan.
character, tighten the inner loops of CopyReadLine and CopyReadAttribute,
arrange to parse out all the attributes of a line in just one call instead
of one CopyReadAttribute call per attribute, be smarter about which client
encodings require slow pg_encoding_mblen() loops. Also, clean up the
mishmash of static variables and overly-long parameter lists in favor of
passing around a single CopyState struct containing all the state data.
Original patch by Alon Goldshuv, reworked by Tom Lane.
This was not especially critical before, but it is now that we track
ownership dependencies --- the dependency for the rowtype *must* shift
to the new owner. Spotted by Bernd Helmle.
Also fix a problem introduced by recent change to allow non-superusers
to do ALTER OWNER in some cases: if the table had a toast table, ALTER
OWNER failed *even for superusers*, because the test being applied would
conclude that the new would-be owner had no create rights on pg_toast.
A side-effect of the fix is to disallow changing the ownership of indexes
or toast tables separately from their parent table, which seems a good
idea on the whole.
doesn't block the bgwriter from making progress writing out other buffers.
This was a hard problem in the context of the ARC/2Q design, but it's
trivial in the context of clock sweep ... just advance the sweep counter
before we try to write not after.
of special case for Windows port. Put a PG_TRY around most of createdb()
to ensure that we remove copied subdirectories on failure, even if the
failure happens while creating the pg_database row. (I think this explains
Oliver Siegmar's recent report.) Having done that, there's no need for
the fragile assumption that copydir() mustn't ereport(ERROR), so simplify
its API. Eliminate the old code that used system("cp ...") to copy
subdirectories, in favor of using copydir() on all platforms. This not
only should allow much better error reporting, but allows us to fsync
the created files before trusting that the copy has succeeded.
tests for the new interval->day changes. I added tests for
justify_hours() and justify_days() to interval.sql, as they take
interval input and produce interval output. If there's a more
appropriate place for them, please let me know.
Michael Glaesemann
continue to recurse after eliminating a NOT-below-a-NOT, since the
contained subexpression will now be part of the top-level AND/OR structure
and so deserves to be simplified. The real-world impact of this is
probably minimal, since it'd require at least three levels of NOT to make
a difference, but it's still a bug.
Also remove some redundant tests for NULL subexpressions.
track shared relations in a separate hashtable, so that operations done
from different databases are counted correctly. Add proper support for
anti-XID-wraparound vacuuming, even in databases that are never connected
to and so have no stats entries. Miscellaneous other bug fixes.
Alvaro Herrera, some additional fixes by Tom Lane.
Also, write multiple WAL buffers out in one write() operation.
ITAGAKI Takahiro
---------------------------------------------------------------------------
> If we disable writeback-cache and use open_sync, the per-page writing
> behavior in WAL module will show up as bad result. O_DIRECT is similar
> to O_DSYNC (at least on linux), so that the benefit of it will disappear
> behind the slow disk revolution.
>
> In the current source, WAL is written as:
> for (i = 0; i < N; i++) { write(&buffers[i], BLCKSZ); }
> Is this intentional? Can we rewrite it as follows?
> write(&buffers[0], N * BLCKSZ);
>
> In order to achieve it, I wrote a 'gather-write' patch (xlog.gw.diff).
> Aside from this, I'll also send the fixed direct io patch (xlog.dio.diff).
> These two patches are independent, so they can be applied either or both.
>
>
> I tested them on my machine and the results as follows. It shows that
> direct-io and gather-write is the best choice when writeback-cache is off.
> Are these two patches worth trying if they are used together?
>
>
> | writeback | fsync= | fdata | open_ | fsync_ | open_
> patch | cache | false | sync | sync | direct | direct
> ------------+-----------+--------+-------+-------+--------+---------
> direct io | off | 124.2 | 105.7 | 48.3 | 48.3 | 48.2
> direct io | on | 129.1 | 112.3 | 114.1 | 142.9 | 144.5
> gather-write| off | 124.3 | 108.7 | 105.4 | (N/A) | (N/A)
> both | off | 131.5 | 115.5 | 114.4 | 145.4 | 145.2
>
> - 20runs * pgbench -s 100 -c 50 -t 200
> - with tuning (wal_buffers=64, commit_delay=500, checkpoint_segments=8)
> - using 2 ATA disks:
> - hda(reiserfs) includes system and wal.
> - hdc(jfs) includes database files. writeback-cache is always on.
>
> ---
> ITAGAKI Takahiro
An attached patch is a small additional improvement.
This patch use appendStringInfoText instead of appendStringInfoString.
There is an overhead of PG_TEXT_GET_STR when appendStringInfoString is
executed by text type. This can be reduced by appendStringInfoText.
Atsushi Ogawa
planning logic for bitmap indexscans. Partial indexes create corner
cases in which a scan might be done with no explicit index qual conditions,
and the code wasn't handling those cases nicely. Also be a little
tenser about eliminating redundant clauses in the generated plan.
Per report from Dmitry Karasik.
to the "text" segment. It would be possible to mark the elements of the
array "const" as well, but this would require multiple API changes and
does not seem to be worth the notational inconvenience.
into .def and .exp files automatically on Windows, AIX, and the like.
An additional benefit is that changes in libpgport files correctly
propagate to force rebuild of the backend executable. This is my
reworking of Rocco Altier's idea, and if it breaks anything it's
definitely my fault.
The first rule of portability for us is 'thou shalt have no other gods
before c.h', and a whole lot of these files were either not including
c.h at all, or including random system headers beforehand, either of
which sins can mess up largefile support nicely. Once you have
included c.h, there is no need to re-include what it includes, either.
doesn't automatically inherit the privileges of roles it is a member of;
for such a role, membership in another role can be exploited only by doing
explicit SET ROLE. The default inherit setting is TRUE, so by default
the behavior doesn't change, but creating a user with NOINHERIT gives closer
adherence to our current reading of SQL99. Documentation still lacking,
and I think the information schema needs another look.
existing ones for object privileges. Update the information_schema for
roles --- pg_has_role() makes this a whole lot easier, removing the need
for most of the explicit joins with pg_user. The views should be a tad
faster now, too. Stephen Frost and Tom Lane.
pg_strcasecmp and pg_strncasecmp ... but I see some of the former have
crept back in.
Eternal vigilance is the price of locale independence, apparently.
on the not-very-good .so pattern rules in the port-specific Makefiles.
(This leaves only pgxs' MODULES case needing those rules.) Also,
compile pgsleep.c locally and add it to regress.so to avoid failure
on AIX.