#2075: consider an index redundant if any of its index conditions were already
used, rather than if all of them were. Also, make the selectivity comparison
a bit fuzzy, so that very small differences in estimated selectivities don't
skew the results.
it's worth probing the outer relation for emptiness before building the
hash table. To wit, if we're rescanning a join previously performed,
remember whether we found it nonempty the previous time, and don't bother
with the probe if it was nonempty. This buys back the performance lost
in examples like Mario Weilguni's.
one child or the other had a problem: they did not leave the node in a
state that ExecReScanHashJoin would understand. In particular it would
tend to fail to reset the child plans when needed. Per report from
Mario Weilguni.
ScalarArrayOpExpr when possible, that is, whenever there is an array type
for the values of the expression list. This completes the project I've
been working on to improve the speed of index searches with long IN lists,
as per discussion back in mid-October.
I did not force initdb, but until you do one you will see failures in the
"rules" regression test, because some of the standard system views use IN
and their compiled formats have changed.
they were broken-out AND or OR lists. The least grotty way to do this
seemed to be to set up a general mechanism for handling nodes as though
they were ANDs or ORs. There's no other immediate use for it, but perhaps
we might want to use the mechanism someday for things like BETWEEN
SYMMETRIC.
"ctid IN (list)" will still work after we convert IN to ScalarArrayOpExpr.
Make some minor efficiency improvements while at it, such as ensuring that
multiple TIDs are fetched in physical heap order. And fix EXPLAIN so that
it shows what's really going on for a TID scan.
when we first read the page, rather than checking them one at a time.
This allows us to take and release the buffer content lock just once
per page, instead of once per tuple. Since it's a shared lock the
contention penalty for holding the lock longer shouldn't be too bad.
We can safely do this only when using an MVCC snapshot; else the
assumption that visibility won't change over time is uncool. Therefore
there are now two code paths depending on the snapshot type. I also
made the same change in nodeBitmapHeapscan.c, where it can be done always
because we only support MVCC snapshots for bitmap scans anyway.
Also make some incidental cleanups in the APIs of these functions.
Per a suggestion from Qingqing Zhou.
qualification when the underlying operator is indexable and useOr is true.
That is, indexkey op ANY (ARRAY[...]) is effectively translated into an
OR combination of one indexscan for each array element. This only works
for bitmap index scans, of course, since regular indexscans no longer
support OR'ing of scans. There are still some loose ends to clean up
before changing 'x IN (list)' to translate as a ScalarArrayOpExpr;
for instance predtest.c ought to be taught about it. But this gets the
basic functionality in place.
a TupleTableSlot: instead of calling ExecClearTuple, inline the needed
operations, so that we can avoid redundant steps. In particular, when
the old and new tuples are both on the same disk page, avoid releasing
and re-acquiring the buffer pin --- this saves work in both the bufmgr
and ResourceOwner modules. To make this improvement actually useful,
partially revert a change I made on 2004-04-21 that caused SeqNext
et al to call ExecClearTuple before ExecStoreTuple. The motivation
for that, to avoid grabbing the BufMgrLock separately for releasing
the old buffer and grabbing the new one, no longer applies. My
profiling says that this saves about 5% of the CPU time for an
all-in-memory seqscan.
generate their output tuple descriptors from their target lists (ie, using
ExecAssignResultTypeFromTL()). We long ago fixed things so that all node
types have minimally valid tlists, so there's no longer any good reason to
have two different ways of doing it. This change is needed to fix bug
reported by Hayden James: the fix of 2005-11-03 to emit the correct column
names after optimizing away a SubqueryScan node didn't work if the new
top-level plan node used ExecAssignResultTypeFromOuterPlan to generate its
tupdesc, since the next plan node down won't have the correct column labels.
a SubLink expression into a rule query. Pre-8.1 we essentially did this
unconditionally; 8.1 tries to do it only when needed, but was missing a
couple of cases. Per report from Kyle Bateman. Add some regression test
cases covering this area.
comment line where output as too long, and update typedefs for /lib
directory. Also fix case where identifiers were used as variable names
in the backend, but as typedefs in ecpg (favor the backend for
indenting).
Backpatch to 8.1.X.
process of dropping roles by dropping objects owned by them and privileges
granted to them, or giving the owned objects to someone else, through the
use of the data stored in the new pg_shdepend catalog.
Some refactoring of the GRANT/REVOKE code was needed, as well as ALTER OWNER
code. Further cleanup of code duplication in the GRANT code seems necessary.
Implemented by me after an idea from Tom Lane, who also provided various kind
of implementation advice.
Regression tests pass. Some tests for the new functionality are also added,
as well as rudimentary documentation.
tuple in-place, but instead passes back an all-new tuple structure if
any changes are needed. This is a much cleaner and more robust solution
for the bug discovered by Alexey Beschiokov; accordingly, revert the
quick hack I installed yesterday.
With this change, HeapTupleData.t_datamcxt is no longer needed; will
remove it in a separate commit in HEAD only.
doing heap_insert or heap_update, wipe out any extracted fields in
the TupleTableSlot containing the tuple, because they might not be valid
anymore if tuptoaster.c changed the tuple. Safe because slot must be
in the materialized state, but mighty ugly --- find a better answer!
the array (for array_push) or higher-dimensional array (for array_cat)
rather than decrementing it as before. This avoids generating lower
bounds other than one for any array operation within the SQL spec. Per
recent discussion.
Interestingly, this seems to have been the original behavior, because
while updating the docs I noticed that a large fraction of relevant
examples were *wrong* for the old behavior and are now right. Is it
worth correcting this in the back-branch docs?
recursed twice on its first argument, leading to exponential time spent
on a deep nest of COALESCEs ... such as a deeply nested FULL JOIN would
produce. Per report from Matt Carter.
functionality, but I still need to make another pass looking at places
that incidentally use arrays (such as ACL manipulation) to make sure they
are null-safe. Contrib needs work too.
I have not changed the behaviors that are still under discussion about
array comparison and what to do with lower bounds.
that was added to localbuf.c in 8.1; therefore, applying it to a temp table
left corrupt lookup state in memory. The only case where this had a
significant chance of causing problems was an ON COMMIT DELETE ROWS temp
table; the other possible paths left bogus state that was unlikely to
be used again. Per report from Csaba Nagy.
names from being added to pgindent's typedef list. The existance of
them caused weird formatting in the date/type files, and in keywords.c.
Backpatch to 8.1.X.
columns, shifting comment to the right when more than 150 'else if'
clauses were used, and update typedefs for 8.1.X.
NetBSD patched updated, with documentation.
sense and rename to "outerjoin_delayed" to more clearly reflect what it
means). I had decided that it was redundant in 8.1, but the folly of this
is exposed by a bug report from Sebastian Böck. The place where it's
needed is to prevent orindxpath.c from cherry-picking arms of an outer-join
OR clause to form a relation restriction that isn't actually legal to push
down to the relation scan level. There may be some legal cases that this
forbids optimizing, but we'd need much closer analysis to determine it.
slot of the topmost plan node when a trigger returns a modified tuple.
These appear to be the only places where a plan node's caller did not
treat the result slot as read-only, which is an assumption that nodeUnique
makes as of 8.1. Fixes trigger-vs-DISTINCT bug reported by Frank van Vugt.
surprising results when it's some other numeric type. This doesn't solve
the generic problem of surprising implicit casts to text, but it's a
low-impact way of making sure this particular case behaves sanely.
Per gripe from Harald Fuchs and subsequent discussion.
anything but transaction-exiting commands (ROLLBACK etc). We already rejected
Parse and Execute in such cases, so there seems little point in allowing Bind.
This prevents at least an Assert failure, and probably worse things, since
there's a lot of infrastructure that doesn't work when not in a live
transaction. We can also simplify the Bind logic a bit by rejecting messages
with a nonzero number of parameters, instead of the former kluge to silently
substitute NULL for each parameter. Per bug #2033 from Joel Stevenson.
on every index page they read; in particular to catch the case of an
all-zero page, which PageHeaderIsValid allows to pass. It turns out
hash already had this idea, but it was just Assert()ing things rather
than doing a straight error check, and the Asserts were partially
redundant with PageHeaderIsValid anyway. Per recent failure example
from Jim Nasby. (gist still needs the same treatment.)
to assume that the string pointer passed to set_ps_display is good forever.
There's no need to anyway since ps_status.c itself saves the string, and
we already had an API (get_ps_display) to return it.
I believe this explains Jim Nasby's report of intermittent crashes in
elog.c when %i format code is in use in log_line_prefix.
While at it, repair a previously unnoticed problem: on some platforms such as
Darwin, the string returned by get_ps_display was blank-padded to the maximum
length, meaning that lock.c's attempt to append " waiting" to it never worked.
create circularity of role memberships. This is a minimum-impact fix
for the problem reported by Florian Pflug. I thought about removing
the superuser_arg test from is_member_of_role() altogether, as it seems
redundant for many of the callers --- but not all, and it's way too late
in the 8.1 cycle to be making large changes. Perhaps reconsider this
later.
from a finished plan tree. We have to copy the output column names
(resname fields) from the SubqueryScan down to its child plan node;
else, if this is the topmost level of the plan, the wrong column names
will be delivered to the client. Per bug #2017 reported by Jolly Chen.
very narrow window in which SimpleLruReadPage or SimpleLruWritePage could
think that I/O was needed when it wasn't (and indeed the buffer had already
been assigned to another page). This would result in an Assert failure if
Asserts were enabled, and probably in silent data corruption if not.
Reported independently by Jim Nasby and Robert Creager.
I intend a more extensive fix when 8.2 development starts, but this is a
reasonably low-impact patch for the existing branches.
setting for the regression makefile, allowing Windows users to force locale
settings since Windows does not get its locale from the environment.
Per Petr Jelinek.
of postgres.imp file into BE_DLLLIBS macro. This makes the AIX build
work more like the Windows and Darwin builds, which have similar requirements
to mention a backend library when linking shared libraries that will be
dynamically loaded into the backend.
multixact's starting offset before the offset has been stored into the
SLRU file. A simple fix would be to hold the MultiXactGenLock until the
offset has been stored, but that looks like a big concurrency hit. Instead
rely on knowledge that unset offsets will be zero, and loop when we see
a zero. This requires a little extra hacking to ensure that zero is never
a valid value for the offset. Problem reported by Matteo Beccati, fix
ideas from Martijn van Oosterhout, Alvaro Herrera, and Tom Lane.
advance its usage_count. This includes writes of dirty buffers triggered
by bgwriter, checkpoint, or FlushRelationBuffers, as well as various
corner cases that really ought not count as accesses to the page.
Should make for some marginal improvement in the quality of our decisions
about when to recycle buffers. Per suggestion from ITAGAKI Takahiro.
inFromCl true, meaning that they will list out as explicit RTEs if they
are in a view or rule. Update comments about inFromCl to reflect the way
it's now actually used. Per recent discussion.
for an outer join; symptom is bogus error "RIGHT JOIN is only supported with
merge-joinable join conditions". Problem was that select_mergejoin_clauses
did its tests in the wrong order. We need to force left join not right join
for a merge join when there are non-mergeable join clauses; but the test for
this only accounted for mergejoinability of the clause operator, and not
whether the left and right Vars were of the proper relations. Per report
from Jean-Pierre Pelletier.
some small stylistic improvements in these functions. Also fix several
places where TMODULO() was being used with wrong-sized quotient argument,
creating a risk of overflow --- interval2tm was actually capable of going
into an infinite loop because of this.
to the main thread. This allows removal of WaitForSingleObjectEx() calls
from the main thread, thereby allowing us to re-enable Qingqing Zhou's
CHECK_FOR_INTERRUPTS performance improvement. Qingqing, Magnus, et al.
PQregisterThreadLock().
I also remove the crypt() mention in the libpq threading section and
added a single sentence in the client-auth manual page under crypt().
Crypt authentication is so old now that a separate paragraph about it
seemed unwise.
I also added a comment about our use of locking around pqGetpwuid().
WaitForSingleObjectEx is always called by CHECK_FOR_INTERRUPTS. This
should be reinstated but the setitimer() emulation will have to be
redesigned first.
a kernel call unless there's some evidence of a pending signal. This should
bring its performance on Windows into line with the Unix version. Problem
diagnosis and patch by Qingqing Zhou. Minor stylistic tweaks by moi ...
if it's broken, it's my fault.
properly advancing the CommandCounter between multiple sub-queries
generated by rules, we forgot to update the snapshot being used, so
that the successive sub-queries didn't actually see each others'
results. This is still not *exactly* like the semantics of normal
execution of the same queries, in that we don't take new transaction
snapshots and hence don't see changes from concurrently committed
commands, but I think that's OK and probably even preferable for
EXPLAIN ANALYZE.
a parameter in binary format. Also, add a TIP explaining how to use casts
in the query text to avoid needing to specify parameter types by OID.
Also fix bogus spacing --- apparently somebody expanded the tabs in the
example programs to 8 spaces instead of 4 when transposing them into SGML.
since it can take a fair amount of time and this can confuse boot scripts
that expect postmaster.pid to appear quickly. Move initialization of SSL
library and preloaded libraries to after that point, too, just for luck.
Per reports from Tony Caduto and others.
module. Don't rely on backend palloc semantics; in fact, best to not
use palloc at all, rather than #define'ing it to malloc, because that
just encourages errors of omission. Bug spotted by Volkan YAZICI,
but I went further than he did to fix it.
generated from subquery outputs: use the type info stored in the Var
itself. To avoid making ExecEvalVar and slot_getattr more complex
and slower, I split out the whole-row case into a separate ExecEval routine.
type ID information even when it's a record type. This is needed to
handle whole-row Vars referencing subquery outputs. Per example from
Richard Huxton.
optimization for subquery and function scan nodes: we can't just do it
unconditionally, we still have to check whether there is any need for
a whole-row Var. I had been thinking that these node types couldn't
have any system columns, which is true, but that loop is also checking
for attno zero, ie, whole-row Var. Fix comment to not be so misleading.
Per test case from Richard Huxton.
fix problems with replacement-string backslashes that aren't followed by
one of the expected characters, avoid giving the impression that
replace_text_regexp() is meant to be called directly as a SQL function,
etc.
the facility has been set, the facility gets set to LOCAL0 and cannot
be changed later. This seems reasonably plausible to happen, particularly
at higher debug log levels, though I am not certain it explains Han Holl's
recent report. Easiest fix is to teach the code how to change the value
on-the-fly, which is nicer anyway. I made the settings PGC_SIGHUP to
conform with log_destination.
regression=# select '23:59:59.9'::time(0);
time
----------
24:00:00
(1 row)
This is bad because:
regression=# select '24:00:00'::time(0);
ERROR: date/time field value out of range: "24:00:00"
The last example now works.
of client_min_messages (fatal + panic) are valid and also fixes a slight
issue with how psql tried to display error messages that aren't sent to
the client.
We often tell people to ignore errors in response to requests for things
like "drop if exists", but there's no good way to completely hide this
without upping client_min_messages past ERROR. When running a file like
SET client_min_messages TO 'FATAL';
DROP TABLE doesntexist;
with "psql -f filename" you get an error prefix of
"psql:/home/username/filename:3" even though there is no error message to
prefix because it isn't sent to the client.
Kris Jurka
This makes the error messages for PREPARE TRANSACTION, COMMIT PREPARED
etc. match the docs, which talk about "transaction identifier" not
"gid" or "global transaction identifier".
Steve Woodcock
make_restrictinfo_from_bitmapqual. The likelihood of finding duplicates
seems much less than in the AND-subclause case, and the cost much higher,
because OR lists with hundreds or even thousands of subclauses are not
uncommon. Per discussion with Ilia Kantor and andrew@supernews.
the wrong buffer dirty when trying to kill a dead index entry that's on
a page after the one it started on. No risk of data corruption, just
inefficiency, but still a bug.
pointers, to ensure that compilers won't rearrange accesses to occur
while we're not holding the buffer header spinlock. It's probably
not necessary to mark volatile in every single place in bufmgr.c,
but better safe than sorry. Per trouble report from Kevin Grittner.
whether we seem to be running in a uniprocessor or multiprocessor.
The adjustment rules could probably still use further tweaking, but
I'm convinced this should be a win overall.
valid type information if they are asked to fetch the values part of a
pg_statistic slot; these arguments are unneeded if fetching only the
numbers part. Use this to save a catcache lookup in btcostestimate,
which is looking like a bit of a hotspot in recent profiling. Not a
big savings, but since it's essentially free, might as well do it.
A RestrictInfo representing an OR clause now contains two versions of
the contained expression, one with sub-RestrictInfos and one without.
clause_selectivity() should descend to the version with sub-RestrictInfos
so that it has a chance of caching its results for the OR's sub-clauses.
Failing to do so resulted in redundant planner effort.
emit when given the --clean option, in favor of individual DROP ROLE
commands. The old technique could not possibly work in 8.1, and was
never a very good idea anyway IMHO. The DROP ROLE approach has the
defect that the DROPs will fail for roles that own objects or have
privileges, but perhaps we can improve that later.
ie removing shared-dependency entries, should happen before non-rollbackable
ones. That way a failure during the rollbackable part doesn't leave us
with inconsistent state.