a kernel call unless there's some evidence of a pending signal. This should
bring its performance on Windows into line with the Unix version. Problem
diagnosis and patch by Qingqing Zhou. Minor stylistic tweaks by moi ...
if it's broken, it's my fault.
properly advancing the CommandCounter between multiple sub-queries
generated by rules, we forgot to update the snapshot being used, so
that the successive sub-queries didn't actually see each other's
results. This is still not *exactly* like the semantics of normal
execution of the same queries, in that we don't take new transaction
snapshots and hence don't see changes from concurrently committed
commands, but I think that's OK and probably even preferable for
EXPLAIN ANALYZE.
since it can take a fair amount of time and this can confuse boot scripts
that expect postmaster.pid to appear quickly. Move initialization of SSL
library and preloaded libraries to after that point, too, just for luck.
Per reports from Tony Caduto and others.
module. Don't rely on backend palloc semantics; in fact, best to not
use palloc at all, rather than #define'ing it to malloc, because that
just encourages errors of omission. Bug spotted by Volkan YAZICI,
but I went further than he did to fix it.
generated from subquery outputs: use the type info stored in the Var
itself. To avoid making ExecEvalVar and slot_getattr more complex
and slower, I split out the whole-row case into a separate ExecEval routine.
type ID information even when it's a record type. This is needed to
handle whole-row Vars referencing subquery outputs. Per example from
Richard Huxton.
optimization for subquery and function scan nodes: we can't just do it
unconditionally, we still have to check whether there is any need for
a whole-row Var. I had been thinking that these node types couldn't
have any system columns, which is true, but that loop is also checking
for attno zero, ie, whole-row Var. Fix comment to not be so misleading.
Per test case from Richard Huxton.
fix problems with replacement-string backslashes that aren't followed by
one of the expected characters, avoid giving the impression that
replace_text_regexp() is meant to be called directly as a SQL function,
etc.
the facility has been set, the facility gets set to LOCAL0 and cannot
be changed later. This seems reasonably plausible to happen, particularly
at higher debug log levels, though I am not certain it explains Han Holl's
recent report. Easiest fix is to teach the code how to change the value
on-the-fly, which is nicer anyway. I made the settings PGC_SIGHUP to
conform with log_destination.
regression=# select '23:59:59.9'::time(0);
time
----------
24:00:00
(1 row)
This is bad because:
regression=# select '24:00:00'::time(0);
ERROR: date/time field value out of range: "24:00:00"
The last example now works.
This makes the error messages for PREPARE TRANSACTION, COMMIT PREPARED
etc. match the docs, which talk about "transaction identifier" not
"gid" or "global transaction identifier".
Steve Woodcock
make_restrictinfo_from_bitmapqual. The likelihood of finding duplicates
seems much less than in the AND-subclause case, and the cost much higher,
because OR lists with hundreds or even thousands of subclauses are not
uncommon. Per discussion with Ilia Kantor and andrew@supernews.
the wrong buffer dirty when trying to kill a dead index entry that's on
a page after the one it started on. No risk of data corruption, just
inefficiency, but still a bug.
pointers, to ensure that compilers won't rearrange accesses to occur
while we're not holding the buffer header spinlock. It's probably
not necessary to mark volatile in every single place in bufmgr.c,
but better safe than sorry. Per trouble report from Kevin Grittner.
whether we seem to be running in a uniprocessor or multiprocessor.
The adjustment rules could probably still use further tweaking, but
I'm convinced this should be a win overall.
valid type information if they are asked to fetch the values part of a
pg_statistic slot; these arguments are unneeded if fetching only the
numbers part. Use this to save a catcache lookup in btcostestimate,
which is looking like a bit of a hotspot in recent profiling. Not a
big savings, but since it's essentially free, might as well do it.
A RestrictInfo representing an OR clause now contains two versions of
the contained expression, one with sub-RestrictInfos and one without.
clause_selectivity() should descend to the version with sub-RestrictInfos
so that it has a chance of caching its results for the OR's sub-clauses.
Failing to do so resulted in redundant planner effort.
ie removing shared-dependency entries, should happen before non-rollbackable
ones. That way a failure during the rollbackable part doesn't leave us
with inconsistent state.
traceable to grant options. As per my earlier proposal, a GRANT made by
a role member has to be recorded as being granted by the role that actually
holds the grant option, and not the member.
like '23:59:60' because of fractional-second roundoff problems. Trying
to control this upstream of the actual display code was hopeless; the right
way is to explicitly round fractional seconds in the display code and then
refigure the results if the fraction rounds up to 1. Per bug #1927.
to call krb5_sname_to_principal() always. Also, use krb_srvname rather
than the hardwired string 'postgres' as the appl_version string in the
krb5_sendauth/recvauth calls, to avoid breaking compatibility with PG
8.0. Magnus Hagander
testing ownership if the caller isn't interested in any GOPTION bits
(which is the common case). It did not matter in 8.0 where the ownership
test was just a trivial equality test, but it matters now.
level for unrecognized win32 error codes to LOG, and make messages
conform to style guide. Per old suggestion from Qingqing Zhou, which
seems to have gotten lost in the shuffle.
cache lookup in the success case. This won't help much for cases where
the given relation is far down the search path, but it does not hurt in
any cases either; and it requires only a little new code. Per gripe from
Jim Nasby about slowness of \d with many tables.
the parameter's name (if any) as the default column name for SELECT FROM
the function, rather than the function name as previously. I still think
this is a bad idea, but I lost the argument. Force decompilation of
function RTEs to specify full aliases always, to reduce the odds of this
decision breaking dumped views.
predicate_implied_by() to detect redundant filter conditions, but forgot
that predicate_implied_by() assumes its first argument contains only
immutable functions. Add a check to guarantee that. Also, test to see
if filter conditions can be discarded because they are redundant with
the predicate of a partial index.
generated by bitmap index scans. Along the way, simplify and speed up
the code for counting sequential and index scans; it was both confusing
and inefficient to be taking care of that in the per-tuple loops, IMHO.
initdb forced because of internal changes in pg_stat view definitions.
was created on a machine with alignment rules and floating-point format
similar to the current machine. Per recent discussion, this seems like
a good idea with the increasing prevalence of 32/64 bit environments.
argument as a 'regclass' value instead of a text string. The frontend
conversion of text string to pg_class OID is now encapsulated as an
implicitly-invocable coercion from text to regclass. This provides
backwards compatibility to the old behavior when the sequence argument
is explicitly typed as 'text'. When the argument is just an unadorned
literal string, it will be taken as 'regclass', which means that the
stored representation will be an OID. This solves longstanding problems
with renaming sequences that are referenced in default expressions, as
well as new-in-8.1 problems with renaming such sequences' schemas or
moving them to another schema. All per recent discussion.
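For illustration (hypothetical sequence name; not exhaustive), the two behaviors described above:
select nextval('myseq');          -- unadorned literal is taken as regclass, resolved to the sequence's OID
select nextval('myseq'::text);    -- explicit text cast keeps the old late-binding-by-name behavior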
Along the way, fix some rather serious problems in dbmirror's support
for mirroring sequence operations (int4 vs int8 confusion for instance).
the ProcessUtility case, resulting in an intratransaction memory leak
if a utility command actually did return any tuples, as reported by
Dmitry Karasik. Fix this and also make the behavior more consistent
for cases involving nested SPI operations and multiple query trees,
by ensuring that we store the state locally until it is ready to be
returned to the caller.
only the inner-side relation would be considered as potential equijoin clauses,
which is wrong because the condition doesn't necessarily hold above the point
of the outer join. Per test case from Kevin Grittner (bug#1916).
outer relation is empty did not work, per test case from Patrick Welche.
It tried to use nodeHashjoin.c's high-level mechanisms for fetching an
outer-relation tuple, but that code expected the hash table to be filled
already. As patched, the code failed in corner cases such as having no
outer-relation tuples for the first hash batch. Revert and rewrite.
for int8 and related types. However we might be talking to a client
that has working int64; so pq_getmsgint64 really needs to check the
incoming value and throw an overflow error if we can't represent it
accurately.
"optimization". When we find a potentially useful joinclause, we
have to add all its other required_relids to the result, not only the
other clause_relids. They are different in the case of a joinclause
whose applicability has to be postponed due to outer join. We have
to include the extra rels because otherwise, after best_inner_indexscan
masks the join rels with index_outer_relids, it will always fail to
find the joinclause as applicable. Per report from Husam Tomeh.
in which invalid page data could be transiently written to disk by
concurrent bgwriter activity. There doesn't seem any risk of loss of
actual user data, but an empty page could possibly be left corrupt if a
crash occurs before the correct data gets written out. Pointed out by
Alvaro Herrera.
strings. This is consistent with SQL conventions, and since Bruce
already changed initdb in a way that assumed it worked like this, seems
we'd better make it work like this.
sake of brevity and clarity.
Make pg_reload_conf(), pg_rotate_logfile(), and pg_cancel_backend()
return a boolean rather than an integer to indicate success or failure.
Along the way, make some minor cleanups to dbsize.c -- in particular,
use elog() rather than ereport() for "shouldn't happen" error
conditions, and remove some of the more flagrant violations of the
Postgres indentation conventions.
Catalog version bumped.
bytes. This shouldn't make any difference on x86 machines, where the size
happened to be 16 bytes anyway, but on 64-bit machines and machines with
slock_t int or wider, it will speed array indexing and hopefully reduce
SMP cache contention effects. Per recent experimentation.
recovered. I did not see any actual leak while testing this in CVS tip,
but 8.0 definitely has a problem with leaking the space temporarily
palloc'd by BufferSync(). In any case this seems a good idea to forestall
similar problems in future. Per report from Arjen van der Meijden.
to drop connections unceremoniously. Also some other marginal cleanups:
don't query getsockopt() repeatedly if it fails, and avoid having the
apparent definition of struct Port depend on which system headers you
might have included or not. Oliver Jowett and Tom Lane.
in the zic database or zone names found in the date token table. This
preserves the old ability to do AT TIME ZONE 'PST' along with the new
ability to do AT TIME ZONE 'PST8PDT'. Per gripe from Bricklen Anderson.
Also, fix some inconsistencies in usage of TZ_STRLEN_MAX --- the old
code had the potential for one-byte buffer overruns, though given
alignment considerations it's unlikely there was any real risk.
for procedural languages. This replaces the hard-wired table I had
originally proposed as a stopgap solution. For the moment, the initial
contents only include languages shipped with the core distribution.
as per my recent proposal. For now the template data is hard-wired in
proclang.c --- this should be replaced later by a new shared system
catalog, but we don't want to force initdb during 8.1 beta. This change
lets us cleanly load existing dump files even if they contain outright
wrong information about a PL's support functions, such as a wrong path
to the shared library or a missing validator function. Also, we can
revert the recent kluges to make pg_dump dump PL support functions that
are stored in pg_catalog.
While at it, I removed the code in pg_regress that replaced $libdir
with a hardcoded path for temporary installations. This is no longer
needed given our support for relocatable installations.
when there are extra resjunk columns in the child node. I found some
additional cases involving Append nodes that weren't handled by the
prior patch, and it's not clear how to fix them in the same way without
breaking inheritance cases. So the prudent path seems to be to narrow
the scope of the optimization.
has to recopy the input plan node's targetlist if it removes a
SubqueryScan node just below the non-projecting node. For simplicity
I made it recopy always. Per bug report from Allan Wang and Michael Fuhr.
on a page, as suggested by ITAGAKI Takahiro. Also, change a few places
that were using some other estimates of max-items-per-page to consistently
use MaxOffsetNumber. This is conservatively large --- we could have used
the new MaxHeapTuplesPerPage macro, or a similar one for index tuples ---
but those places are simply declaring a fixed-size buffer and assuming it
will work, rather than actively testing for overrun. It seems safer to
size these buffers in a way that can't overflow even if the page is
corrupt.
saves nearly 700kB in the default shared memory segment size, which seems
worthwhile, and it is a feature that many users won't use anyway. Per
Heikki's argument, there is no point in a compromise value --- those who
are using 2PC at all will probably want it at least equal to max_connections.
But we can't set it to zero by default without breaking the prepared_xacts
regression test.
so that the latter estimates the number of groups that grouping will
produce. This is needed because it is primarily query_planner that
makes the decision between fast-start and fast-finish plans, and in the
original coding it was unable to make more than a crude rule-of-thumb
choice when the query involved grouping. This revision helps us make
saner choices for queries like SELECT ... GROUP BY ... LIMIT, as in a
recent example from Mark Kirkwood. Also move the responsibility for
canonicalizing sort_pathkeys and group_pathkeys into query_planner;
this information has to be available anyway to support the first change,
and doing it this way lets us get rid of compare_noncanonical_pathkeys
entirely.
to copy the whole plan tree before invoking adjust_plan_varnos(); else
if there is any multiply-linked substructure, the latter might increment
some Var's varno twice. Previously there were some retail copyObject
calls inside adjust_plan_varnos, but it seems a lot safer to just dup the
whole tree first. Also, set_inner_join_references was trying to avoid
work by not recursing if a BitmapHeapScan's bitmapqualorig contained no
outer references; which was OK at the time the code was written, I think,
but now that create_bitmap_scan_plan removes duplicate clauses from
bitmapqualorig it is possible for that field to be NULL while outer
references still remain in the qpqual and/or contained indexscan nodes.
For safety, always recurse even if the BitmapHeapScan looks to be outer
reference free. Per reports from Michael Fuhr and Oleg Bartunov.
the parent table, even if the command that creates them is executed by
someone else (such as a superuser or a member of the owning role).
Per gripe from Michael Fuhr.
in interval_mul and interval_div. This avoids an optimization bug
in A Certain Company's compiler (and given their explanation, I wouldn't
be surprised if other compilers blow it too). Besides the code seems
more clear this way --- in the original formulation, you had to mentally
recognize the common subexpression in order to understand what was going
on.
use these instead of its previous hack of changing pg_class.reltriggers.
Documentation is lacking, will add that later.
Patch by Satoshi Nagayasu, review and some extra work by Tom Lane.
after the fact. Fix bug with incorrect test for whether we are at end
of logfile segment. Arrange for writes triggered by XLogInsert's
is-cache-more-than-half-full test to synchronize with the cache boundaries,
so that in long transactions we tend to write alternating halves of the
cache rather than randomly chosen portions of it; this saves one more
write syscall per cache load.
erroring out as it has done for the last couple weeks. Document that this
form is now ignored because indexes can't usefully have different owners
from their parent tables. Fix pg_dump to not generate ALTER OWNER commands
for indexes.
discussion of getting around this by relaxing the checks made for regular
users, but I'm disinclined to toy with the security model right now,
so just special-case it for superusers where needed.
not accepting queries).
errmsg("database is not accepting queries to avoid
wraparound data loss in database \"%s\"",
errhint("Stop the postmaster and use a standalone
backend to VACUUM database \"%s\".",
in postgresql.conf.sample, mark custom_variable_classes as SIGHUP not
POSTMASTER to agree with the documentation (I can't see a reason it has
to be POSTMASTER so I think the docs are right).
to 'Size' (that is, size_t), and install overflow detection checks in it.
This allows us to remove the former arbitrary restrictions on NBuffers
etc. It won't make any difference in a 32-bit machine, but in a 64-bit
machine you could theoretically have terabytes of shared buffers.
(How efficiently we could manage 'em remains to be seen.) Similarly,
num_temp_buffers, work_mem, and maintenance_work_mem can be set above
2Gb on a 64-bit machine. Original patch from Koichi Suzuki, additional
work by moi.
insufficient paranoia in code that follows t_ctid links. (We must do both
because even with VACUUM doing it properly, the intermediate state with
a dangling t_ctid link is visible concurrently during lazy VACUUM, and
could be seen afterwards if either type of VACUUM crashes partway through.)
Also try to improve documentation about what's going on. Patch is a bit
bulky because passing the XMAX information around required changing the
APIs of some low-level heapam.c routines, but it's not conceptually very
complicated. Per trouble report from Teodor and subsequent analysis.
This needs to be back-patched, but I'll do that after 8.1 beta is out.
or OFFSET clauses by using estimate_expression_value(). The main advantage
of this is that if the expression is a Param and we have a value for the
Param, we'll use that value rather than defaulting. Also, fix some
thinkos in the logic for combining LIMIT/OFFSET with an externally
supplied tuple fraction (this covers cases like EXISTS(...LIMIT...)).
And make sure the results of all this are shown by EXPLAIN. Per a
gripe from Merlin Moncure.
Fix to_char(interval) to return large year/month/day/hour values that
are larger than possible timestamp values.
Prevent to_char(interval) format specifications that make no sense, like
Month.
Clean up formatting.c code to more logically handle return lengths.
their OID parameter. It was possible to crash the backend with
select array_in('{123}',0,0); because that would bypass the needed step
of initializing the workspace. These seem to be the only two places
with a problem, though (record_in and record_recv don't have the issue,
and the other array functions aren't depending on user-supplied input).
Back-patch as far as 7.4; 7.3 does not have the bug.
(the stats system has always collected this info, but the views were
filtering it out). Modify autovacuum so that over-threshold activity
in a toast table can trigger a VACUUM of the parent table, even if the
parent didn't appear to need vacuuming itself. Per discussion a month
or so back about "short, wide tables".
SearchCatCacheList and ReleaseCatCacheList. Previously, we incremented
and decremented the refcounts of list member tuples along with the list
itself, but that's unnecessary, and very expensive when the list is big.
It's cheaper to change only the list refcount. When we are considering
deleting a cache entry, we have to check not only its own refcount but
its parent list's ... but it's easy to arrange the code so that this
check is not made in any commonly-used paths, so the cost is really nil.
The bigger gain though is to refrain from DLMoveToFront'ing each individual
member tuple each time the list is referenced. To keep some semblance
of fair space management, lists are just marked as used or not since the
last cache cleanout search, and we do a MoveToFront pass only when about
to run a cleanout. In combination, these changes reduce the costs of
SearchCatCacheList and ReleaseCatCacheList from about 4.5% of pgbench
runtime to under 1%, according to my gprof results.
remember the output parameter set for himself. It's a bit of a kluge
but fixing array_in to work in bootstrap mode looks worse.
I removed the separate pg_file_length() function, as it no longer has any
real notational advantage --- you can write (pg_stat_file(...)).length.
stderr-in-service or output-from-syslogger-in-service code. Previously
everything was flagged as ERRORs there, which caused all instances to
log "LOG: logger shutting down" as error...
Please apply for 8.1. I'd also like it considered for 8.0 since logging
non-errors as errors can be cause for alarm amongst people who actually
look at their logs...
Magnus Hagander
> >> 3) I restarted the postmaster both times. I got this error
> both times.
> >> :25: ERROR: could not load library "C:/Program
> >> Files/PostgreSQL/8.0/lib/testtrigfuncs.dll": dynamic load error
>
> > Yes. We really need to look at fixing that error message. I had
> > forgotten it completely :-(
>
> > Bruce, you think we can sneak that in after feature freeze? I would
> > call it a bugfix :-)
>
> Me too. That's been on the radar for awhile --- please do
> send in a patch.
Here we go, that wasn't too hard :-)
Apart from adding the error handling, it does one more thing: it changes
the error mode when loading the DLLs. Previously, if a DLL was broken or
referenced other DLLs that couldn't be found, a popup dialog box would
appear on the screen, which had to be clicked away before the backend
could continue. This patch also disables the popup error message for DLL
loads.
I think this is something we should consider doing for the entire
backend - disable those popups, and say we deal with it ourselves. What
do you other win32 hackers think about this?
In the meantime, this patch fixes the error msgs. Please apply for 8.1
and please consider a backpatch to 8.0.
Magnus Hagander
> > I ran across this yesterday on HEAD:
>
> > template1=# grant select on foo, foo to swm;
> > ERROR: tuple already updated by self
>
> Seems to fail similarly in every version back to 7.2; probably further,
> but that's all I have running at the moment.
>
> > We could do away with the error by producing a unique list of object names
> > -- but that would impose an extra cost on the common case.
>
> CommandCounterIncrement in the GRANT loop would be easier, likely.
> I'm having a hard time getting excited about it though...
Yeah, it's not that exciting, but that error message would throw your
average user.
I've attached a patch which calls CommandCounterIncrement() in each of the
grant loops.
Gavin Sherry
should surely be timestamptz not timestamp; fix some but not all of the
holes in check_and_make_absolute(); other minor cleanup. Also put in
the missed catversion bump.
computation. On modern machines this is as fast if not faster, and we
don't have to clog the CPU's L2 cache with a tens-of-KB pointer array.
If we ever decide to adopt a more dynamic allocation method for shared
buffers, we'll probably have to revert this patch, but in the meantime
we might as well save a few bytes and nanoseconds. Per Qingqing Zhou.
whenever we generate a new OID. This prevents occasional duplicate-OID
errors that can otherwise occur once the OID counter has wrapped around.
Duplicate relfilenode values are also checked for when creating new
physical files. Per my recent proposal.
delay and limit, both as global GUCs and as table-specific entries in
pg_autovacuum. stats_reset_on_server_start is now OFF by default,
but a reset is forced if we did WAL replay. XID-wrap vacuums do not
ANALYZE, but do FREEZE if it's a template database. Alvaro Herrera
against the PGPROC array. Anything in the file that isn't in PGPROC
gets rejected as being a stale entry. This should solve complaints about
stale entries in pg_stat_activity after a BETERM message has been dropped
due to overload.
ResourceOwner mechanism already released all reference counts for the
cache entries; therefore, we do not need to scan the catcache or relcache
at transaction end, unless we want to do it as a debugging crosscheck.
Do the crosscheck only in Assert mode. This is the same logic we had
previously installed in AtEOXact_Buffers to avoid overhead with large
numbers of shared buffers. I thought it'd be a good idea to do it here
too, in view of Kari Lavikka's recent report showing a real-world case
where AtEOXact_CatCache is taking a significant fraction of runtime.
exit, instead of trying to take shortcuts. Introduce some additional
shutdown callback routines to eliminate kluges like having ProcKill
be responsible for shutting down the buffer manager. Ensure that the
order of operations during shutdown is predictable and what you would
expect given the module layering.
max_files_per_process. Going further than that is just a waste of
cycles, and it seems that current Cygwin does not cope gracefully
with deliberately running the system out of FDs. Per Andrew Dunstan.
character, tighten the inner loops of CopyReadLine and CopyReadAttribute,
arrange to parse out all the attributes of a line in just one call instead
of one CopyReadAttribute call per attribute, be smarter about which client
encodings require slow pg_encoding_mblen() loops. Also, clean up the
mishmash of static variables and overly-long parameter lists in favor of
passing around a single CopyState struct containing all the state data.
Original patch by Alon Goldshuv, reworked by Tom Lane.
This was not especially critical before, but it is now that we track
ownership dependencies --- the dependency for the rowtype *must* shift
to the new owner. Spotted by Bernd Helmle.
Also fix a problem introduced by recent change to allow non-superusers
to do ALTER OWNER in some cases: if the table had a toast table, ALTER
OWNER failed *even for superusers*, because the test being applied would
conclude that the new would-be owner had no create rights on pg_toast.
A side-effect of the fix is to disallow changing the ownership of indexes
or toast tables separately from their parent table, which seems a good
idea on the whole.
doesn't block the bgwriter from making progress writing out other buffers.
This was a hard problem in the context of the ARC/2Q design, but it's
trivial in the context of clock sweep ... just advance the sweep counter
before we try to write not after.
of special case for Windows port. Put a PG_TRY around most of createdb()
to ensure that we remove copied subdirectories on failure, even if the
failure happens while creating the pg_database row. (I think this explains
Oliver Siegmar's recent report.) Having done that, there's no need for
the fragile assumption that copydir() mustn't ereport(ERROR), so simplify
its API. Eliminate the old code that used system("cp ...") to copy
subdirectories, in favor of using copydir() on all platforms. This not
only should allow much better error reporting, but allows us to fsync
the created files before trusting that the copy has succeeded.
continue to recurse after eliminating a NOT-below-a-NOT, since the
contained subexpression will now be part of the top-level AND/OR structure
and so deserves to be simplified. The real-world impact of this is
probably minimal, since it'd require at least three levels of NOT to make
a difference, but it's still a bug.
Also remove some redundant tests for NULL subexpressions.
track shared relations in a separate hashtable, so that operations done
from different databases are counted correctly. Add proper support for
anti-XID-wraparound vacuuming, even in databases that are never connected
to and so have no stats entries. Miscellaneous other bug fixes.
Alvaro Herrera, some additional fixes by Tom Lane.
Also, write multiple WAL buffers out in one write() operation.
ITAGAKI Takahiro
---------------------------------------------------------------------------
> If we disable writeback-cache and use open_sync, the per-page writing
> behavior in WAL module will show up as bad result. O_DIRECT is similar
> to O_DSYNC (at least on linux), so that the benefit of it will disappear
> behind the slow disk revolution.
>
> In the current source, WAL is written as:
> for (i = 0; i < N; i++) { write(&buffers[i], BLCKSZ); }
> Is this intentional? Can we rewrite it as follows?
> write(&buffers[0], N * BLCKSZ);
>
> In order to achieve it, I wrote a 'gather-write' patch (xlog.gw.diff).
> Aside from this, I'll also send the fixed direct io patch (xlog.dio.diff).
> These two patches are independent, so they can be applied either or both.
>
>
> I tested them on my machine and the results as follows. It shows that
> direct-io and gather-write is the best choice when writeback-cache is off.
> Are these two patches worth trying if they are used together?
>
>
>              | writeback | fsync= | fdata | open_ | fsync_ | open_
> patch        | cache     | false  | sync  | sync  | direct | direct
> -------------+-----------+--------+-------+-------+--------+---------
> direct io    | off       | 124.2  | 105.7 |  48.3 |   48.3 |   48.2
> direct io    | on        | 129.1  | 112.3 | 114.1 |  142.9 |  144.5
> gather-write | off       | 124.3  | 108.7 | 105.4 |  (N/A) |  (N/A)
> both         | off       | 131.5  | 115.5 | 114.4 |  145.4 |  145.2
>
> - 20runs * pgbench -s 100 -c 50 -t 200
> - with tuning (wal_buffers=64, commit_delay=500, checkpoint_segments=8)
> - using 2 ATA disks:
> - hda(reiserfs) includes system and wal.
> - hdc(jfs) includes database files. writeback-cache is always on.
>
> ---
> ITAGAKI Takahiro
The attached patch is a small additional improvement.
It uses appendStringInfoText instead of appendStringInfoString.
appendStringInfoString incurs PG_TEXT_GET_STR overhead when applied to a
text value; this can be avoided by using appendStringInfoText.
Atsushi Ogawa
planning logic for bitmap indexscans. Partial indexes create corner
cases in which a scan might be done with no explicit index qual conditions,
and the code wasn't handling those cases nicely. Also be a little
tenser about eliminating redundant clauses in the generated plan.
Per report from Dmitry Karasik.
to the "text" segment. It would be possible to mark the elements of the
array "const" as well, but this would require multiple API changes and
does not seem to be worth the notational inconvenience.
into .def and .exp files automatically on Windows, AIX, and the like.
An additional benefit is that changes in libpgport files correctly
propagate to force rebuild of the backend executable. This is my
reworking of Rocco Altier's idea, and if it breaks anything it's
definitely my fault.
doesn't automatically inherit the privileges of roles it is a member of;
for such a role, membership in another role can be exploited only by doing
explicit SET ROLE. The default inherit setting is TRUE, so by default
the behavior doesn't change, but creating a user with NOINHERIT gives closer
adherence to our current reading of SQL99. Documentation still lacking,
and I think the information schema needs another look.
existing ones for object privileges. Update the information_schema for
roles --- pg_has_role() makes this a whole lot easier, removing the need
for most of the explicit joins with pg_user. The views should be a tad
faster now, too. Stephen Frost and Tom Lane.
pg_strcasecmp and pg_strncasecmp ... but I see some of the former have
crept back in.
Eternal vigilance is the price of locale independence, apparently.
near daylight savings time boundaries. This handles it properly, e.g.
test=> select '2005-04-03 04:00:00'::timestamp at time zone
'America/Los_Angeles';
timezone
------------------------
2005-04-03 07:00:00-04
(1 row)
test=> select (CURRENT_DATE + '05:00'::time)::timestamp at time zone
'Canada/Pacific';
timezone
------------------------
2005-07-22 08:00:00-04
(1 row)
test=> select ('2005-07-20 00:00:00'::timestamp without time zone) at
time zone 'Europe/Paris';
timezone
------------------------
2005-07-19 22:00:00-04
Update documentation.
coding would ignore startup cost differences of less than 1% of the
estimated total cost; which was OK for normal planning but highly not OK
if a very small LIMIT was applied afterwards, so that startup cost becomes
the name of the game. Instead, compare startup and total costs fuzzily
but independently. This changes the plan selected for two queries in the
regression tests; adjust expected-output files for resulting changes in
row order. Per reports from Dawid Kuroczko and Sam Mason.
24 hours. This is very helpful for daylight savings time:
select '2005-05-03 00:00:00 EST'::timestamp with time zone + '24 hours';
?column?
----------------------
2005-05-04 01:00:00-04
select '2005-05-03 00:00:00 EST'::timestamp with time zone + '1 day';
?column?
----------------------
2005-05-04 01:00:00-04
Michael Glaesemann
test=> select '4 months'::interval / 5;
?column?
---------------
1 mon -6 days
(1 row)
after:
test=> select '4 months'::interval / 5;
?column?
----------
24 days
(1 row)
The problem was the use of rint() to round, and then find the remainder,
causing the negative values.
output targetlist of the Unique or HashAgg plan. This code was OK when
written, but subsequent changes to use "physical tlists" where possible
had broken it: given an input subplan that has extra variables added to
avoid a projection step, it would copy those extra variables into the
upper tlist, which is pointless since a projection has to happen anyway.
I have seen this case in CVS tip due to new "physical tlist" optimization
for subqueries. I believe it probably can't happen in existing releases,
but the check is not going to hurt anything, so backpatch to 8.0 just
in case.
cases: we can't just consider whether the subquery's output is unique on its
own terms, we have to check whether the set of output columns we are going to
use will be unique. Per complaint from Luca Pireddu and test case from
Michael Fuhr.
requiring superuserness always, allow an owner to reassign ownership
to any role he is a member of, if that role would have the right to
create a similar object. These three requirements essentially state
that the would-be alterer has enough privilege to DROP the existing
object and then re-CREATE it as the new role; so we might as well
let him do it in one step. The ALTER TABLESPACE case is a bit
squirrely, but the whole concept of non-superuser tablespace owners
is pretty dubious anyway. Stephen Frost, code review by Tom Lane.
optional arguments as text input functions, ie, typioparam OID and
atttypmod. Make all the datatypes that use typmod enforce it the same
way in typreceive as they do in typinput. This fixes a problem with
failure to enforce length restrictions during COPY FROM BINARY.
The specification of this function is as follows.
regexp_replace(source text, pattern text, replacement text [, flags text])
returns text
Replaces substrings of the source text that match the regular expression
pattern with the replacement text.
- pattern is a regular expression pattern.
- replacement is the replacement string, which can use '\1'-'\9' and '\&'.
'\1'-'\9': back reference to the n'th subexpression.
'\&' : the entire matched string.
- flags can use the following values:
g: global (replace all matches)
i: ignore case
When flags is not specified, matching is case sensitive and only the first
match is replaced.
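A usage sketch based on the specification above (example strings are arbitrary):
select regexp_replace('foobarbaz', 'b..', 'X');       -- 'fooXbaz' (first match only)
select regexp_replace('foobarbaz', 'b..', 'X', 'g');  -- 'fooXX'   (all matches)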
Atsushi Ogawa
XLOG_DBASE_DROP_OLD WAL records -- these records are no longer created in
current sources. Adjust numbering of XLOG_DBASE_CREATE and XLOG_DBASE_DROP
and bump the catversion. Patch from Gavin Sherry, adjusted by Neil Conway.
have adequate mechanisms for tracking the contents of databases and
tablespaces). This solves the longstanding problem that you can drop a
user who still owns objects and/or has access permissions.
Alvaro Herrera, with some kibitzing from Tom Lane.
The content of the patch is as follows:
(1) Create a shortcut for the case where the subtext is not found.
(2) Stop using the LEFT and RIGHT macros.
In the LEFT and RIGHT macros, TEXTPOS is executed on the same content as
the immediately preceding call. The number of TEXTPOS executions can be
reduced by using text_substring instead of the LEFT and RIGHT macros.
(3) Add appendStringInfoText, and use it instead of
appendStringInfoString.
appendStringInfoString incurs PG_TEXT_GET_STR overhead when applied to a
text value; this can be avoided by using appendStringInfoText.
(4) Reduce the number of TEXTDUP executions.
The measured effect of the patch is as follows:
- The test data was created by 'pgbench -i'.
- Test SQL:
select replace(aid, '9', 'A') from accounts;
- Test results: Linux(CPU: Pentium III, Compiler option: -O2)
original: 1.515s
patched: 1.250s
Atsushi Ogawa
chdir into PGDATA and subsequently use relative paths instead of absolute
paths to access all files under PGDATA. This seems to give a small
performance improvement, and it should make the system more robust
against naive DBAs doing things like moving a database directory that
has a live postmaster in it. Per recent discussion.
able to do this before, but I had tried to make an exception for functions
with OUT parameters. Michael Fuhr found one problem with it already, and
I found another, which was it didn't work for strict functions with a
NULL input. While both of these could be worked around, the probability
that there are more gotchas seems high; I think prudence dictates just
reverting to the former behavior for now. Accordingly, remove the kluge
added to get_expr_result_type() for Michael's case.
propagated inside an outer join. In particular, given
LEFT JOIN ON (A = B) WHERE A = constant, we cannot conclude that
B = constant at the top level (B might be null instead), but we
can nonetheless put a restriction B = constant into the quals for
B's relation, since no inner-side rows not meeting that condition
can contribute to the final result. Similarly, given
FULL JOIN USING (J) WHERE J = constant, we can't directly conclude
that either input J variable = constant, but it's OK to push such
quals into each input rel. Per recent gripe from Kim Bisgaard.
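As a concrete sketch (hypothetical tables a and b):
select * from a left join b on (a.x = b.y) where a.x = 42;
-- we may not conclude b.y = 42 at the top level (b.y could be null in the
-- result), but it is safe to apply b.y = 42 as a scan qual on b, since rows
-- of b that fail it cannot contribute to the join result.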
Along the way, remove 'valid_everywhere' flag from RestrictInfo,
as on closer analysis it was not being used for anything, and was
defined backwards anyway.
basic regression tests for GiST to the standard regression tests.
I took the opportunity to add an rtree-equivalent gist opclass for
circles; the contrib version only covered boxes and polygons, but
indexing circles is very handy for distance searches.
the difference between checkpoints forced due to WAL segment consumption
and checkpoints forced for other reasons (such as CREATE DATABASE). Avoid
generating 'checkpoints are occurring too frequently' messages when the
checkpoint wasn't caused by WAL segment consumption. Per gripe from
Chris K-L.
current time: provide a GetCurrentTimestamp() function that returns
current time in the form of a TimestampTz, instead of separate time_t
and microseconds fields. This is what all the callers really want
anyway, and it eliminates low-level dependencies on AbsoluteTime,
which is a deprecated datatype that will have to disappear eventually.
role memberships; make superuser/createrole distinction do something
useful; fix some locking and CommandCounterIncrement issues; prevent
creation of loops in the membership graph.
syntactic conflicts, both privilege and role GRANT/REVOKE commands have
to use the same production for scanning the list of tokens that might
eventually turn out to be privileges or role names. So, change the
existing GRANT/REVOKE code to expect a list of strings not pre-reduced
AclMode values. Fix a couple other minor issues while at it, such as
InitializeAcl function name conflicting with a Windows system function.
and pg_auth_members. There are still many loose ends to finish in this
patch (no documentation, no regression tests, no pg_dump support for
instance). But I'm going to commit it now anyway so that Alvaro can
make some progress on shared dependencies. The catalog changes should
be pretty much done.
- full concurrency for insert/update/select/vacuum:
- select and vacuum never lock more than one page simultaneously
- select (gettuple) doesn't hold any lock across its calls
- insert never locks more than two pages simultaneously:
- during the search for the leaf to insert into, it locks only one page
at a time
- while walking upward to the root, it locks only the parent (possibly a
non-direct parent) and the child; one of them is X-locked, the other
may be S- or X-locked
- 'vacuum full' locks the index
- improve gistgetmulti
- simplify XLOG records
Fix bug in index_beginscan_internal: LockRelation may clear the
rd_aminfo structure, so move GET_REL_PROCEDURE to after LockRelation
with a table that has a small predicted size. Avoids wasting several
hundred K on the timezone hash table, which is likely to have only one
or a few entries, but the entries use up 10Kb apiece ...
with main, avoid using a SQL-defined SQLSTATE for what is most definitely
not a SQL-compatible error condition, fix documentation omissions,
adhere to message style guidelines, don't use two GUC_REPORT variables
when one is sufficient. Nothing done about pg_dump issues.
literally.
Add GUC variables:
"escape_string_warning" - warn about backslashes in non-E strings
"escape_string_syntax" - supports E'' syntax?
"standard_compliant_strings" - treats backslashes literally in ''
Update code to use E'' when escapes are used.
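For illustration, the two string syntaxes affected (exact behavior depends on the settings of the new GUCs):
select E'one\ntwo';   -- escape syntax: the backslash escape is always processed
select 'one\ntwo';    -- ordinary literal: backslash handling is governed by the new
                      -- GUCs, and a warning may be emitted when escapes appear here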
should fix the recent reports of "index is not a btree" failures,
as well as preventing a more obscure race condition involving changes
to a template database just after copying it with CREATE DATABASE.
to the existing X-direction tests. An rtree class now includes 4 actual
2-D tests, 4 1-D X-direction tests, and 4 1-D Y-direction tests.
This involved adding four new Y-direction test operators for each of
box and polygon; I followed the PostGIS project's lead as to the names
of these operators.
NON BACKWARDS COMPATIBLE CHANGE: the poly_overleft (&<) and poly_overright
(&>) operators now have semantics comparable to box_overleft and box_overright.
This is necessary to make r-tree indexes work correctly on polygons.
Also, I changed circle_left and circle_right to agree with box_left and
box_right --- formerly they allowed the boundaries to touch. This isn't
actually essential given the lack of any r-tree opclass for circles, but
it seems best to sync all the definitions while we are at it.
CURRENT_TIME, and LOCALTIME: now they just produce "timestamptz" not
"timestamptz(6)", etc. This makes the behavior more consistent with our
choice to not assign a specific default precision to column datatypes.
It should also save a few cycles at runtime due to not having to invoke
the round-to-given-precision functions.
I also took the opportunity to translate CURRENT_TIMESTAMP into "now()"
instead of an invocation of the timestamptz input converter --- this should
save a few cycles too.
polygon operators (<<, &<, >>, &>). Per ideas originally put forward
by andrew@supernews and later rediscovered by moi. This patch just
fixes the existing opclasses, and does not add any new behavior as I
proposed earlier; that can be sorted out later. In principle this
could be back-patched, since it changes only search behavior and not
system catalog entries nor rtree index contents. I'm not currently
planning to do that, though, since I think it could use more testing.
in the database. The old behavior (reindex system catalogs only) is now
available as REINDEX SYSTEM. I did not add the complementary REINDEX USER
case since there did not seem to be consensus for this, but it would be
trivial to add later. Per recent discussions.
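A quick sketch of the two forms (assuming a database named mydb):
REINDEX DATABASE mydb;   -- now reindexes all indexes in the database
REINDEX SYSTEM mydb;     -- reindexes only the system catalogs (the old behavior)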
argument list contains parameter symbols ($n) declared as type VOID,
discard these arguments. This allows the driver to avoid renumbering
mixed IN and OUT argument placeholders (the JDBC syntax involves writing
? for both IN and OUT parameters, but on the server side we don't think
that OUT parameters are arguments). This doesn't break any currently-
useful cases since VOID is not used as an input argument type.
is redundant after a check has already been made for "num < 0". The "set"
variable can also be removed, as it is now no longer used. Per checking
with Karel, this is the right fix.
Per Coverity static analysis performed by EnterpriseDB.
unlike template0 and template1 does not have any special status in
terms of backend functionality. However, all external utilities such
as createuser and createdb now connect to "postgres" instead of
template1, and the documentation is changed to encourage people to use
"postgres" instead of template1 as a play area. This should fix some
longstanding gotchas involving unexpected propagation of database
objects by createdb (when you used template1 without understanding
the implications), as well as ameliorating the problem that CREATE
DATABASE is unhappy if anyone else is connected to template1.
Patch by Dave Page, minor editing by Tom Lane. All per recent
pghackers discussions.
malformed ident map file. This was introduced by the linked list
rewrite in 8.0 -- mea maxima culpa.
Per Coverity static analysis performed by EnterpriseDB.
only used in one branch of an if statement, so we can move its
declaration to that block. This also avoids an unnecessary syscache
lookup.
Per Coverity static analysis performed by EnterpriseDB.
(due to the preceding strlen(), for example), so we needn't recheck this
before invoking pg_mbcliplen().
Per Coverity static analysis performed by EnterpriseDB.
(a/k/a SELECT INTO). Instead, flush and fsync the whole relation before
committing. We do still need the WAL log when PITR is active, however.
Simon Riggs and Tom Lane.
2. improve vacuum for gist
- use FSM
- full vacuum:
- reforms the parent tuple if needed
(tuples were deleted on a child page, or the parent tuple remains
invalid after crash recovery)
- truncates the index file if possible
3. fix bugs and mistakes
scankeys arrays that it needs can never have more than INDEX_MAX_KEYS
entries, so it's reasonable to just allocate them as fixed-size local
arrays, and save the cost of palloc/pfree. Not a huge savings, but
a cycle saved is a cycle earned ...
includes error checking and an appropriate ereport(ERROR) message.
This gets rid of rather tedious and error-prone manipulation of errno,
as well as a Windows-specific bug workaround, at more than a dozen
call sites. After an idea in a recent patch by Heikki Linnakangas.
given reasonably short lifespans for prepared transactions, this should
mean that only a small minority of state files ever need to be fsynced
at all. Per discussion with Heikki Linnakangas.
not memcpy() to copy the offered key into the hash table during HASH_ENTER.
This avoids possible core dump if the passed key is located very near the
end of memory. Per report from Stefan Kaltenbrunner.
old suggestion by Oliver Jowett. Also, add a transaction column to the
pg_locks view to show the xid of each transaction holding or awaiting
locks; this allows prepared transactions to be properly associated with
the locks they own. There was already a column named 'transaction',
and I chose to rename it to 'transactionid' --- since this column is
new in the current devel cycle there should be no backwards compatibility
issue to worry about.
work if either of the join relations is empty. The logic is:
(1) if the inner relation's startup cost is less than the outer
relation's startup cost and this is not an outer join, read
a single tuple from the inner relation via ExecHash()
- if NULL, we're done
(2) read a single tuple from the outer relation
- if NULL, we're done
(3) build the hash table on the inner relation
- if hash table is empty and this is not an outer join,
we're done
(4) otherwise, do hash join as usual
The implementation uses the new MultiExecProcNode API, per a
suggestion from Tom: invoking ExecHash() now produces the first
tuple from the Hash node's child node, whereas MultiExecHash()
builds the hash table.
I had to put in a bit of a kludge to get the row count returned
for EXPLAIN ANALYZE to be correct: since ExecHash() is invoked to
return a tuple, and then MultiExecHash() is invoked, we would
return one too many tuples to EXPLAIN ANALYZE. I hacked around
this by just manually detecting this situation and subtracting 1
from the EXPLAIN ANALYZE row count.
"AT TIME ZONE", and not just the shorlist previously available. For
example:
SELECT CURRENT_TIMESTAMP AT TIME ZONE 'Europe/London';
works fine now. It will also obey whatever DST rules were in effect at
just that date, which the previous implementation did not.
It also supports the AT TIME ZONE on the timetz datatype. The whole
handling of DST is a bit bogus there, so I chose to make it use whatever
DST rules are in effect at the time of executig the query. not sure if
anybody is actuallyi *using* timetz though, it seems pretty
unpredictable just because of this...
Magnus Hagander
it is sufficient to track whether a backend holds a lock or not, and
store information about transaction vs. session locks only in the
inside-the-backend LocalLockTable. Since there can now be but one
PROCLOCK per lock per backend, LockCountMyLocks() is no longer needed,
thus eliminating some O(N^2) behavior when a backend holds many locks.
Also simplify the LockAcquire/LockRelease API by passing just a
'sessionLock' boolean instead of a transaction ID. The previous API
was designed with the idea that per-transaction lock holding would be
important for subtransactions, but now that we have subtransactions we
know that this is unwanted. While at it, add an 'isTempObject' parameter
to LockAcquire to indicate whether the lock is being taken on a temp
table. This is not used just yet, but will be needed shortly for
two-phase commit.
part of service principal. If not set, any service principal matching
an entry in the keytab can be used.
NEW KERBEROS MATCHING BEHAVIOR FOR 8.1.
Todd Kover
if geqo_rand() returns exactly 1.0, resulting in failure due to indexing
off the end of the pool array. Also, since this is using inexact float math,
it seems wise to guard against roundoff error producing values slightly
outside the expected range. Per report from bug@zedware.org.
constraint while determining whether the index sort order matches the
query's ORDER BY. This for example allows an index on (x,y) to match
... WHERE x = 42 ORDER BY y;
It only works for btree indexes, but since those are the only ones we
currently have that are ordered at all, that's good enough for now.
Per popular demand.
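A sketch of the case this enables (hypothetical table and index names):
create index t_x_y_idx on t (x, y);
select * from t where x = 42 order by y;   -- the btree index can now supply the ordering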
nonconsecutive columns of a multicolumn index, as per discussion around
mid-May (pghackers thread "Best way to scan on-disk bitmaps"). This
turns out to require only minimal changes in btree, and so far as I can
see none at all in GiST. btcostestimate did need some work, but its
original assumption that index selectivity == heap selectivity was
quite bogus even before this.
a descriptor that uses the current transaction snapshot, rather than
SnapshotNow as it did before (and still does if INV_WRITE is set).
This means pg_dump will now dump a consistent snapshot of large object
contents, as it never could do before. Also, add a lo_create() function
that is similar to lo_creat() but allows the desired OID of the large
object to be specified. This will simplify pg_restore considerably
(but I'll fix that in a separate commit).
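A minimal sketch of the new function (the OID shown is arbitrary):
select lo_create(424242);   -- creates a large object with the requested OID
select lo_creat(-1);        -- existing function: the server picks the OID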
These contain the SQLSTATE and error message of the current exception,
respectively. They are scope-local variables that are only defined
in exception handlers (so attempting to reference them outside an
exception handler is an error). Update the regression tests and the
documentation.
Also, do some minor related cleanup: export an unpack_sql_state()
function from the backend and use it to unpack a SQLSTATE into a
string, and add a free_var() function to pl_exec.c
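A usage sketch (assuming the new variables are named SQLSTATE and SQLERRM):
create function safe_div(a numeric, b numeric) returns numeric as $$
begin
    return a / b;
exception when others then
    -- SQLSTATE and SQLERRM are defined only inside the exception handler
    raise notice 'caught %: %', SQLSTATE, SQLERRM;
    return null;
end;
$$ language plpgsql;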
Original patch from Pavel Stehule, review by Neil Conway.
to a subquery if the outer query is simple enough that the LIMIT can
be reflected directly to the subquery. This didn't use to be very
interesting, because a subquery that couldn't have been flattened into
the upper query was usually not going to be very responsive to
tuple_fraction anyway. But with new code that allows UNION ALL subqueries
to pay attention to tuple_fraction, this is useful to do. In particular
this lets the optimization occur when the UNION ALL is directly inside
a view.
if the limit were directly applied to it. This does not actually
add a LIMIT plan node to the generated subqueries --- that would be
useless overhead --- but it does cause the planner to prefer fast-
start plans when the limit is small. After an idea from Phil Endecott.
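A sketch of the query shape that benefits (hypothetical tables t1 and t2):
select * from (select a from t1 union all select a from t2) ss limit 10;
-- each UNION ALL member can now be planned as though "limit 10" applied to it,
-- favoring fast-start plans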
this in turn causes CREATE TABLE AS in plpgsql to set ROW_COUNT.
This is how it behaved before 7.4; I had unintentionally changed the
behavior in a bit of sloppy micro-optimization.
of a relation in a flat 'joininfo' list. The former arrangement grouped
the join clauses according to the set of unjoined relids used in each;
however, profiling on test cases involving lots of joins proves that
that data structure is a net loss. It takes more time to group the
join clauses together than is saved by avoiding duplicate tests later.
It doesn't help any that there are usually not more than one or two
clauses per group ...
as well as the existing pg_catalog entries for prefix and postfix %.
These have never been documented, though they did appear in one old
regression test. This avoids surprising behavior in cases like
"SELECT -25 % -10". Per recent discussion.
Note: although there is a catalog change here, I did not force initdb
since there's no harm in leaving the inaccessible entries in one's
copy of pg_operator.
transaction IDs, rather than like subtrans; in particular, the information
now survives a database restart. Per previous discussion, this is
essential for PITR log shipping and for 2PC.
last nextval() or setval() performed by the current session. Update the
docs, add regression tests, and bump the catalog version. Patch from
Dennis Björklund, various improvements by Neil Conway.
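For illustration (assuming this is the lastval() function added in 8.1, and a sequence named myseq):
select nextval('myseq');
select lastval();   -- returns the value produced by the nextval() call above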
---------------------------------------------------------------------------
While playing around, I got the following error message:
--
FATAL: pre-existing shared memory block (key 5432001, ID 90898435) is
still in use
HINT: If you're sure there are no old server processes still running,
remove the shared memory block with the command "ipcrm", or just delete
the file "/home/hlinnaka/pgsql/data/postmaster.pid".
---
That's normal because I used "kill -9 postmaster" to shut down.
The hint advises me to use "ipcrm", but there's the "ipcclean" script in
bin for just this purpose. The hint should probably advise to use
ipcclean.
The attached patch replaces all occurrences of "ipcrm" with "ipcclean" in
src/backend/utils/init/miscinit.c and all the translations in
src/backend/po.
While reviewing the patch, I noticed a likely typo in hr.po. While I don't
speak Croatian, the translation seems to advise using the "icpm(1)"
command. I changed that to "ipcclean" too.
Heikki Linnakangas
up have the standard layout with unused space between pd_lower and pd_upper.
When this is set, XLogInsert will omit the unused space without bothering
to scan it to see if it's zero. That saves time in XLogInsert, and also
allows reversion of my earlier patch to make PageRepairFragmentation et al
explicitly re-zero freed space. Per suggestion by Heikki Linnakangas.
other_rel_list with a single array indexed by rangetable index.
This reduces find_base_rel from O(N) to O(1) without any real penalty.
While find_base_rel isn't one of the major bottlenecks in any profile
I've seen so far, it was starting to creep up on the radar screen
for complex queries --- so might as well fix it.
a new PlannerInfo struct, which is passed around instead of the bare
Query in all the planning code. This commit is essentially just a
code-beautification exercise, but it does open the door to making
larger changes to the planner data structures without having to muck
with the widely-known Query struct.
representation as the jointree) with two lists of RTEs, one showing
the RTEs accessible by qualified names, and the other showing the RTEs
accessible by unqualified names. I think this is conceptually simpler
than what we did before, and it's sure a whole lot easier to search.
This seems to eliminate the parse-time bottleneck for deeply nested
JOIN structures that was exhibited by phil@vodafone.
---------------------------------------------------------------------------
Tom Lane <tgl@sss.pgh.pa.us> writes:
> a_ogawa <a_ogawa@hi-ho.ne.jp> writes:
> > It is a reasonable idea. However, the majority part of MemSet was not
> > able to be avoided by this idea. Because the per-tuple contexts are used
> > at the early stage of executor.
>
> Drat. Well, what about changing that? We could introduce additional
> contexts or change the startup behavior so that the ones that are
> frequently reset don't have any data in them unless you are working
> with pass-by-ref values inside the inner loop.
That might be possible. However, I think that we should confine this change
to aset.c for now.
I thought about it further: we can check whether the context has been used
since the last reset even when the blocks list is not empty. Please see the
attached patch.
postgresql.conf.
---------------------------------------------------------------------------
Here's an updated version of the patch, with the following changes:
1) No longer uses "service name" as "application version". It's instead
hardcoded as "postgres". It could be argued that this part should be
backpatched to 8.0, but it doesn't make a big difference until you can
start changing it with GUC / connection parameters. This change only
affects kerberos 5, not 4.
2) Now downcases kerberos usernames when the client is running on win32.
3) Adds guc option for "krb_caseins_users" to make the server ignore
case mismatch which is required by some KDCs such as Active Directory.
Off by default, per discussion with Tom. This change only affects
kerberos 5, not 4.
4) Updated so it doesn't conflict with the rendezvous/bonjour patch
already in ;-)
Magnus Hagander
> a_ogawa <a_ogawa@hi-ho.ne.jp> writes:
> > It is a reasonable idea. However, the majority part of MemSet was not
> > able to be avoided by this idea. Because the per-tuple contexts are used
> > at the early stage of executor.
>
> Drat. Well, what about changing that? We could introduce additional
> contexts or change the startup behavior so that the ones that are
> frequently reset don't have any data in them unless you are working
> with pass-by-ref values inside the inner loop.
That might be possible. However, I think that we should confine this change
to aset.c for now.
I thought about it further: we can check whether the context has been used
since the last reset even when the blocks list is not empty. Please see the
attached patch.
The measured effect of the patch is as follows:
o Execution time for running the SQL ten times:
(1)Linux(CPU: Pentium III, Compiler option: -O2)
- original: 24.960s
- patched : 23.114s
(2)Linux(CPU: Pentium 4, Compiler option: -O2)
- original: 8.730s
- patched : 7.962s
(3)Solaris(CPU: Ultra SPARC III, Compiler option: -O2)
- original: 37.0s
- patched : 33.7s
Atsushi Ogawa (a_ogawa)
RTE of interest, rather than the whole rangetable list. This makes
the API more understandable and avoids duplicate RTE lookups. This
patch reverts no-longer-needed portions of my patch of 2004-08-19.
performance problem pointed out by phil@vodafone: to wit, we were
spending O(N^2) time to check dropped-ness in an N-deep join tree,
even in the case where the tree was freshly constructed and couldn't
possibly mention any dropped columns. Instead of recursing in
get_rte_attribute_is_dropped(), change the data structure definition:
the joinaliasvars list of a JOIN RTE must have a NULL Const instead
of a Var at any position that references a now-dropped column. This
costs nothing during normal parse-rewrite-plan path, and instead we
have a linear-time update to make when loading a stored rule that
might contain now-dropped columns. While at it, move the responsibility
for acquiring locks on relations referenced by rules into this separate
function (which I therefore chose to call AcquireRewriteLocks).
This saves effort --- namely, duplicated lock grabs in parser and rewriter
--- in the normal path at a cost of one extra non-locked heap_open()
in the stored-rule path; seems a good tradeoff. A fringe benefit is
that it is now *much* clearer that we acquire lock on relations referenced
in rules before we make any rewriter decisions based on their properties.
(I don't know of any bug of that ilk, but it wasn't exactly clear before.)
to just around the bare recv() call that gets a command from the client.
The former placement in PostgresMain was unsafe because the intermediate
processing layers (especially SSL) use facilities such as malloc that are
not necessarily re-entrant. Per report from counterstorm.com.
Instead of a separate CRC on each backup block, include backup blocks
in their parent WAL record's CRC; this is important to ensure that the
backup block really goes with the WAL record, ie there was not a page
tear right at the start of the backup block. Implement a simple form
of compression of backup blocks: drop any run of zeroes starting at
pd_lower, so as not to store the unused 'hole' that commonly exists in
PG heap and index pages. Tweak PageRepairFragmentation and related
routines to ensure they keep the unused space zeroed, so that the above
compression method remains effective. All per recent discussions.
and RelationNameGetTupleDesc() as deprecated; remove uses of the
latter in the contrib library. Along the way, clean up crosstab()
code and documentation a little.
key, compare the new and old row versions. If the foreign key column has
not changed, we needn't enqueue the trigger, since the update cannot
violate the foreign key. This optimization was previously applied in the
RI trigger function, but it is more efficient to avoid firing the trigger
altogether. Per recent discussion on pgsql-hackers.
Also add a regression test for some unintuitive foreign key behavior, and
refactor some code that deals with the OIDs of the various RI trigger
functions.
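A sketch of the case the optimization targets (hypothetical tables):
CREATE TABLE parent (id int PRIMARY KEY);
CREATE TABLE child  (id int PRIMARY KEY,
                     parent_id int REFERENCES parent(id),
                     note text);
-- This UPDATE leaves the foreign-key column untouched, so no RI check
-- trigger needs to be queued for it.
UPDATE child SET note = 'touched' WHERE id = 1;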
keys, rather than a single trigger for both events. This should not change
functionality, but it is more consistent: previously, there were trigger
functions for both "check_insert" and "check_update", but the former was
used for both events.
Bump catalog version number (not strictly necessary, but best to be
cautious).
would be evaluated only once anyway (ie, it's just a SELECT with no
FROM or an INSERT ... VALUES). The planner can't do it any faster than
the executor, so no point in an extra copying of the expression tree.
pg_class_aclmask(). We only need to do this when we have to check
pg_shadow.usecatupd, and that's not relevant unless the target table
is a system catalog. So we can usually avoid one syscache lookup.
are now reported via elog, eliminating the need to test the result code
at most call sites. Make it possible for the caller to distinguish a
freshly acquired lock from one already held in the current transaction.
Use that capability to avoid redundant AcceptInvalidationMessages() calls
in LockRelation().
status of the most recently queried userid. Since the common pattern is
many successive queries about the same user (ie, the current user) this
can save a lot of syscache probes.
were pretty expensive and I believe the case they were put in to
defend against can no longer arise, now that we have dependency checks
to prevent deletion of a type entry that is still referenced. Certainly
the example given in the CVS log entry can't happen anymore.
Since this was the only use of typeidIsValid(), remove the routine too.
to columns of an RTE that was a function returning RECORD with a column
definition list. Apparently no one has tried to use non-default typmod
with a function returning RECORD before.
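For instance (a minimal sketch, not the original report; the numeric(6,2)
column carries the non-default typmod):
CREATE FUNCTION one_row() RETURNS record AS
    $$ SELECT 1, 2.5 $$ LANGUAGE sql;
SELECT * FROM one_row() AS r(a int, b numeric(6,2));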
spotted by Qingqing Zhou. The HASH_ENTER action now automatically
fails with elog(ERROR) on out-of-memory --- which incidentally lets
us eliminate duplicate error checks in quite a bunch of places. If
you really need the old return-NULL-on-out-of-memory behavior, you
can ask for HASH_ENTER_NULL. But there is now an Assert in that path
checking that you aren't hoping to get that behavior in a palloc-based
hash table.
Along the way, remove the old HASH_FIND_SAVE/HASH_REMOVE_SAVED actions,
which were not being used anywhere anymore, and were surely too ugly
and unsafe to want to see revived again.
hash table. This is a pretty unlikely scenario, since the table
should be tiny, but we can't guarantee continued correct operation
if it does occur. Spotted by Qingqing Zhou.
from a RECORD Const node, because that's what it may be faced with
after constant-folding of a function returning RECORD. Per example
from Michael Fuhr.
routines in the index's relcache entry, instead of doing a fresh fmgr_info
on every index access. We were already doing this for the index's opclass
support functions; not sure why we didn't think to do it for the AM
functions too. This supersedes the former method of caching (only)
amgettuple in indexscan scan descriptors; it's an improvement because the
function lookup can be amortized across multiple statements instead of
being repeated for each statement. Even though lookup for builtin
functions is pretty cheap, this seems to drop a percent or two off some
simple benchmarks.
Display only 9 not 10 digits of precision for timestamp values when
using non-integer timestamps. This prevents the display of rounding
errors for common values like days < 32.
working buffer into ParseDateTime() and reject too-long input there,
rather than checking the length of the input string before calling
ParseDateTime(). The old method was bogus because ParseDateTime() can use
a variable amount of working space, depending on the content of the
input string (e.g. how many fields need to be NUL terminated). This fixes
a minor stack overrun -- I don't _think_ it's exploitable, although I
won't claim to be an expert.
Along the way, fix a bug reported by Mark Dilger: the working buffer
allocated by interval_in() was too short, which resulted in rejecting
some perfectly valid interval input values. I added a regression test for
this fix.
scanner anyway) to avoid having any backup states. According to the
flex manual, this should speed things up, and indeed the backend scanner
is about a third faster according to some quick profiling checks.
I haven't tried to measure the speed change in psql, but it probably
is similar.
using pg_mblen. Therefore, pg_mblen is executed many times, and it
becomes a bottleneck.
This patch adds a shortcut that reduces the number of pg_mblen calls
by comparing the first byte first.
a_ogawa
where there was also a WHERE-clause restriction that applied to the
join. The check on restrictlist == NIL is really unnecessary anyway,
because select_mergejoin_clauses already checked for and complained
about any unmergejoinable join clauses. So just take it out.
if they are two-byte multibyte characters. The same thing can happen
if octet_length(multibyte_chars) == n, where n is the declared length in char(n).
Long-standing bug since the 7.3 days. Per report and fix from Yoshiyuki Asaba.
that we acquire a lock on relations added to the query due to inheritance.
Formerly, no such lock was held throughout planning, which meant that
a schema change could occur to invalidate the plan before it's even
been completed.
aren't doing anything useful (ie, neither selection nor projection).
Also, extend to SubqueryScan the hacks already in place to avoid
unnecessary ExecProject calls when the result would just be the same
tuple the subquery already delivered. This saves some overhead in
UNION and other set operations, as well as avoiding overhead for
unflatten-able subqueries. Per example from Sokolov Yura.
from Abhijit Menon-Sen, minor editorialization from Neil Conway. Also,
improve md5(text) to allocate a constant-sized buffer on the stack
rather than via palloc.
Catalog version bumped.
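The function in question, for reference:
regression=# select md5('hello');
               md5
----------------------------------
 5d41402abc4b2a76b9719d911017c592
(1 row)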
Also, remove the rather useless return value of LockReleaseAll. Change
response to detection of corruption in the shared lock tables to PANIC,
since that is the only way of cleaning up fully.
Originally an idea of Heikki Linnakangas, variously hacked on by
Alvaro Herrera and Tom Lane.
communication structure, and make it its own module with its own lock.
This should reduce contention at least a little, and it definitely makes
the code seem cleaner. Per my recent proposal.
external projects, we should be careful about what parts of the GiST
API are considered implementation details, and which are part of the
public API. Therefore, I've moved internal-only declarations into
gist_private.h -- future backward-incompatible changes to gist.h should
be made with care, to avoid needlessly breaking external GiST extensions.
Also did some related header cleanup: remove some unnecessary #includes
from gist.h, and remove some unused definitions: isAttByVal(), _gistdump(),
and GISTNStrategies.
- make sure we always invoke user-supplied GiST methods in a short-lived
memory context. This means the backend isn't exposed to any memory leaks
that may be in those methods (in fact, it is probably a net loss for most
GiST methods to bother manually freeing memory now). This also means
we can do away with a lot of ugly manual memory management in the
GiST code itself.
- keep the current page of a GiST index scan pinned, rather than doing a
ReadBuffer() for each tuple produced by the scan. Since ReadBuffer() is
expensive, this is a performance win.
- implement dead tuple killing for GiST indexes (which is easy to do, now
that we keep a pin on the current scan page). Now all the builtin indexes
implement dead tuple killing.
- clean up a lot of ugly code in GiST
than one heap page represented in the bitmap. This is a bit ugly but
it cuts overhead fairly effectively in simple join cases. Per example
from Sergey Koposov.
in an inconsistent state. (This is only latent because in reality
ExecSeqRestrPos is dead code at the moment ... but someday maybe it won't
be.) Add some comments about what the API for plan node mark/restore
actually is, because it's not immediately obvious.
when the blocks list is empty (there can surely be no freelist items if
the context contains no memory), and use MemSetAligned not MemSet to
clear the headers (we assume alignof(pointer) >= alignof(int32)).
Per discussion with Atsushi Ogawa. He proposes some further hacking
that I'm not yet sold on, but these two changes are unconditional wins
since there is no case in which they make things slower.
When one side of the join has a NULL, we don't want to uselessly try
to match it against every remaining tuple of the other side. While
at it, rewrite the comparison machinery to avoid multiple evaluations
of the left and right input expressions and to use a btree comparator
where available, instead of double operator calls. Also revise the
state machine to eliminate redundant comparisons and hopefully make it
more readable too.
AtCommit_Portals is restarted when a portal is deleted. This is
necessary since the deletion of a portal may cause the deletion of
another, which on rare occasions may cause the iterator to return a
deleted portal and thus a renewed attempt to delete it.
Thomas Hallgren
methods: they all invoke UpdateStats() since they have computed the
number of heap tuples, so I created a function in catalog/index.c that
each AM now calls.
collector messages, per recent discussion on pgsql-patches. This
actually required quite a few changes -- for example,
"databaseid != InvalidOid" was used to check whether a slot in the
backend entry table was initialized, but that no longer works since
the slot might be initialized prior to receiving the BESTART message
which contains the database id. We now use procpid > 0 to indicate
that a slot is non-empty.
Other changes:
- various comment improvements and cleanups
- there's no need to zero-out the entire activity buffer in
pgstat_add_backend(), we can just set activity[0] to '\0'.
- remove the counting of the # of connections to a database; this
was not used anywhere
One change in behavior I wasn't sure about: previously, the code
would create a hash table entry for a database as soon as any message
was received whose header referenced that database. Now, we only
create hash table entries as needed (so for example BESTART won't
create a database hash table entry, since it doesn't need to
access anything in the per-db hash table). It would be easy enough
to retain the old behavior, but AFAICS it is not required.
memset() or MemSet() to a char *. For one, memset()'s first argument is
a void *, and furthermore a void * can be implicitly coerced to/from any other
pointer type.
* Add session start time to pg_stat_activity
* Add the client IP address and port to pg_stat_activity
Original patch from Magnus Hagander, code review by Neil Conway. Catalog
version bumped. This patch sends the client IP address and port number in
every statistics message; that's not ideal, but will be fixed up shortly.
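A sketch of reading the new columns (column names as they stood in this era;
some have been renamed in later releases):
SELECT datname, procpid, usename, client_addr, client_port, backend_start
FROM pg_stat_activity;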
and VACUUM: in the interval between adding a new page to the relation
and formatting it, it was possible for VACUUM to come along and decide
it should format the page too. Though not harmful in itself, this would
cause data loss if a third transaction were able to insert tuples into
the vacuumed page before the original extender got control back.
before we check commit/abort status. Formerly this was done in some paths
but not all, with the result that a transaction might be considered
committed for some purposes before it became committed for others.
Per example found by Jan Wieck.
which is neither needed by nor related to that header. Remove the bogus
inclusion and instead include the header in those C files that actually
need it. Also fix unnecessary inclusions and bad inclusion order in
tsearch2 files.
associated with a hashtable is allocated in that hashtable's private
context, so that hash_destroy only has to destroy the context and not
do any retail pfree's; and tighten the inner loop of hash_seq_search.
startup to end, rather than re-opening it in each MultiExecBitmapIndexScan
call. I had foolishly thought that opening/closing wouldn't be much
more expensive than a rescan call, but that was sheer brain fade.
This seems to fix about half of the performance lossage reported by
Sergey Koposov. I'm still not sure where the other half went.
moment this has no particular use except to allow table rows to be
passed to record_out(), but that case seems to be useful in itself
per recent example from Elein. Further down the road we could look
at letting PL functions be declared to accept RECORD parameters.
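A small illustration of pushing a whole table row through record output
(whether this exact spelling matches Elein's example is an assumption):
CREATE TABLE pt (a int, b text);
INSERT INTO pt VALUES (1, 'foo');
SELECT pt FROM pt;  -- the row value is printed in record form, e.g. (1,foo)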
output area as INTERNAL not CSTRING. This is to prevent people from
calling the functions by hand. This is a permanent solution for the
back branches but I hope it is just a stopgap for HEAD.
that return INTERNAL without also having INTERNAL arguments. Since the
functions in question aren't meant to be called by hand anyway, I just
redeclared them to take 'internal' instead of 'text'. Also add code
to ProcedureCreate() to enforce the restriction, as I should have done
to start with :-(
to produce when running the executor. This is consistent with the internal
executor APIs (such as ExecutorRun), which also use a long for this purpose.
It also allows FETCH_ALL to be passed -- since FETCH_ALL is defined as
LONG_MAX, this wouldn't have worked on platforms where int and long are of
different sizes. Per report from Tzahi Fadida.
only one argument. (Per recent discussion, the option to accept multiple
arguments is pretty useless for user-defined types, and would be a likely
source of security holes if it was used.) Simplify call sites of
output/send functions to not bother passing more than one argument.
record object itself, rather than relying on a second OID argument to be
correct. This patch just changes the function behavior and not the
catalogs, so it's OK to back-patch to 8.0. Will remove the now-redundant
second argument in pg_proc in a separate patch in HEAD only.
is contention for a tuple-level lock. This solves the problem of a
would-be exclusive locker being starved out by an indefinite succession
of share-lockers. Per recent discussion with Alvaro.
a warning when a variable is used as a format string for printf()
and similar functions (if the variable is derived from untrusted
data, it could include unexpected formatting sequences). This
emits too many warnings to be enabled by default, but it does
flag a few dubious constructs in the Postgres tree. This patch
fixes up the obvious variants: functions that are passed a variable
format string but no additional arguments.
Most of these are harmless (e.g. the ruleutils stuff), but there
is at least one actual bug here: if you create a trigger named
"%sfoo", pg_dump will read uninitialized memory and fail to dump
the trigger correctly.
Essentially, we shoehorn in a lockable-object-type field by taking
a byte away from the lockmethodid, which can surely fit in one byte
instead of two. This allows less artificial definitions of all the
other fields of LOCKTAG; we can get rid of the special pg_xactlock
pseudo-relation, and also support locks on individual tuples and
general database objects (including shared objects). None of those
possibilities are actually exploited just yet, however.
I removed pg_xactlock from pg_class, but did not force initdb for
that change. At this point, relkind 's' (SPECIAL) is unused and
could be removed entirely.
to eliminate unnecessary deadlocks. This commit adds SELECT ... FOR SHARE
paralleling SELECT ... FOR UPDATE. The implementation uses a new SLRU
data structure (managed much like pg_subtrans) to represent multiple-
transaction-ID sets. When more than one transaction is holding a shared
lock on a particular row, we create a MultiXactId representing that set
of transactions and store its ID in the row's XMAX. This scheme allows
an effectively unlimited number of row locks, just as we did before,
while not costing any extra overhead except when a shared lock actually
has to be shared. Still TODO: use the regular lock manager to control
the grant order when multiple backends are waiting for a row lock.
Alvaro Herrera and Tom Lane.
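A sketch of the new syntax (table name is made up):
BEGIN;
SELECT balance FROM accounts WHERE id = 42 FOR SHARE;
-- Other sessions may take FOR SHARE locks on the same row concurrently;
-- a FOR UPDATE, UPDATE, or DELETE of that row must wait until the share
-- locks are released.
COMMIT;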
or bitmap), use pred_test to be a little smarter about cases where a
filter clause is logically unnecessary. This may be overkill for the
plain indexscan case, but it's definitely useful for OR'd bitmap scans.
node, as this behavior is now better done as a bitmap OR indexscan.
This allows considerable simplification in nodeIndexscan.c itself as
well as several planner modules concerned with indexscan plan generation.
Also we can improve the sharing of code between regular and bitmap
indexscans, since they are now working with nigh-identical Plan nodes.
but just to open and close it during MultiExecBitmapIndexScan. This
avoids acquiring duplicate resources (eg, multiple locks on the same
relation) in a tree with many bitmap scans. Also, don't bother to
lock the parent heap at all here, since we must be underneath a
BitmapHeapScan node that will be holding a suitable lock.
of timetz values misbehaved in --enable-integer-datetime cases, and
EXTRACT(EPOCH) subtracted the zone instead of adding it in all cases.
Backpatch to all supported releases (except --enable-integer-datetime code
does not exist in 7.2).
As I pointed out a few days ago, this code has failed to do anything useful
for some time ... and if we did want to revive the capability to select
functions by nearness of inheritance ancestry, this is the wrong place
and way to do it anyway. The knowledge would need to go into
func_select_candidate() instead. Perhaps someday someone will be motivated
to do that, but I am not today.
ExprContexts will be freed anyway when FreeExecutorState() is reached,
and letting that routine do the work is more efficient because it will
automatically free the ExprContexts in reverse creation order. The
existing coding was effectively freeing them in exactly the worst
possible order, resulting in O(N^2) behavior inside list_delete_ptr,
which becomes highly visible in cases with a few thousand plan nodes.
ExecFreeExprContext is now effectively a no-op and could be removed,
but I left it in place in case we ever want to put it back to use.
c_expr. Perhaps the restriction was once needed to avoid bison errors,
but it seems to work just fine now --- and even generates a slightly
smaller state machine. This change allows examples like
SELECT '13:45'::timetz AT TIME ZONE '-07:00'::interval;
to work without parentheses around the right-hand input.
code in prepqual.c had a small drawback: the flatten_andors code was
able to cope with deeply nested AND/OR structures (like 10000 ORs in
a row), whereas eval_const_expressions tends to recurse until it
overruns the stack. Revise eval_const_expressions so that it doesn't
choke on deeply nested ANDs or ORs.
make some estimate of which available indexes to AND together, rather
than blindly taking 'em all. This could probably stand further
improvement, but it seems to do OK in simple tests.
but the code is basically working. Along the way, rewrite the entire
approach to processing OR index conditions, and make it work in join
cases for the first time ever. orindxpath.c is now basically obsolete,
but I left it in for the time being to allow easy comparison testing
against the old implementation.
logic operations during planning. Seems cleaner to create two new Path
node types, instead --- this avoids duplication of cost-estimation code.
Also, create an enable_bitmapscan GUC parameter to control use of bitmap
plans.
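The new GUC can be toggled like the other planner switches to compare plans
(hypothetical table; the plan shapes noted are only what one might expect):
SET enable_bitmapscan = off;
EXPLAIN SELECT * FROM orders WHERE customer_id = 7 OR customer_id = 99;
SET enable_bitmapscan = on;  -- the default
EXPLAIN SELECT * FROM orders WHERE customer_id = 7 OR customer_id = 99;
-- with suitable indexes the second plan may show a BitmapOr over two
-- bitmap index scans feeding a bitmap heap scan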
scans, using in-memory tuple ID bitmaps as the intermediary. The planner
frontend (path creation and cost estimation) is not there yet, so none
of this code can be executed. I have tested it using some hacked planner
code that is far too ugly to see the light of day, however. Committing
now so that the bulk of the infrastructure changes go in before the tree
drifts under me.
* Changes the APIs to the timezone functions to take a pg_tz pointer as
an argument, representing the timezone to use for the selected
operation.
* Adds a global_timezone variable that represents the current timezone
in the backend as set by SET TIMEZONE (or guc, or env, etc).
* Implements a hash-table cache of loaded tables, so we don't have to
read and parse the TZ file every time we change a timezone. While not
necessary now (we don't change timezones very often), I believe this
will be necessary (or at least good) when "multiple timezones in the
same query" is eventually implemented. And code-wise, this was the time
to do it.
There are no user-visible changes at this time. Implementing the
"multiple zones in one query" is a later step...
This also gets rid of some of the cruft needed to "back out a timezone
change", since we previously couldn't check a timezone unless it was
activated first.
Passes regression tests on win32, linux (slackware 10) and solaris x86.
Magnus Hagander
return just a single tuple at a time. Currently the only such node
type is Hash, but I expect we will soon have indexscans that can return
tuple bitmaps. A side benefit is that EXPLAIN ANALYZE now shows the
correct tuple count for a Hash node.
critical and noncritical contexts (an example of noncritical being
post-checkpoint removal of dead xlog segments). In the critical cases
the CRIT_SECTION mechanism will cause ERROR to be promoted to PANIC
anyway, and in the noncritical cases we shouldn't let an error take
down the entire database. Arguably there should be *no* explicit
PANIC errors in this module, only more START/END_CRIT_SECTION calls,
but I didn't go that far. (Yet.)
when recycling a large number of xlog segments during checkpoint.
The former behavior searched from the same start point each time,
requiring O(checkpoint_segments^2) stat() calls to relocate all the
segments. Instead keep track of where we stopped last time through.
assuming comparison of atttypid is sufficient. In a dropped column
atttypid will be 0, and we'd better check the physical-storage data
to make sure the tupdescs are physically compatible.
I do not believe there is a real risk before 8.0, since before that
we only used this routine to compare successive states of the tupdesc
for a particular relation. But 8.0's typcache.c might be comparing
arbitrary tupdescs so we'd better play it safer.
whose keys are OIDs. The only one that looks particularly performance
critical is the relcache hashtable, but as long as we've got the function
we may as well use it wherever it's applicable.
indexes. Replace all heap_openr and index_openr calls by heap_open
and index_open. Remove runtime lookups of catalog OID numbers in
various places. Remove relcache's support for looking up system
catalogs by name. Bulky but mostly very boring patch ...
indexes. Extend the macros in include/catalog/*.h to carry the info
about hand-assigned OIDs, and adjust the genbki script and bootstrap
code to make the relations actually get those OIDs. Remove the small
number of RelOid_pg_foo macros that we had in favor of a complete
set named like the catname.h and indexing.h macros. Next phase will
get rid of internal use of names for looking up catalogs and indexes;
but this completes the changes forcing an initdb, so it looks like a
good place to commit.
Along the way, I made the shared relations (pg_database etc) not be
'bootstrap' relations any more, so as to reduce the number of hardwired
entries and simplify changing those relations in future. I'm not
sure whether they ever really needed to be handled as bootstrap
relations, but it seems to work fine to not do so now.
avoid encroaching on the 'user' range of OIDs by allowing automatic
OID assignment to use values below 16k until we reach normal operation.
initdb not forced since this doesn't make any incompatible change;
however a lot of stuff will have different OIDs after your next initdb.
of just a relation OID, thereby not having to open the relation for itself.
This actually saves code rather than adding it for most of the existing
callers, which had the rel open already. The main point though is to be
able to use this rather than plain addRangeTableEntry in setTargetTable,
thus saving one relation_openrv/relation_close cycle for every INSERT,
UPDATE, or DELETE. Seems to provide a several percent win on simple
INSERTs.
be supported for all datatypes. Add CREATE AGGREGATE and pg_dump support
too. Add specialized min/max aggregates for bpchar, instead of depending
on text's min/max, because otherwise the possible use of bpchar indexes
cannot be recognized.
initdb forced because of catalog changes.
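A sketch of the CREATE AGGREGATE form this enables; declaring the sort
operator is what lets the planner consider the index optimization for a
user-defined aggregate (int4larger and > are the pieces builtin max(int4) uses):
CREATE AGGREGATE my_max (
    basetype = int4,
    sfunc    = int4larger,
    stype    = int4,
    sortop   = >
);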
into indexscans on matching indexes. For the moment, it only handles
int4 and text datatypes; next step is to add a column to pg_aggregate
so that all MIN/MAX aggregates can be handled. Per my recent proposal.
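Informally, the transformation plans a query like the first roughly as the
second when a matching index exists (hypothetical table):
SELECT max(id) FROM big_table;
-- becomes, in effect,
SELECT id FROM big_table WHERE id IS NOT NULL ORDER BY id DESC LIMIT 1;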
deferred triggers: either one can create more work for the other,
so we have to loop till it's all gone. Per example from andrew@supernews.
Add a regression test to help spot trouble in this area in future.
while completing execution of the cursor's query. Otherwise we get wrong
answers or even crashes from non-volatile functions called by the query.
Per report from andrew@supernews.
decides whether to use hashed grouping instead of sort-plus-uniq
grouping. The function needs an annoyingly large number of parameters,
but this still seems like a win for legibility, since it removes over
a hundred lines from grouping_planner (which is still too big :-().
the long-term plan for this behavior for quite some time, but it is only
possible now that DELETE has a USING clause so that the user can join
other tables in a DELETE statement without relying on this behavior.
in UPDATE. We also now issue a NOTICE if a query has _any_ implicit
range table entries -- in the past, we would only warn about implicit
RTEs in SELECTs with at least one explicit RTE.
As a result of the warning change, 25 of the regression tests had to
be updated. I also took the opportunity to remove some bogus whitespace
differences between some of the float4 and float8 variants. I believe
I have correctly updated all the platform-specific variants, but let
me know if that's not the case.
Original patch for DELETE ... USING from Euler Taveira de Oliveira,
reworked by Neil Conway.
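An example of the now-explicit join spelling (hypothetical tables):
-- instead of relying on an implicit range-table entry for "orders":
DELETE FROM order_lines
USING orders
WHERE order_lines.order_id = orders.id
  AND orders.cancelled;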
functions. This patch optimizes int2_sum(), int4_sum(), float4_accum()
and float8_accum() to avoid needing to copy the transition function's
state for each input tuple of the aggregate. In an extreme case
(e.g. SELECT sum(int2_col) FROM table where table has a single column),
it improves performance by about 20%. For more complex queries or tables
with wider rows, the relative performance improvement will not be as
significant.
ExecProcNode() with a NULL value, so the test couldn't do anything
for us except maybe mask bugs. Removing it probably doesn't save
anything much either, but then again this is a hot-spot routine.
few palloc's. I also chose to eliminate the restype and restypmod fields
entirely, since they are redundant with information stored in the node's
contained expression; re-examining the expression at need seems simpler
and more reliable than trying to keep restype/restypmod up to date.
initdb forced due to change in contents of stored rules.
performance hack Tom introduced recently. This means we can avoid
copying the transition array for each input tuple if these functions
are invoked as aggregate transition functions.
To test the performance improvement, I created a 1 million row table
with a single int4 column. Without the patch, SELECT avg(col) FROM
table took about 4.2 seconds (after the data was cached); with the
patch, it took about 3.2 seconds. Naturally, the performance
improvement for a less trivial query (or a table with wider rows)
would be relatively smaller.
cases with binary-compatible relabeling. My first try was implicitly
assuming that all operators scalarineqsel is used for have binary-
compatible datatypes on both sides ... which is very wrong of course.
Per report from Michael Fuhr.