or bitmap), use pred_test to be a little smarter about cases where a
filter clause is logically unnecessary. This may be overkill for the
plain indexscan case, but it's definitely useful for OR'd bitmap scans.
node, as this behavior is now better done as a bitmap OR indexscan.
This allows considerable simplification in nodeIndexscan.c itself as
well as several planner modules concerned with indexscan plan generation.
Also we can improve the sharing of code between regular and bitmap
indexscans, since they are now working with nigh-identical Plan nodes.
but just to open and close it during MultiExecBitmapIndexScan. This
avoids acquiring duplicate resources (eg, multiple locks on the same
relation) in a tree with many bitmap scans. Also, don't bother to
lock the parent heap at all here, since we must be underneath a
BitmapHeapScan node that will be holding a suitable lock.
of timetz values misbehaved in --enable-integer-datetime cases, and
EXTRACT(EPOCH) subtracted the zone instead of adding it in all cases.
Backpatch to all supported releases (except --enable-integer-datetime code
does not exist in 7.2).
As I pointed out a few days ago, this code has failed to do anything useful
for some time ... and if we did want to revive the capability to select
functions by nearness of inheritance ancestry, this is the wrong place
and way to do it anyway. The knowledge would need to go into
func_select_candidate() instead. Perhaps someday someone will be motivated
to do that, but I am not today.
ExprContexts will be freed anyway when FreeExecutorState() is reached,
and letting that routine do the work is more efficient because it will
automatically free the ExprContexts in reverse creation order. The
existing coding was effectively freeing them in exactly the worst
possible order, resulting in O(N^2) behavior inside list_delete_ptr,
which becomes highly visible in cases with a few thousand plan nodes.
ExecFreeExprContext is now effectively a no-op and could be removed,
but I left it in place in case we ever want to put it back to use.
c_expr. Perhaps the restriction was once needed to avoid bison errors,
but it seems to work just fine now --- and even generates a slightly
smaller state machine. This change allows examples like
SELECT '13:45'::timetz AT TIME ZONE '-07:00'::interval;
to work without parentheses around the right-hand input.
code in prepqual.c had a small drawback: the flatten_andors code was
able to cope with deeply nested AND/OR structures (like 10000 ORs in
a row), whereas eval_const_expressions tends to recurse until it
overruns the stack. Revise eval_const_expressions so that it doesn't
choke on deeply nested ANDs or ORs.
make some estimate of which available indexes to AND together, rather
than blindly taking 'em all. This could probably stand further
improvement, but it seems to do OK in simple tests.
but the code is basically working. Along the way, rewrite the entire
approach to processing OR index conditions, and make it work in join
cases for the first time ever. orindxpath.c is now basically obsolete,
but I left it in for the time being to allow easy comparison testing
against the old implementation.
logic operations during planning. Seems cleaner to create two new Path
node types, instead --- this avoids duplication of cost-estimation code.
Also, create an enable_bitmapscan GUC parameter to control use of bitmap
plans.
scans, using in-memory tuple ID bitmaps as the intermediary. The planner
frontend (path creation and cost estimation) is not there yet, so none
of this code can be executed. I have tested it using some hacked planner
code that is far too ugly to see the light of day, however. Committing
now so that the bulk of the infrastructure changes go in before the tree
drifts under me.
* Changes the APIs to the timezone functions to take a pg_tz pointer as
an argument, representing the timezone to use for the selected
operation.
* Adds a global_timezone variable that represents the current timezone
in the backend as set by SET TIMEZONE (or guc, or env, etc).
* Implements a hash-table cache of loaded tables, so we don't have to
read and parse the TZ file everytime we change a timezone. While not
necesasry now (we don't change timezones very often), I beleive this
will be necessary (or at least good) when "multiple timezones in the
same query" is eventually implemented. And code-wise, this was the time
to do it.
There are no user-visible changes at this time. Implementing the
"multiple zones in one query" is a later step...
This also gets rid of some of the cruft needed to "back out a timezone
change", since we previously couldn't check a timezone unless it was
activated first.
Passes regression tests on win32, linux (slackware 10) and solaris x86.
Magnus Hagander
return just a single tuple at a time. Currently the only such node
type is Hash, but I expect we will soon have indexscans that can return
tuple bitmaps. A side benefit is that EXPLAIN ANALYZE now shows the
correct tuple count for a Hash node.
critical and noncritical contexts (an example of noncritical being
post-checkpoint removal of dead xlog segments). In the critical cases
the CRIT_SECTION mechanism will cause ERROR to be promoted to PANIC
anyway, and in the noncritical cases we shouldn't let an error take
down the entire database. Arguably there should be *no* explicit
PANIC errors in this module, only more START/END_CRIT_SECTION calls,
but I didn't go that far. (Yet.)
when recycling a large number of xlog segments during checkpoint.
The former behavior searched from the same start point each time,
requiring O(checkpoint_segments^2) stat() calls to relocate all the
segments. Instead keep track of where we stopped last time through.
assuming comparison of atttypid is sufficient. In a dropped column
atttypid will be 0, and we'd better check the physical-storage data
to make sure the tupdescs are physically compatible.
I do not believe there is a real risk before 8.0, since before that
we only used this routine to compare successive states of the tupdesc
for a particular relation. But 8.0's typcache.c might be comparing
arbitrary tupdescs so we'd better play it safer.
whose keys are OIDs. The only one that looks particularly performance
critical is the relcache hashtable, but as long as we've got the function
we may as well use it wherever it's applicable.
indexes. Replace all heap_openr and index_openr calls by heap_open
and index_open. Remove runtime lookups of catalog OID numbers in
various places. Remove relcache's support for looking up system
catalogs by name. Bulky but mostly very boring patch ...
indexes. Extend the macros in include/catalog/*.h to carry the info
about hand-assigned OIDs, and adjust the genbki script and bootstrap
code to make the relations actually get those OIDs. Remove the small
number of RelOid_pg_foo macros that we had in favor of a complete
set named like the catname.h and indexing.h macros. Next phase will
get rid of internal use of names for looking up catalogs and indexes;
but this completes the changes forcing an initdb, so it looks like a
good place to commit.
Along the way, I made the shared relations (pg_database etc) not be
'bootstrap' relations any more, so as to reduce the number of hardwired
entries and simplify changing those relations in future. I'm not
sure whether they ever really needed to be handled as bootstrap
relations, but it seems to work fine to not do so now.
avoid encroaching on the 'user' range of OIDs by allowing automatic
OID assignment to use values below 16k until we reach normal operation.
initdb not forced since this doesn't make any incompatible change;
however a lot of stuff will have different OIDs after your next initdb.
of just a relation OID, thereby not having to open the relation for itself.
This actually saves code rather than adding it for most of the existing
callers, which had the rel open already. The main point though is to be
able to use this rather than plain addRangeTableEntry in setTargetTable,
thus saving one relation_openrv/relation_close cycle for every INSERT,
UPDATE, or DELETE. Seems to provide a several percent win on simple
INSERTs.
be supported for all datatypes. Add CREATE AGGREGATE and pg_dump support
too. Add specialized min/max aggregates for bpchar, instead of depending
on text's min/max, because otherwise the possible use of bpchar indexes
cannot be recognized.
initdb forced because of catalog changes.
into indexscans on matching indexes. For the moment, it only handles
int4 and text datatypes; next step is to add a column to pg_aggregate
so that all MIN/MAX aggregates can be handled. Per my recent proposal.
deferred triggers: either one can create more work for the other,
so we have to loop till it's all gone. Per example from andrew@supernews.
Add a regression test to help spot trouble in this area in future.
while completing execution of the cursor's query. Otherwise we get wrong
answers or even crashes from non-volatile functions called by the query.
Per report from andrew@supernews.
decides whether to use hashed grouping instead of sort-plus-uniq
grouping. The function needs an annoyingly large number of parameters,
but this still seems like a win for legibility, since it removes over
a hundred lines from grouping_planner (which is still too big :-().
the long-term plan for this behavior for quite some time, but it is only
possible now that DELETE has a USING clause so that the user can join
other tables in a DELETE statement without relying on this behavior.
in UPDATE. We also now issue a NOTICE if a query has _any_ implicit
range table entries -- in the past, we would only warn about implicit
RTEs in SELECTs with at least one explicit RTE.
As a result of the warning change, 25 of the regression tests had to
be updated. I also took the opportunity to remove some bogus whitespace
differences between some of the float4 and float8 variants. I believe
I have correctly updated all the platform-specific variants, but let
me know if that's not the case.
Original patch for DELETE ... USING from Euler Taveira de Oliveira,
reworked by Neil Conway.
functions. This patch optimizes int2_sum(), int4_sum(), float4_accum()
and float8_accum() to avoid needing to copy the transition function's
state for each input tuple of the aggregate. In an extreme case
(e.g. SELECT sum(int2_col) FROM table where table has a single column),
it improves performance by about 20%. For more complex queries or tables
with wider rows, the relative performance improvement will not be as
significant.
ExecProcNode() with a NULL value, so the test couldn't do anything
for us except maybe mask bugs. Removing it probably doesn't save
anything much either, but then again this is a hot-spot routine.
few palloc's. I also chose to eliminate the restype and restypmod fields
entirely, since they are redundant with information stored in the node's
contained expression; re-examining the expression at need seems simpler
and more reliable than trying to keep restype/restypmod up to date.
initdb forced due to change in contents of stored rules.
performance hack Tom introduced recently. This means we can avoid
copying the transition array for each input tuple if these functions
are invoked as aggregate transition functions.
To test the performance improvement, I created a 1 million row table
with a single int4 column. Without the patch, SELECT avg(col) FROM
table took about 4.2 seconds (after the data was cached); with the
patch, it took about 3.2 seconds. Naturally, the performance
improvement for a less trivial query (or a table with wider rows)
would be relatively smaller.
cases with binary-compatible relabeling. My first try was implicitly
assuming that all operators scalarineqsel is used for have binary-
compatible datatypes on both sides ... which is very wrong of course.
Per report from Michael Fuhr.
exit. Without this, operations triggered during backend exit (such as
temp table deletions) won't be counted ... which given heavy usage of
temp tables can lead to pg_autovacuum falling way behind on the need
to vacuum pg_class and pg_attribute. Per reports from Steve Crawford
and others.
old comment in the code claimed that this was necessary. Since it is not
actually necessary any more, it is clearer to remove the comment and
just return NULL instead -- the return value of ExecHash() is not used.
proposal for OUT parameter support. The columns don't actually *do*
anything yet, they are just left NULLs. But I thought I'd commit this
part separately as a fairly pure example of the tasks needed when adding
a column to pg_proc or one of the other core system tables.
implement any new feature, it just pushes the 'not implemented' error
message deeper into the backend. I also tweaked the grammar to accept
Oracle-ish parameter syntax (parameter name first), as well as the
SQL99 standard syntax (parameter mode first), since it was easy and
people will doubtless try to use both anyway.
change saves a great deal of space in pg_proc and its primary index,
and it eliminates the former requirement that INDEX_MAX_KEYS and
FUNC_MAX_ARGS have the same value. INDEX_MAX_KEYS is still embedded
in the on-disk representation (because it affects index tuple header
size), but FUNC_MAX_ARGS is not. I believe it would now be possible
to increase FUNC_MAX_ARGS at little cost, but haven't experimented yet.
There are still a lot of vestigial references to FUNC_MAX_ARGS, which
I will clean up in a separate pass. However, getting rid of it
altogether would require changing the FunctionCallInfoData struct,
and I'm not sure I want to buy into that.
transaction rollback via UNDO but I think that's highly unlikely to
happen, so we may as well remove the stubs. (Someday we ought to
rip out the stub xxx_undo routines, too.) Per Alvaro.
really ought to run before canonicalize_qual, because it can now produce
forms that canonicalize_qual knows how to improve (eg, NOT clauses).
Also, because eval_const_expressions already knows about flattening
nested ANDs and ORs into N-argument form, the initial flatten_andors
pass in canonicalize_qual is now completely redundant and can be
removed. This doesn't save a whole lot of code, but the time and
palloc traffic eliminated is a useful gain on large expression trees.
access: define new index access method functions 'amgetmulti' that can
fetch multiple TIDs per call. (The functions exist but are totally
untested as yet.) Since I was modifying pg_am anyway, remove the
no-longer-needed 'rel' parameter from amcostestimate functions, and
also remove the vestigial amowner column that was creating useless
work for Alvaro's shared-object-dependencies project.
Initdb forced due to changes in pg_am.
that is 'x = true' becomes 'x' and 'x = false' becomes 'NOT x'. This isn't
all that amazingly useful in itself, but it ensures that we will recognize
the different forms as being logically equivalent when checking partial
index predicates. Per example from Patrick Clery.
structs. There are many places in the planner where we were passing
both a rel and an index to subroutines, and now need only pass the
index struct. Notationally simpler, and perhaps a tad faster.
for boolean indexes. Previously we would only use such an index with
WHERE clauses like 'indexkey = true' or 'indexkey = false'. The new
code transforms the cases 'indexkey', 'NOT indexkey', 'indexkey IS TRUE',
and 'indexkey IS FALSE' into one of these. While this is only marginally
useful in itself, I intend soon to change constant-expression simplification
so that 'foo = true' and 'foo = false' are reduced to just 'foo' and
'NOT foo' ... which would lose the ability to use boolean indexes for
such queries at all, if the indexscan machinery couldn't make the
reverse transformation.
binary-compatible relabeling of one or both operands. examine_variable
should avoid stripping RelabelType from non-variable expressions, so that
they will continue to have the correct type; and convert_to_scalar should
just use that type and ignore the other input type. This isn't perfect
but it beats failing entirely. Per example from Michael Fuhr.
when a zero-month interval is given. Per discussion with Karel.
Also, some desultory const-labeling of constant tables. More could be
done along that line.
actual number of unremoved tuples as pg_class.reltuples. The idea of
trying to estimate a steady state condition still seems attractive, but
this particular implementation crashed and burned ...
executing a statement that fires triggers. Formerly this time was
included in "Total runtime" but not otherwise accounted for.
As a side benefit, we avoid re-opening relations when firing non-deferred
AFTER triggers, because the trigger code can re-use the main executor's
ResultRelInfo data structure.
when open references remain during normal cleanup of a resource owner.
This restores the system's ability to warn about leaks to what it was
before 8.0. Not really a user-level bug, but helpful for development.
overly strong lock on pg_depend, and it wasn't closing the rel when done.
The latter bug was masked by the ResourceOwner code, which is something
that should be changed.
never-yet-vacuumed relation. This restores the pre-8.0 behavior of
avoiding seqscans during initial data loading, while still allowing
reasonable optimization after a table has been vacuumed. Several
regression test cases revert to 7.4-like behavior, which is probably
a good sign. Per gripes from Keith Browne and others.
* Touch the socket and lock file at least every hour, to
* ensure that they are not removed by overzealous /tmp-cleaning
* tasks. Set to 58 minutes so a cleaner never sees the
* file as an hour old.
currently does. This is now the default Win32 wal sync method because
we perfer o_datasync to fsync.
Also, change Win32 fsync to a new wal sync method called
fsync_writethrough because that is the behavior of _commit, which is
what is used for fsync on Win32.
Backpatch to 8.0.X.