is still needed despite cleanups in setrefs.c, because the point is to
let the inserted Result node compute a different tlist than its input
node does. Per example from Jeremy Drake.
drill down into subplan targetlists to print the referent expression for an
OUTER or INNER var in an upper plan node. Hence, make it do that always, and
banish the old hack of showing "?columnN?" when things got too complicated.
Along the way, fix an EXPLAIN bug I introduced by suppressing subqueries from
execution-time range tables: get_name_for_var_field() assumed it could look at
rte->subquery to find out the real type of a RECORD var. That doesn't work
anymore, but instead we can look at the input plan of the SubqueryScan plan
node.
and quals have varno OUTER, rather than zero, to indicate a reference to
an output of their lefttree subplan. This is consistent with the way
that every other upper-level node type does it, and allows some simplifications
in setrefs.c and EXPLAIN.
useless substructure for its RangeTblEntry nodes. (I chose to keep using the
same struct node type and just zero out the link fields for unneeded info,
rather than making a separate ExecRangeTblEntry type --- it seemed too
fragile to have two different rangetable representations.)
Along the way, put subplans into a list in the toplevel PlannedStmt node,
and have SubPlan nodes refer to them by list index instead of direct pointers.
Vadim wanted to do that years ago, but I never understood what he was on about
until now. It makes things a *whole* lot more robust, because we can stop
worrying about duplicate processing of subplans during expression tree
traversals. That's been a constant source of bugs, and it's finally gone.
There are some consequent simplifications yet to be made, like not using
a separate EState for subplans in the executor, but I'll tackle that later.
I refactored findsplitloc and checksplitloc so that the division of
labor is more clear IMO. I pushed all the space calculation inside the
loop to checksplitloc.
I also fixed the off by 4 in free space calculation caused by
PageGetFreeSpace subtracting sizeof(ItemIdData), even though it was
harmless, because it was distracting and I felt it might come back to
bite us in the future if we change the page layout or alignments.
There's now a new function PageGetExactFreeSpace that doesn't do the
subtraction.
findsplitloc now tries the "just the new item to right page" split as
well. If people don't like the refactoring, I can write a patch to just
add that.
Heikki Linnakangas
storing mostly-redundant Query trees in prepared statements, portals, etc.
To replace Query, a new node type called PlannedStmt is inserted by the
planner at the top of a completed plan tree; this carries just the fields of
Query that are still needed at runtime. The statement lists kept in portals
etc. now consist of intermixed PlannedStmt and bare utility-statement nodes
--- no Query. This incidentally allows us to remove some fields from Query
and Plan nodes that shouldn't have been there in the first place.
Still to do: simplify the execution-time range table; at the moment the
range table passed to the executor still contains Query trees for subqueries.
initdb forced due to change of stored rules.
this code was last gone over, there wasn't really any alternative to
globals because we didn't have the PlannerInfo struct being passed all
through the planner code. Now that we do, we can restructure things
to avoid non-reentrancy. I'm fooling with this because otherwise I'd
have had to add another global variable for the planned compact
range table list.
plan nodes, so that the executor does not need to get these items from
the range table at runtime. This will avoid needing to include these
fields in the compact range table I'm expecting to make the executor use.
portals using PORTAL_UTIL_SELECT strategy. This is currently significant only
for FETCH queries, which are supposed to include a count in the tag. Seems
it's been broken since 7.4, but nobody noticed before Knut Lehre.
equal functions are checked for raw parse trees as well as post-analysis
trees. This was never very important before, but the upcoming plan cache
control module will need to be able to do copyObject() on raw parse trees.
forces a particular relation nonnullable, then we can say that the OR does.
This is worth a little extra trouble since it may allow reduction of
outer joins to plain joins.
an opclass for a generic type such as ANYARRAY. The original coding failed
to check that PK and FK columns were of the same array type. Per discussion
with Tom Dunstan. Also, make the code a shade more readable by not trying
to economize on variables.
JOIN quals, just like WHERE quals, even if they reference every one of the
join's relations. Now that we can reorder outer and inner joins, it's
possible for such a qual to end up being assigned to an outer join plan node,
and we mustn't have it treated as a join qual rather than a filter qual for
the node. (If it were, the join could produce null-extended rows that it
shouldn't.) Per bug report from Pelle Johansson.
be checked at plan levels below the top; namely, we have to allow for Result
nodes inserted just above a nestloop inner indexscan. Should think about
using the general Param mechanism to pass down outer-relation variables, but
for the moment we need a back-patchable solution. Per report from Phil Frost.
to_timestamp():
- ID for day-of-week
- IDDD for day-of-year
This makes it possible to convert ISO week dates to and from text
fully represented in either week ('IYYY-IW-ID') or day-of-year
('IYYY-IDDD') format.
I have also added an 'isoyear' field for use with extract / date_part.
Brendan Jurd
o read global SSL configuration file
o add GUC "ssl_ciphers" to control allowed ciphers
o add libpq environment variable PGSSLKEY to control SSL hardware keys
Victor B. Wagner
considered when it is necessary to do so because of a join-order restriction
(that is, an outer-join or IN-subselect construct). The former coding was a
bit ad-hoc and inconsistent, and it missed some cases, as exposed by Mario
Weilguni's recent bug report. His specific problem was that an IN could be
turned into a "clauseless" join due to constant-propagation removing the IN's
joinclause, and if the IN's subselect involved more than one relation and
there was more than one such IN linking to the same upper relation, then the
only valid join orders involve "bushy" plans but we would fail to consider the
specific paths needed to get there. (See the example case added to the join
regression test.) On examining the code I wonder if there weren't some other
problem cases too; in particular it seems that GEQO was defending against a
different set of corner cases than the main planner was. There was also an
efficiency problem, in that when we did realize we needed a clauseless join
because of an IN, we'd consider clauseless joins against every other relation
whether this was sensible or not. It seems a better design is to use the
outer-join and in-clause lists as a backup heuristic, just as the rule of
joining only where there are joinclauses is a heuristic: we'll join two
relations if they have a usable joinclause *or* this might be necessary to
satisfy an outer-join or IN-clause join order restriction. I refactored the
code to have just one place considering this instead of three, and made sure
that it covered all the cases that any of them had been considering.
Backpatch as far as 8.1 (which has only the IN-clause form of the disease).
By rights 8.0 and 7.4 should have the bug too, but they accidentally fail
to fail, because the joininfo structure used in those releases preserves some
memory of there having once been a joinclause between the inner and outer
sides of an IN, and so it leads the code in the right direction anyway.
I'll be conservative and not touch them.
continuously, and requests vacuum runs of "autovacuum workers" to postmaster.
The workers do the actual vacuum work. This allows for future improvements,
like allowing multiple autovacuum jobs running in parallel.
For now, the code keeps the original behavior of having a single autovac
process at any time by sleeping until the previous worker has finished.
WHERE clauses. createplan.c is now willing to stick a gating Result node
almost anywhere in the plan tree, and in particular one can wind up directly
underneath a MergeJoin node. This means it had better be willing to handle
Mark/Restore. Fortunately, that's trivial in such cases, since we can just
pass off the call to the input node (which the planner has previously ensured
can handle Mark/Restore). Per report from Phil Frost.
equality checks it applies, instead of a random dependence on whatever
operators might be named "=". The equality operators will now be selected
from the opfamily of the unique index that the FK constraint depends on to
enforce uniqueness of the referenced columns; therefore they are certain to be
consistent with that index's notion of equality. Among other things this
should fix the problem noted awhile back that pg_dump may fail for foreign-key
constraints on user-defined types when the required operators aren't in the
search path. This also means that the former warning condition about "foreign
key constraint will require costly sequential scans" is gone: if the
comparison condition isn't indexable then we'll reject the constraint
entirely. All per past discussions.
Along the way, make the RI triggers look into pg_constraint for their
information, instead of using pg_trigger.tgargs; and get rid of the always
error-prone fixed-size string buffers in ri_triggers.c in favor of building up
the RI queries in StringInfo buffers.
initdb forced due to columns added to pg_constraint and pg_trigger.
socket is still read-ready, the code was a tight loop, wasting lots of CPU.
We can't do anything to clear the failure, other than wait, but we should give
other processes more chance to finish and release FDs; so insert a small sleep.
Also, avoid bogus "close(-1)" in this case. Per report from Jim Nasby.
that overlap an outer join's min_righthand but aren't fully contained in it,
to support joining within the RHS after having performed an outer join that
can commute with this one. Aside from the direct fix in make_join_rel(),
fix has_join_restriction() and GEQO's desirable_join() to consider this
possibility. Per report from Ian Harding.
currently have any better strategy for this query than re-running the
sub-select over and over; it seems unlikely that doing so 10000 times
is a more useful test than doing it a few dozen times.
entries for the victim database go away sooner rather than later. We already
did the equivalent thing at the per-relation level, not sure why it's not
been done for whole databases. With this change, pgstat_vacuum_tabstat
should usually not find anything to do; though we still need it as a backstop
in case DROPDB or TABPURGE messages get lost under load.
keeping private state in each backend that has inserted and deleted the same
tuple during its current top-level transaction. This is sufficient since
there is no need to be able to determine the cmin/cmax from any other
transaction. This gets us back down to 23-byte headers, removing a penalty
paid in 8.0 to support subtransactions. Patch by Heikki Linnakangas, with
minor revisions by moi, following a design hashed out awhile back on the
pghackers list.
get away with not (re)initializing a local variable if the variable is marked
"isconst" and not "isnull". Unfortunately it makes this decision after having
already freed the old value, meaning that something like
for i in 1..10 loop
declare c constant text := 'hi there';
leads to subsequent accesses to freed memory, and hence probably crashes.
(In particular, this is why Asif Ali Rehman's bug leads to crash and not
just an unexpectedly-NULL value for SQLERRM: SQLERRM is marked CONSTANT
and so triggers this error.)
The whole thing seems wrong on its face anyway: CONSTANT means that you can't
change the variable inside the block, not that the initializer expression is
guaranteed not to change value across successive block entries. Hence,
remove the "optimization" instead of trying to fix it.
DECLARE section needs to know about it. Formerly, everyplace besides DECLARE
that created variables needed to do "plpgsql_add_initdatums(NULL)" to prevent
those variables from being sucked up as part of a subsequent DECLARE block.
This is obviously error-prone, and in fact the SQLSTATE/SQLERRM patch had
failed to do it for those two variables, leading to the bug recently exhibited
by Asif Ali Rehman: a DECLARE within an exception handler tried to reinitialize
SQLERRM.
Although the SQLSTATE/SQLERRM patch isn't in any pre-8.1 branches, and so
I can't point to a demonstrable failure there, it seems wise to back-patch
this into the older branches anyway, just to keep the logic similar to HEAD.
For win32 in general, this makes it possible to run the regression tests
as an admin user by using the same restricted token method that's used
by pg_ctl and initdb.
For vc++, it adds building of pg_regress.exe, adds a resultmap, and
fixes how it runs the install.
Magnus Hagander
where possible, and fix some sites that apparently thought that fgets()
will overwrite the buffer by one byte.
Also add some strlcpy() to eliminate some weird memory handling.
> Currently, an index split writes all the data on the split page to
> WAL. That's a lot of WAL traffic. The tuples that are copied to the
> right page need to be WAL logged, but the tuples that stay on the
> original page don't.
Heikki Linnakangas
already collected in the current transaction; this allows plpgsql functions to
watch for stats updates even though they are confined to a single transaction.
Use this instead of the previous kluge involving pg_stat_file() to wait for
the stats collector to update in the stats regression test. Internally,
decouple storage of stats snapshots from transaction boundaries; they'll
now stick around until someone calls pgstat_clear_snapshot --- which xact.c
still does at transaction end, to maintain the previous behavior. This makes
the logic a lot cleaner, at the price of a couple dozen cycles per transaction
exit.
changes (with an upper limit of 30 seconds), and record the delay time in
the postmaster log. This should give us some info about what's happening
with the intermittent stats failures in buildfarm. After an idea of
Andrew Dunstan's.
"database system is ready to accept connections", which is issued by the
postmaster when it really is ready to accept connections. Per proposal from
Markus Schiltknecht and subsequent discussion.
thought that it didn't have to reposition the underlying tuplestore if the
portal is atEnd. But this is not so, because tuplestores have separate read
and write cursors ... and the read cursor hasn't moved from the start.
This mistake explains bug #2970 from William Zhang.
Note: the coding here is pretty inefficient, but given that no one has noticed
this bug until now, I'd say hardly anyone uses the case where the cursor has
been advanced before being persisted. So maybe it's not worth worrying about.
out that ExecEvalVar and friends don't necessarily have access to a tuple
descriptor with correct typmod: it definitely can contain -1, and possibly
might contain other values that are different from the Var's value.
Arguably this should be cleaned up someday, but it's not a simple change,
and in any case typmod discrepancies don't pose a security hazard.
Per reports from numerous people :-(
I'm not entirely sure whether the failure can occur in 8.0 --- the simple
test cases reported so far don't trigger it there. But back-patch the
change all the way anyway.
had stopped working for tables buried inside views or sub-selects. This is
because I had gotten rid of the simplify_jointree() preprocessing step, and
optimize_minmax_aggregates() wasn't smart enough to deal with a non-canonical
FromExpr. Per gripe from Bill Howe.
that aren't turned into true joins). Since this is the last missing bit of
infrastructure, go ahead and fill out the hash integer_ops and float_ops
opfamilies with cross-type operators. The operator family project is now
DONE ... er, except for documentation ...
describe the maximum size of index tuples (which is typically AM-dependent
anyway); and consequently remove the bogus deduction for "special space"
that was built into it.
Adjust TOAST_TUPLE_THRESHOLD and TOAST_MAX_CHUNK_SIZE to avoid wasting two
bytes per toast chunk, and to ensure that the calculation correctly tracks any
future changes in page header size. The computation had been inaccurate in a
way that didn't cause any harm except space wastage, but future changes could
have broken it more drastically.
Fix the calculation of BTMaxItemSize, which was formerly computed as 1 byte
more than it could safely be. This didn't cause any harm in practice because
it's only compared against maxalign'd lengths, but future changes in the size
of page headers or btree special space could have exposed the problem.
initdb forced because of change in TOAST_MAX_CHUNK_SIZE, which alters the
storage of toast tables.
threshold for tuple length. On 4-byte-MAXALIGN machines, the toast code
creates tuples that have t_len exactly TOAST_TUPLE_THRESHOLD ... but this
number is not itself maxaligned, so if heap_insert maxaligns t_len before
comparing to TOAST_TUPLE_THRESHOLD, it'll uselessly recurse back to
tuptoaster.c, wasting cycles. (It turns out that this does not happen on
8-byte-MAXALIGN machines, because for them the outer MAXALIGN in the
TOAST_MAX_CHUNK_SIZE macro reduces TOAST_MAX_CHUNK_SIZE so that toast tuples
will be less than TOAST_TUPLE_THRESHOLD in size. That MAXALIGN is really
incorrect, but we can't remove it now, see below.) There isn't any particular
value in maxaligning before comparing to the thresholds, so just don't do
that, which saves a small number of cycles in itself.
These numbers should be rejiggered to minimize wasted space on toast-relation
pages, but we can't do that in the back branches because changing
TOAST_MAX_CHUNK_SIZE would force an initdb (by changing the contents of toast
tables). We can move the toast decision thresholds a bit, though, which is
what this patch effectively does.
Thanks to Pavan Deolasee for discovering the unintended recursion.
Back-patch into 8.2, but not further, pending more testing. (HEAD is about
to get a further patch modifying the thresholds, so it won't help much
for testing this form of the patch.)
observe the xmloption.
Reorganize the representation of the XML option in the parse tree and the
API to make it easier to manage and understand.
Add regression tests for parsing back XML expressions.
generated solution files for what to install, instead of blindly copying
everything as it previously did. With the previous quick-n-dirty
version, it would copy old DLLs if you reconfigured in a way that didn't
include subprojects like a PL for example.
Magnus Hagander.
made query plan. Use of ALTER COLUMN TYPE creates a hazard for cached
query plans: they could contain Vars that claim a column has a different
type than it now has. Fix this by checking during plan startup that Vars
at relation scan level match the current relation tuple descriptor. Since
at that point we already have at least AccessShareLock, we can be sure the
column type will not change underneath us later in the query. However,
since a backend's locks do not conflict against itself, there is still a
hole for an attacker to exploit: he could try to execute ALTER COLUMN TYPE
while a query is in progress in the current backend. Seal that hole by
rejecting ALTER TABLE whenever the target relation is already open in
the current backend.
This is a significant security hole: not only can one trivially crash the
backend, but with appropriate misuse of pass-by-reference datatypes it is
possible to read out arbitrary locations in the server process's memory,
which could allow retrieving database content the user should not be able
to see. Our thanks to Jeff Trout for the initial report.
Security: CVE-2007-0556
we should check that the function code returns the claimed result datatype
every time we parse the function for execution. Formerly, for simple
scalar result types we assumed the creation-time check was sufficient, but
this fails if the function selects from a table that's been redefined since
then, and even more obviously fails if check_function_bodies had been OFF.
This is a significant security hole: not only can one trivially crash the
backend, but with appropriate misuse of pass-by-reference datatypes it is
possible to read out arbitrary locations in the server process's memory,
which could allow retrieving database content the user should not be able
to see. Our thanks to Jeff Trout for the initial report.
Security: CVE-2007-0555
Standard English uses "may", "can", and "might" in different ways:
may - permission, "You may borrow my rake."
can - ability, "I can lift that log."
might - possibility, "It might rain today."
Unfortunately, in conversational English, their use is often mixed, as
in, "You may use this variable to do X", when in fact, "can" is a better
choice. Similarly, "It may crash" is better stated, "It might crash".
Standard English uses "may", "can", and "might" in different ways:
may - permission, "You may borrow my rake."
can - ability, "I can lift that log."
might - possibility, "It might rain today."
Unfortunately, in conversational English, their use is often mixed, as
in, "You may use this variable to do X", when in fact, "can" is a better
choice. Similarly, "It may crash" is better stated, "It might crash".
Also update two error messages mentioned in the documenation to match.
nonportable "hh" sprintf(3) length modifier. Instead, do the parsing
and output by hand. The code to do this isn't ideal, but this is
an interim measure anyway: the uuid type should probably use the
in-memory struct layout specified by RFC 4122. For now, this patch
should hopefully rectify the buildfarm failures for the uuid test.
Along the way, re-add pg_cast entries for uuid <-> varchar, which
I mistakenly removed earlier, and bump the catversion.
In this case extractQuery should returns -1 as nentries. This changes
prototype of extractQuery method to use int32* instead of uint32* for
nentries argument.
Based on that gincostestimate may see two corner cases: nothing will be found
or seqscan should be used.
Per proposal at http://archives.postgresql.org/pgsql-hackers/2007-01/msg01581.php
PS tsearch_core patch should be sightly modified to support changes, but I'm
waiting a verdict about reviewing of tsearch_core patch.
The original coding failed (tried to access deallocated memory) if there were
two active call sites (fn_extra pointers) for the same function and the
function definition was updated. Also, if an update of a recursive function
was detected upon nested entry to the function, the existing compiled version
was summarily deallocated, resulting in crash upon return to the outer
instance. Problem observed while studying a bug report from Sergiy
Vyshnevetskiy.
Bug does not exist before 8.1 since older versions just leaked the memory of
obsoleted compiled functions, rather than trying to reclaim it.
by plpgsql can themselves use SPI --- possibly indirectly, as in the case
of domain_in() invoking plpgsql functions in a domain check constraint.
Per bug #2945 from Sergiy Vyshnevetskiy.
Somewhat arbitrarily, I've chosen to back-patch this as far as 8.0. Given
the lack of prior complaints, it doesn't seem critical for 7.x.
Hashing for aggregation purposes still needs work, so it's not time to
mark any cross-type operators as hashable for general use, but these cases
work if the operators are so marked by hand in the system catalogs.
match because they contain a null join key (and the join operator is
known strict). Improves performance significantly when the inner
relation contains a lot of nulls, as per bug #2930.
handy to prevent core dump files from disappearing, but it's useless now
because (a) we don't drop core in individual DB subdirectories anymore,
and (b) CREATE DATABASE forces an internal checkpoint anyway.
reports; inspired by the misleading CONTEXT lines shown in recent bug report
from Stefan Kaltenbrunner. Also, allow statement-type names shown in these
messages to be translated.
that defined in RFC 4122. This patch includes the basic implementation,
plus regression tests. Documentation and perhaps some additional
functionality will come later. Catversion bumped.
Patch from Gevik Babakhani; review from Peter, Tom, and myself.
safely in the presence of subtransactions. To ensure that any ExprContext
shutdown callbacks are called at the right times, we have to have a separate
EState for each level of subtransaction. Per "TupleDesc reference leak" bug
report from Stefan Kaltenbrunner.
Although I'm convinced the code is wrong as far back as 8.0, it doesn't seem
that there are any ways for the problem to really manifest before 8.2: AFAICS,
8.0 and 8.1 only use the ExprContextCallback mechanism to handle set-returning
functions, which cannot usefully be executed in a "simple expression" anyway.
Hence, no backpatch before 8.2 --- the risk of unforeseen breakage seems
to outweigh the chance of fixing something.
activity quiesce. Possibly this will fix the large increase in
non-reproducible stats test failures we've noted since turning on
stats_row_level by default.
versus varchar[]. This oversight probably explains Ryan Holmes' recent
complaint --- he was getting a generic selectivity estimate instead of
anything intelligent.
exactly at the point where we need to insert a new item, the calculation used
the wrong size for the "high key" of the new left page. This could lead to
choosing an unworkable split, resulting in "PANIC: failed to add item to the
left sibling" (or "right sibling") failure. Although this bug has been there
a long time, it's very difficult to trigger a failure before 8.2, since there
was generally a lot of free space on both sides of a chosen split. In 8.2,
where the user-selected fill factor determines how much free space the code
tries to leave, an unworkable split is much more likely. Report by Joe
Conway, diagnosis and fix by Heikki Linnakangas.
input in the stats collector. Our select() emulation is apparently buggy
for UDP sockets :-(. This should resolve problems with stats collection
(and hence autovacuum) failing under more than minimal load. Diagnosis
and patch by Magnus Hagander.
Patch probably needs to be back-ported to 8.1 and 8.0, but first let's
see if it makes the buildfarm happy...
suffix, to distinguish them from doubles. Make some function declarations
and definitions use the "const" qualifier for arguments consistently.
Ignore warning 4102 ("unreferenced label"), because such warnings
are always emitted by bison-generated code. Patch from Magnus Hagander.
- Add new SQL command SET XML OPTION (also available via regular GUC) to
control the DOCUMENT vs. CONTENT option in implicit parsing and
serialization operations.
- Subtle corrections in the handling of the standalone property in
xmlroot().
- Allow xmlroot() to work on content fragments.
- Subtle corrections in the handling of the version property in
xmlconcat().
- Code refactoring for producing XML declarations.
page about the maximum UTF8 sequence length we support (4 bytes since 8.1,
3 before that). pg_utf2wchar_with_len never got updated to support 4-byte
characters at all, and in any case had a buffer-overrun risk in that it
could produce multiple pg_wchars from what mblen claims to be just one UTF8
character. The only reason we don't have a major security hole is that most
callers allocate worst-case output buffers; the sole exception in released
versions appears to be pre-8.2 iwchareq() (ie, ILIKE), which can be crashed
due to zeroing out its return address --- but AFAICS that can't be exploited
for anything more than a crash, due to inability to control what gets written
there. Per report from James Russell and Michael Fuhr.
Pre-8.1 the risk is much less, but I still think pg_utf2wchar_with_len's
behavior given an incomplete final character risks buffer overrun, so
back-patch that logic change anyway.
This patch also makes sure that UTF8 sequences exceeding the supported
length (whichever it is) are consistently treated as error cases, rather
than being treated like a valid shorter sequence in some places.
involving unions of types having typmods. Variants of the failure are known
to occur in 8.1 and up; not sure if it's possible in 8.0 and 7.4, but since
the code exists that far back, I'll just patch 'em all. Per report from
Brian Hurt.
FAMILY; and add FAMILY option to CREATE OPERATOR CLASS to allow adding a
class to a pre-existing family. Per previous discussion. Man, what a
tedious lot of cutting and pasting ...
which I had removed in the first cut of the EquivalenceClass rewrite to
simplify that patch a little. But it's still important --- in a four-way
join problem mergejoinscansel() was eating about 40% of the planning time
according to gprof. Also, improve the EquivalenceClass code to re-use
join RestrictInfos rather than generating fresh ones for each join
considered. This saves some memory space but more importantly improves
the effectiveness of caching planning info in RestrictInfos.
columns procost and prorows, to allow simple user adjustment of the estimated
cost of a function call, as well as control of the estimated number of rows
returned by a set-returning function. We might eventually wish to extend this
to allow function-specific estimation routines, but there seems to be
consensus that we should try a simple constant estimate first. In particular
this provides a relatively simple way to control the order in which different
WHERE clauses are applied in a plan node, which is a Good Thing in view of the
fact that the recent EquivalenceClass planner rewrite made that much less
predictable than before.
provide just a boolean 'amcanorder', instead of fields that specify the
sort operator strategy numbers. We have decided to require ordering-capable
AMs to use btree-compatible strategy numbers, so the old fields are
overkill (and indeed misleading about what's allowed).
representation of equivalence classes of variables. This is an extensive
rewrite, but it brings a number of benefits:
* planner no longer fails in the presence of "incomplete" operator families
that don't offer operators for every possible combination of datatypes.
* avoid generating and then discarding redundant equality clauses.
* remove bogus assumption that derived equalities always use operators
named "=".
* mergejoins can work with a variety of sort orders (e.g., descending) now,
instead of tying each mergejoinable operator to exactly one sort order.
* better recognition of redundant sort columns.
* can make use of equalities appearing underneath an outer join.
currentMarkData from IndexScanDesc to the opaque structs for the
AMs that need this information (currently gist and hash).
Patch from Heikki Linnakangas, fixes by Neil Conway.
the generated files, to help Visual C++ to run these tests. The tests still
pass in VPATH and normal builds.
Patch from Magnus Hagander, editorialized by me.
is deleted. A backend about to unlink a file now sends a "revoke fsync"
request to the bgwriter to make it clean out pending fsync requests. There
is still a race condition where the bgwriter may try to fsync after the unlink
has happened, but we can resolve that by rechecking the fsync request queue
to see if a revoke request arrived meanwhile. This eliminates the former
kluge of "just assuming" that an ENOENT failure is okay, and lets us handle
the fact that on Windows it might be EACCES too without introducing any
questionable assumptions. After an idea of mine improved by Magnus.
The HEAD patch doesn't apply cleanly to 8.2, but I'll see about a back-port
later. In the meantime this could do with some testing on Windows; I've been
able to force it through the code path via ENOENT, but that doesn't prove that
it actually fixes the Windows problem ...
* After Markos patch, now builds pgcrypto without zlib again
* Updates README with xml info
* xml requires xslt and iconv
* disable unnecessary warning about __cdecl()
* Add a buildenv.bat called from all other bat files to set up things
like PATH for flex/bison. (Can't just set it before calling, doesn't
always work when building from the GUI)
The implementation is somewhat ugly logic-wise, but I don't see an
easy way to make it more concise.
When writing this, I noticed that my previous implementation of
width_bucket() doesn't handle NaN correctly:
postgres=# select width_bucket('NaN', 1, 5, 5);
width_bucket
--------------
6
(1 row)
AFAICS SQL:2003 does not define a NaN value, so it doesn't address how
width_bucket() should behave here. The patch changes width_bucket() so
that ereport(ERROR) is raised if NaN is specified for the operand or the
lower or upper bounds to width_bucket(). For float8, NaN is disallowed
for any of the floating-point inputs, and +/- infinity is disallowed
for the histogram bounds (but allowed for the operand).
Update docs and regression tests, bump the catversion.
it was checking a pg_constraint OID instead of pg_class OID, resulting in
"relation with OID nnnnn does not exist" failures for anyone who wasn't
owner of the table being examined. Per bug #2848 from Laurence Rowe.
Note: for existing 8.2 installations a simple version update won't fix this;
the easiest fix is to CREATE OR REPLACE this view with the corrected
definition.
accessing it, like DROP DATABASE. This allows the regression tests to pass
with autovacuum enabled, which open the gates for finally enabling autovacuum
by default.
standard convention the 21st century runs from 2001-2100, not 2000-2099,
so make it work like that. Per bug #2885 from Akio Iwaasa.
Backpatch to 8.2, but no further, since this is really a definitional
change; users of older branches are probably more interested in stability.
hold true for operators in a btree operator family. This is mostly to
clarify my own thinking about what the planner can assume for optimization
purposes. (blowing dust off an old abstract-algebra textbook...)
(or other types of pg_class entry): the function pgstat_vacuum_tabstat,
invoked during VACUUM startup, had runtime proportional to the number of
stats table entries times the number of pg_class rows; in other words
O(N^2) if the stats collector's information is reasonably complete.
Replace list searching with a hash table to bring it back to O(N)
behavior. Per report from kim at myemma.com.
Back-patch as far as 8.1; 8.0 and before use different coding here.
Made this option mark the .c files, so the environment variable is no longer needed.
Created a special MinGW file with the special error message.
Do not print port into log file when running regression tests.
which comparison operators to use for plan nodes involving tuple comparison
(Agg, Group, Unique, SetOp). Formerly the executor looked up the default
equality operator for the datatype, which was really pretty shaky, since it's
possible that the data being fed to the node is sorted according to some
nondefault operator class that could have an incompatible idea of equality.
The planner knows what it has sorted by and therefore can provide the right
equality operator to use. Also, this change moves a couple of catalog lookups
out of the executor and into the planner, which should help startup time for
pre-planned queries by some small amount. Modify the planner to remove some
other cavalier assumptions about always being able to use the default
operators. Also add "nulls first/last" info to the Plan node for a mergejoin
--- neither the executor nor the planner can cope yet, but at least the API is
in place.
1) gendef works from inside visual studio - use a tempfile instead of
redirection, because for some reason you can't redirect dumpbin from
inside (patch from Joachim Wieland)
2) gendef must process only *.obj, or you get weird errors in some build
scenarios when it tries to process a logfile
Magnus Hagander
the same output level that was used when building a single project
before, and really needed to get reasonable information about what
happens (non-verbose just says "starting build of foo" and "done
building foo", more or less).
Magnus Hagander
management. The paper clearly describes many of the ideas embodied in
our current hashing code, but as far as I could find out there is not
a direct code heritage. (Mike Olsen recalls discussion of this paper
at Postgres meetings but believes it "informed the Postgres implementation
probably just at the design level". Margo herself says she wasn't
involved with Postgres' hash code.) Credit where credit is due 'n all
that, even if fifteen years after the fact.
per-column options for btree indexes. The planner's support for this is still
pretty rudimentary; it does not yet know how to plan mergejoins with
nondefault ordering options. The documentation is pretty rudimentary, too.
I'll work on improving that stuff later.
Note incompatible change from prior behavior: ORDER BY ... USING will now be
rejected if the operator is not a less-than or greater-than member of some
btree opclass. This prevents less-than-sane behavior if an operator that
doesn't actually define a proper sort ordering is selected.
when collapsing of JOIN trees is stopped by join_collapse_limit. For instance
a list of 11 LEFT JOINs with limit 8 now produces something like
((1 2 3 4 5 6 7 8) 9 10 11 12)
instead of
(((1 2 3 4 5 6 7 8) (9)) 10 11 12)
The latter structure is really only required for a FULL JOIN.
Noted while studying an example from Shane Ambler.
hash joins with the estimated-larger relation on the inside. There are
several cases where doing that makes perfect sense, and in cases where it
doesn't, the regular cost computation really ought to be able to figure that
out. Make some marginal tweaks in said computation to try to get results
approximating reality a bit better. Per an example from Shane Ambler.
Also, fix an oversight in the original patch to add seq_page_cost: the costs
of spilling a hash join to disk should be scaled by seq_page_cost.
sets the items, and serializes the value back (rather than adding an
arbitrary number of XML preambles as before).
The libxml memory management via palloc had to be disabled because it
crashes when libxml tries to access memory that was helpfully freed
earlier by PostgreSQL. This needs further thought.
database privileges from a pre-8.2 server. This ensures that the reloaded
database will maintain the same behavior it had in the previous installation,
ie, everybody has connect privilege. Per gripe from L Bayuk.
an optarg). Add some comments noting that code in three different files has
to be kept in sync. Fix erroneous description of -S switch (it sets work_mem
not silent_mode), and do some light copy-editing elsewhere in postgres-ref.
form '^(foo)$'. Before, these could never be optimized into indexscans.
The recent changes to make psql and pg_dump generate such patterns (for \d
commands and -t and related switches, respectively) therefore represented
a big performance hit for people with large pg_class catalogs, as seen in
recent gripe from Erik Jones. While at it, be more paranoid about
case-sensitivity checking in multibyte encodings, and fix some other
corner cases in which a regex might be interpreted too liberally.
having md.c return a success/failure boolean to smgr.c, which was just going
to elog anyway, let md.c issue the elog messages itself. This allows better
error reporting, particularly in cases such as "short read" or "short write"
which Peter was complaining of. Also, remove the kluge of allowing mdread()
to return zeroes from a read-beyond-EOF: this is now an error condition
except when InRecovery or zero_damaged_pages = true. (Hash indexes used to
require that behavior, but no more.) Also, enforce that mdwrite() is to be
used for rewriting existing blocks while mdextend() is to be used for
extending the relation EOF. This restriction lets us get rid of the old
ad-hoc defense against creating huge files by an accidental reference to
a bogus block number: we'll only create new segments in mdextend() not
mdwrite() or mdread(). (Again, when InRecovery we allow it anyway, since
we need to allow updates of blocks that were later truncated away.)
Also, clean up the original makeshift patch for bug #2737: move the
responsibility for padding relation segments to full length into md.c.
The purpose is to allow autovacuum-esq conditional vacuuming and
clustering using SQL to discover the required stats.
No documentation updates required. Catalog version updated.
Glen Parker
valid result from a computation if one of the input values was infinity.
The previous code assumed an operation that returned infinity was an
overflow.
Handle underflow/overflow consistently, and add checks for aggregate
overflow.
Consistently prevent Inf/Nan from being cast to integer data types.
Fix INT_MIN % -1 to prevent overflow.
Update regression results for new error text.
Per report from Roman Kononov.
pg_opclass during LookupOpclassInfo(), I'd turned pg_opclass_oid_index
into a critical system index. However the problem could only manifest
during a backend's first attempt to load opclass data, and then only
if it had successfully loaded pg_internal.init and subsequently received
a relcache flush; which made it impossible to reproduce in sequential
tests and darn hard even in parallel tests. Memo to self: when
exercising cache flush scenarios, must disable LookupOpclassInfo's
internal cache too.
or contradictory keys even in cross-data-type scenarios. This is another
benefit of the opfamily rewrite: we can find the needed comparison
operators now.
the 8.1 SQL function definition for it. Per report from Rajesh Kumar Mallah,
such a DBA error doesn't seem at all improbable, and the cost of checking for
it is not very high compared to the cost of running this function. (It would
have been better to change the C name of the function so it wouldn't be called
by the old SQL definition, but it's too late for that now in the 8.2 branch.)
of increasing size, instead of one at a time. This reduces the memory
management overhead when num_temp_buffers is large: in the previous coding
we would actually waste 50% of the space used for temp buffers, because aset.c
would round the individual requests up to 16K. Problem noted while studying
a performance issue reported by Steven Flatt.
Back-patch as far as 8.1 --- older versions used few enough local buffers
that the issue isn't significant for them.
has a small maxBlockSize: the maximum request size that we will treat as a
"chunk" needs to be limited to fit in maxBlockSize. Otherwise we will round
up the request size to the next power of 2, wasting space, which is a bit
pointless if we aren't going to make the blocks big enough to fit additional
stuff in them. The example motivating this is local buffer management, which
makes repeated allocations of 8K (one BLCKSZ buffer) in TopMemoryContext,
which has maxBlockSize = 8K because for the most part allocations there are
small. This leads to each local buffer actually eating 16K of space, which
adds up when there are thousands of them. I intend to change localbuf.c to
aggregate its requests, which will prevent this particular misbehavior, but
it seems likely that similar scenarios could arise elsewhere, so fixing the
core problem seems wise as well.
PQdsplen()) normally, instead of replacing them by \uXXXX sequences.
Assume that they in fact occupy zero screen space for formatting purposes.
Per gripe from Michael Fuhr and ensuing discussion.
involving HashAggregate over SubqueryScan (this is the known case, there
may well be more). The bug is only latent in releases before 8.2 since they
didn't try to access tupletable slots' descriptors during ExecDropTupleTable.
The least bogus fix seems to be to make subqueries share the parent query's
memory context, so that tupdescs they create will have the same lifespan as
those of the parent query. There are comments in the code envisioning going
even further by not having a separate child EState at all, but that will
require rethinking executor access to range tables, which I don't want to
tackle right now. Per bug report from Jean-Pierre Pelletier.
ps_TupFromTlist in plan nodes that make use of it. This was being done
correctly in join nodes and Result nodes but not in any relation-scan nodes.
Bug would lead to bogus results if a set-returning function appeared in the
targetlist of a subquery that could be rescanned after partial execution,
for example a subquery within EXISTS(). Bug has been around forever :-(
... surprising it wasn't reported before.
were marked canSetTag. While it's certainly correct to return the result
of the last one that is marked canSetTag, it's less clear what to do when
none of them are. Since plpgsql will complain if zero is returned, the
8.2.0 behavior isn't good. I've fixed it to restore the prior behavior of
returning the physically last query's result code when there are no
canSetTag queries.
Use a TRY block instead of (inadequate) ad-hoc coding to ensure that
libxml is cleaned up after a failure. Report the intended SQLCODE
instead of defaulting to XX000. Avoid risking use of a dangling
pointer by keeping the persistent error buffer in TopMemoryContext.
Be less trusting that error messages don't contain %.
This patch doesn't do anything about changing the way the messages
are put together --- this is just about mechanism.
bletcherous and unsafe manipulation of global encoding setting.
Clean up libxml reporting mechanism a bit (it still looks like a
dangling-pointer crash waiting to happen, though, not to mention
being far less than sane from a localization standpoint).
the XmlExpr code in various lists, use a representation that has some hope
of reverse-listing correctly (though it's still a de-escaping function
shy of correctness), generally try to make it look more like Postgres
coding conventions.
cases. Operator classes now exist within "operator families". While most
families are equivalent to a single class, related classes can be grouped
into one family to represent the fact that they are semantically compatible.
Cross-type operators are now naturally adjunct parts of a family, without
having to wedge them into a particular opclass as we had done originally.
This commit restructures the catalogs and cleans up enough of the fallout so
that everything still works at least as well as before, but most of the work
needed to actually improve the planner's behavior will come later. Also,
there are not yet CREATE/DROP/ALTER OPERATOR FAMILY commands; the only way
to create a new family right now is to allow CREATE OPERATOR CLASS to make
one by default. I owe some more documentation work, too. But that can all
be done in smaller pieces once this infrastructure is in place.
operator strategy numbers, ie, GiST and GIN. This is almost cosmetic
enough to not need a catversion bump, but since the opr_sanity regression
test has to change in sync with the catalog entry, I figured I'd better
do one.
are all in new-in-8.2 logic associated with indexability of ScalarArrayOpExpr
(IN-clauses) or amortization of indexscan costs across repeated indexscans
on the inside of a nestloop. In particular:
Fix some logic errors in the estimation for multiple scans induced by a
ScalarArrayOpExpr indexqual.
Include a small cost component in bitmap index scans to reflect the costs of
manipulating the bitmap itself; this is mainly to prevent a bitmap scan from
appearing to have the same cost as a plain indexscan for fetching a single
tuple.
Also add a per-index-scan-startup CPU cost component; while prior releases
were clearly too pessimistic about the cost of repeated indexscans, the
original 8.2 coding allowed the cost of an indexscan to effectively go to zero
if repeated often enough, which is overly optimistic.
Pay some attention to index correlation when estimating costs for a nestloop
inner indexscan: this is significant when the plan fetches multiple heap
tuples per iteration, since high correlation means those tuples are probably
on the same or adjacent heap pages.
joinclause doesn't use any outer-side vars) requires a "bushy" plan to be
created. The normal heuristic to avoid joins with no joinclause has to be
overridden in that case. Problem is new in 8.2; before that we forced the
outer join order anyway. Per example from Teodor.
representing externally-supplied values, since the APIs that carry such
values only specify type not typmod. However, for PARAM_SUBLINK Params
it is handy to carry the typmod of the sublink's output column. This
is a much cleaner solution for the recently reported 'could not find
pathkey item to sort' and 'failed to find unique expression in subplan
tlist' bugs than my original 8.2-compatible patch. Besides, someday we
might want to support typmods for external parameters ...
in normal operation, and we can avoid rewriting pg_control at every log
segment switch if we don't insist that these values be valid. Reducing
the number of pg_control updates is a good idea for both performance and
reliability. It does make pg_resetxlog's life a bit harder, but that seems
a good tradeoff; and anyway the change to pg_resetxlog amounts to automating
something people formerly needed to do by hand, namely look at the existing
pg_xlog files to make sure the new WAL start point was past them.
In passing, change the wording of xlog.c's "database system was interrupted"
messages: describe the pg_control timestamp as "last known up at" rather than
implying it is the exact time of service interruption. With this change the
timestamp will generally be the time of the last checkpoint, which could be
many minutes before the failure; and we've already seen indications that
people tend to misinterpret the old wording.
initdb forced due to change in pg_control layout. Simon Riggs and Tom Lane
release it in a subtransaction abort, but this neglects possibility that
someone outside SPI already did. Fix is for spi.c to forget about a tuptable
as soon as it's handed it back to the caller.
Per bug #2817 from Michael Andreen.
rearrangeable outer joins and the WHERE clause is non-strict and mentions
only nullable-side relations. New bug in 8.2, caused by new logic to allow
rearranging outer joins. Per bug #2807 from Ross Cohen; thanks to Jeff
Davis for producing a usable test case.
a sublink's test expression have the correct vartypmod, rather than defaulting
to -1. There's at least one place where this is important because we're
expecting these Vars to be exactly equal() to those appearing in the subplan
itself. This is a pretty klugy solution --- it would likely be cleaner to
change Param nodes to include a typmod field --- but we can't do that in the
already-released 8.2 branch.
Per bug report from Hubert Fongarnand.
identify long-running transactions. Since we already need to record
the transaction-start time (e.g. for now()), we don't need any
additional system calls to report this information.
Catversion bumped, initdb required.
capitalize the strings like sentences. Remove unnecessarily
specific descriptions of the units used by GUC variables, since
we now allow any reasonable unit to be specified.