foreign keys, one more time. Insist on matching up all three triggers before
we create a constraint; this will avoid creation of duplicate constraints
in scenarios where a broken FK constraint was repaired by re-adding the
constraint without removing the old partial trigger set. Basically, this will
work nicely in all cases where the FK was actually functioning correctly in
the database that was dumped. It will fail to restore an FK in just one case
where we theoretically could restore it: where we find the referenced table's
triggers and not the referencing table's trigger. However, in such a scenario
it's likely that the user doesn't even realize he still has an FK at all
(since the more-likely-to-fail cases aren't enforced), and we'd probably not
accomplish much except to cause the reload to fail because the data doesn't
meet the FK constraint. Also make the NOTICE logging still more verbose, by
adding detail about which of the triggers were found. This seems about all
we can do without solving the problem of getting the user's attention at
session end.
commands into proper foreign-key constraints. Believe the constraint name
given in the trigger arguments in preference to the trigger name --- to judge
from Olivier Prenant's example, pg_dump must at some time have used the
autogenerated trigger name there, though AFAICT no current release branch tip
does. Improve the emitted NOTICEs to provide more detail (PK table's name and
column names). Handle the case where pg_dump forgot to provide the FROM table
(a bug that never did get fixed in 7.0.x apparently). This commit doesn't
do anything about the question of what to do with incomplete trigger groups.
enabled) and autovacuum is on. Since there will be a steady stream of autovac
worker processes exiting and dropping gmon.out files, allowing them to make
separate subdirectories results in serious bloat; and it seems unlikely that
anyone will care about those profiles anyway. Limit the damage by forcing all
autovac workers to dump in one subdirectory, PGDATA/gprof/avworker/.
Per report from Jrg Beyer and subsequent discussion.
trigger definitions into regular foreign key constraints. This seems
necessary given that some people evidently never did get around to
running adddepend on their schemas, and without some sort of hack the
old definitions will no longer work. Per report from Olivier Prenant
and subsequent investigation.
RelabelType nodes when the sort key is binary-compatible with the sort
operator rather than having exactly its input type. We did this correctly
for index columns but not sort keys, leading to failure to notice that
a varchar index matches an ORDER BY request. This requires a bit more work
in make_sort_from_pathkeys, but not anyplace else that I can find.
Per bug report and subsequent discussion.
Instead put in a test to drop a NULL default at the last moment before
storing the catalog entry. This changes the behavior in a couple of ways:
* Specifying DEFAULT NULL when creating an inheritance child table will
successfully suppress inheritance of any default expression from the
parent's column, where formerly it failed to do so.
* Specifying DEFAULT NULL for a column of a domain type will correctly
override any default belonging to the domain; likewise for a sub-domain.
The latter change happens because by the time the clause is checked,
it won't be a simple null Const but a CoerceToDomain expression.
Personally I think this should be back-patched, but there doesn't seem to
be consensus for that on pgsql-hackers, so refraining.
ginRedoInsert(), because other ginRedo* functions rewrite whole page or
make changes which could be applied several times without consistent's loss
- Remove check of identifying of corresponding split record:
it's possible that replaying of WAL starts after actual page split, but before
removing of that split from incomplete splits list. In this case, that check
cause FATAL error.
Per stress test which reproduces bug reported by Craig McElroy
<craig.mcelroy@contegix.com>
usage of any information from system catalog, because it could be called during
replay of WAL.
Per bug report from Craig McElroy <craig.mcelroy@contegix.com>. Patch doesn't
change on-disk storage.
containing decimal points aren't considered part of a hyphenated word.
Sync the hyphenated-word lookahead states with the subsequent part-by-part
reparsing states so that we don't get different answers about how much text
is part of the hyphenated word. Per my gripe of a few days ago.
in debugging its state-machine rules. Const-ify all the constant tables.
Minor other code cleanup, including using "token" rather than "lexeme" to
describe the output strings.
per recommendation from Alvaro. This doesn't force initdb since the
numeric token type in the catalogs doesn't change; but note that
the expected regression test output changed.
This doubles the planning workload for mergejoins while not actually
accomplishing much. The only useful case is where one of the directions
matches the query's ORDER BY request; therefore, put a thumb on the scales
in that direction, and otherwise arbitrarily consider only the ASC direction.
(This is a lot easier now than it would've been before 8.3, since we have
more semantic knowledge embedded in PathKeys now.)
childprocess deaths instead of using one thread per child. This drastastically
reduces the address space usage and should allow for more backends running.
Also change the win32_waitpid functionality to use an IO Completion Port for
queueing child death notices instead of using a fixed-size array.
if either of the input relations can legally be joined to any other rels using
join clauses. This avoids uselessly (and expensively) considering a lot of
really stupid join paths when there is a join restriction with a large
footprint, that is, lots of relations inside its LHS or RHS. My patch of
15-Feb-2007 had been causing the code to consider joining *every* combination
of rels inside such a group, which is exponentially bad :-(. With this
behavior, clauseless bushy joins will be done if necessary, but they'll be
put off as long as possible. Per report from Jakub Ouhrabka.
Backpatch to 8.2. We might someday want to backpatch to 8.1 as well, but 8.1
does not have the problem for OUTER JOIN nests, only for IN-clauses, so it's
not clear anyone's very likely to hit it in practice; and the current patch
doesn't apply cleanly to 8.1.
the sequence. Also, make setval() with is_called = false not affect the
currval state, either. Per report from Kris Jurka that an implicit
ALTER SEQUENCE OWNED BY unexpectedly caused currval() to become valid.
Since this isn't 100% backwards compatible, it will go into HEAD only;
I'll put a more limited patch into 8.2.
in corner cases such as re-fetching a just-deleted row. We may be able to
relax this someday, but let's find out how many people really care before
we invest a lot of work in it. Per report from Heikki and subsequent
discussion.
While in the neighborhood, make the combination of INSENSITIVE and FOR UPDATE
throw an error, since they are semantically incompatible. (Up to now we've
accepted but just ignored the INSENSITIVE option of DECLARE CURSOR.)
having several of them. Add two more flags: whether the process is
executing an ANALYZE, and whether a vacuum is for Xid wraparound (which
is obviously only set by autovacuum).
Sneakily move the worker's recently-acquired PostAuthDelay to a more useful
place.
neglected to test whether an outer join's join-condition actually refers to
the lower outer join it is looking at. (The comment correctly described what
was supposed to happen, but the code didn't do it...) This often resulted in
adding an unnecessary constraint on the join order of the two outer joins,
which was bad enough. However, it also seems to expose a performance
problem in an older patch (from 15-Feb): once we've decided that there is a
join ordering constraint, we will start trying clauseless joins between every
combination of rels within the constraint, which pointlessly eats up lots of
time and space if there are numerous rels below the outer join. That probably
needs to be revisited :-(. Per gripe from Jakub Ouhrabka.
with the next table on schedule instead of exiting, in all cases instead of
just on query cancel.
Add a errcontext() line indicating the activity of the worker to the error
message when it is cancelled.
Change the WorkerInfo struct to contain a pointer to the worker's PGPROC
instead of just the PID.
Add forgotten post-auth delays, per Simon Riggs. Also to autovac launcher.
then-delete on the current cursor row. The basic fix is that nodeTidscan.c
has to apply heap_get_latest_tid() to the current-scan-TID obtained from the
cursor query; this ensures we get the latest row version to work with.
However, since that only works if the query plan is a TID scan, we also have
to hack the planner to make sure only that type of plan will be selected.
(Formerly, the planner might decide to apply a seqscan if the table is very
small. This change is probably a Good Thing anyway, since it's hard to see
how a seqscan could really win.) That means the execQual.c code to support
CurrentOfExpr as a regular expression type is dead code, so replace it with
just an elog(). Also, add regression tests covering these cases. Note
that the added tests expose the fact that re-fetching an updated row
misbehaves if the cursor used FOR UPDATE. That's an independent bug that
should be fixed later. Per report from Dharmendra Goyal.
and ts_stat(), per my recent suggestion. Also add a possibly-not-needed-
but-can't-hurt check for NULL SPI_tuptable, before we try to dereference
same.
if there are zero rows to aggregate over, and the API seems both conceptually
and notationally ugly anyway. We should look for something that improves
on the tsquery-and-text-SELECT version (which is also pretty ugly but at
least it works...), but it seems that will take query infrastructure that
doesn't exist today. (Hm, I wonder if there's anything in or near SQL2003
window functions that would help?) Per discussion.
categories, as per discussion. asciiword (formerly lword) is still
ASCII-letters-only, and numword (formerly word) is still the most general
mixed-alpha-and-digits case. But word (formerly nlword) is now
any-group-of-letters-with-at-least-one-non-ASCII, rather than all-non-ASCII as
before. This is no worse than before for parsing mixed Russian/English text,
which seems to have been the design center for the original coding; and it
should simplify matters for parsing most European languages. In particular
it will not be necessary for any language to accept strings containing digits
as being regular "words". The hyphenated-word categories are adjusted
similarly.
SHGetFolderPath.
This removes the direct dependency on shell32.dll and user32.dll, which
eats a lot of "desktop heap" for each backend that's started. The
desktop heap is a very limited resource, causing backends to no
longer start once it's been exhausted.
We still have indirect depdendencies on user32.dll through third party
libraries, but those can't easily be removed.
Dave Page
miscomputation of required palloc size. The crash could only occur if the
input contained lexemes both with and without positions, which is probably not
common in practice. The miscomputation would definitely result in wasted
space. Also fix some inconsistent coding around alignment of strings and
positions in a tsvector value; these errors could also lead to crashes given
mixed with/without position data and a machine that's picky about alignment.
And be more careful about checking for overflow of string offsets.
Patch is only against HEAD --- I have not looked to see if same bugs are
in back-branch contrib/tsearch2 code.
active dictionary and its output lexemes as separate columns, instead
of smashing them into one text column, and lowercase the column names.
Also, define the output rowtype using OUT parameters instead of a
composite type, to be consistent with the other built-in functions.
versions of gcc (I'm seeing it with Apple's gcc 4.0.1). I think the
reason we did not see this before was that the assert() macros in the
regex code were all no-ops till recently.
are really redundant, since we invented a regdictionary alias type.
We can have just one function, declared as taking regdictionary, and
it will handle both behaviors. Noted while working on documentation.
when relkind = RELKIND_RELATION. This syncs these tests with the Asserts
in tuptoaster.c, and ensures that we won't ever try to, for example,
compress a sequence's tuple. Problem found by Greg Stark while stress-testing
with much-smaller-than-normal page sizes.
coding this was seen as useless, but the problem with not including them
is that the error message will often be something about authentication
failure, rather than the more helpful one about 'role is not permitted
to log in'. Per discussion.
renumbering of encoding IDs done between 8.2 and 8.3 turns out to break 8.2
initdb and psql if they are run with an 8.3beta1 libpq.so. For the moment
we can rearrange the order of enum pg_enc to keep the same number for
everything except PG_JOHAB, which isn't a problem since there are no direct
references to it in the 8.2 programs anyway. (This does force initdb
unfortunately.)
Going forward, we want to fix things so that encoding IDs can be changed
without an ABI break, and this commit includes the changes needed to allow
libpq's encoding IDs to be treated as fully independent of the backend's.
The main issue is that libpq clients should not include pg_wchar.h or
otherwise assume they know the specific values of libpq's encoding IDs,
since they might encounter version skew between pg_wchar.h and the libpq.so
they are using. To fix, have libpq officially export functions needed for
encoding name<=>ID conversion and validity checking; it was doing this
anyway unofficially.
It's still the case that we can't renumber backend encoding IDs until the
next bump in libpq's major version number, since doing so will break the
8.2-era client programs. However the code is now prepared to avoid this
type of problem in future.
Note that initdb is no longer a libpq client: we just pull in the two
source files we need directly. The patch also fixes a few places that
were being sloppy about checking for an unrecognized encoding name.
it affects. The original coding neglected tablespace entirely (causing
the indexes to move to the database's default tablespace) and for an index
belonging to a UNIQUE or PRIMARY KEY constraint, it would actually try to
assign the parent table's reloptions to the index :-(. Per bug #3672 and
subsequent investigation.
8.0 and 8.1 did not have reloptions, but the tablespace bug is present.
used to perform MIN(foo) or MAX(foo), since we want to discard null rows in
the indexscan anyway. (This would probably fall out for free if we were
injecting the IS NOT NULL clause somewhere earlier, but given the current
anatomy of the MIN/MAX optimization code we have to do it explicitly.
Fortunately, very little added code is needed.) Per a discussion with
Henk de Wit.
has been consumed, recheck against the latest value of RedoRecPtr before
really sending the signal. This avoids useless checkpoint activity if
XLogWrite is executed when we have a very stale local copy of RedoRecPtr.
The potential for useless checkpoint is very much worse in 8.3 because of
the walwriter process (which never does XLogInsert), so while this behavior
was intentional, it needs to be changed. Per report from Itagaki Takahiro.
on pg_global even to superusers, and replace it with checks in various
other places to complain about invalid uses of pg_global. This ends
up being a bit more code but it allows a more specific error message
to be given, and it un-breaks pg_tablespace_size() on pg_global.
Per discussion.
simplification gets detoasted before it is incorporated into a Const node.
Otherwise, if an immutable function were to return a TOAST pointer (an
unlikely case, but it can be made to happen), we would end up with a plan
that depends on the continued existence of the out-of-line toast datum.
a relation as a reason to invalidate a plan when the relation changes. This
handles scenarios such as dropping/recreating a sequence that is referenced by
nextval('seq') in a cached plan. Rather than teach plancache.c all about
digging through plan trees to find regclass Consts, we charge the planner's
setrefs.c with making a list of the relation OIDs on which each plan depends.
That way the list can be built cheaply during a plan tree traversal that has
to happen anyway. Per bug #3662 and subsequent discussion.
as we do (and upstream Tcl doesn't). The loop limit might be subject
to negotiation if anyone ever tries to do regex debugging in Far
Eastern languages, but for now 1000 seems plenty. CHR_MAX was right out :-(
eval_const_expressions simplifies this to just "WHERE false", but we have
already done pull_up_IN_clauses so the IN join will be done, or at least
planned, anyway. The trouble case comes when the sub-SELECT is itself a join
and we decide to implement the IN by unique-ifying the sub-SELECT outputs:
with no remaining reference to the output Vars in WHERE, we won't have
propagated the Vars up to the upper join point, leading to "variable not found
in subplan target lists" error. Fix by adding an extra scan of in_info_list
and forcing all Vars mentioned therein to be propagated up to the IN join
point. Per bug report from Miroslav Sulc.
compiler --- at least on ARM, it does. I suspect that the varvarlena patch
has been creating larger-than-intended toast pointers all along on ARM,
but it wasn't exposed until the latest tweak added some Asserts that
calculated the expected size in a different way. We could probably have
fixed this by adding __attribute__((packed)) as is done for ItemPointerData,
but struct varattrib_pointer isn't really all that useful anyway, so it
seems cleanest to just get rid of it and have only struct varattrib_1b_e.
Per results from buildfarm member quagga.
explicitly. This means a TOAST pointer takes 18 bytes instead of 17 --- still
smaller than in 8.2 --- which seems a good tradeoff to ensure we won't have
painted ourselves into a corner if we want to support multiple types of TOAST
pointer later on. Per discussion with Greg Stark.
while the restore_command does its thing, then 'recovering XXX' while
processing the segment file. These operations are heavyweight enough
that an extra PS display set shouldn't bother anyone.
CREATE INDEX CONCURRENTLY). Such an index might not have entries for every
heap row and thus clustering with it would result in silent data loss.
The scenario requires a pretty foolish DBA, but still ...
ALTER TABLE on a composite type or ALTER TYPE on a table's rowtype.
We already rejected these cases, but the error messages were a bit
random and didn't always provide a HINT to use the other command type.
recovery stop time was used. This avoids a corner-case risk of trying to
overwrite an existing archived copy of the last WAL segment, and seems
simpler and cleaner all around than the original definition. Per example
from Jon Colverson and subsequent analysis by Simon.
databases with encodings that are incompatible with the server's LC_CTYPE
locale, when we can determine that (which we can on most modern platforms,
I believe). C/POSIX locale is compatible with all encodings, of course,
so there is still some usefulness to CREATE DATABASE's ENCODING option,
but this will insulate us against all sorts of recurring complaints
caused by mismatched settings.
I moved initdb's existing LC_CTYPE-to-encoding mapping knowledge into
a new src/port/ file so it could be shared by CREATE DATABASE.
the same transaction can be identified even when no regular XID was assigned.
This seems essential after addition of the lazy-XID patch. Also some
minor code cleanup in write_csvlog().
happen condition can happen given incorrect input. The real problem is that
gram.y should try harder to distinguish * from "*" --- the latter is a legal
column name per spec, and someday we ought to treat it that way. However
fixing that is too invasive for a back-patch, and it's too late for the 8.3
cycle too. So just reduce the Assert to a plain elog for now. Per report
from NikhilS.
decompression of an already-compressed external value when we have to copy
it; save a few cycles when a value is too short for compression; and
annotate various lines that are currently unreachable.
- create a separate archive_mode GUC, on which archive_command is dependent
- %r option in recovery.conf sends last restartpoint to recovery command
- %r used in pg_standby, updated README
- minor other code cleanup in pg_standby
- doc on Warm Standby now mentions pg_standby and %r
- log_restartpoints recovery option emits LOG message at each restartpoint
- end of recovery now displays last transaction end time, as requested
by Warren Little; also shown at each restartpoint
- restart archiver if needed to carry away WAL files at shutdown
Simon Riggs
join search order portion of the planner; this is specifically intended to
simplify developing a replacement for GEQO planning. Patch by Julius
Stroffek, editorialized on by me. I renamed make_one_rel_by_joins to
standard_join_search and make_rels_by_joins to join_search_one_level to better
reflect their place within this scheme.
bgwriter_lru_maxpages is exceeded leaves the loop variables in the
expected state. In the original coding, we'd fail to advance
next_to_clean, causing that buffer to be probably-uselessly rechecked next
time, and also have an off-by-one idea of the number of buffers scanned.
buffers that cannot possibly need to be cleaned, and estimates how many
buffers it should try to clean based on moving averages of recent allocation
requests and density of reusable buffers. The patch also adds a couple
more columns to pg_stat_bgwriter to help measure the effectiveness of the
bgwriter.
Greg Smith, building on his own work and ideas from several other people,
in particular a much older patch from Itagaki Takahiro.
table, by allocating just enough for a hardcoded number of dead tuples per
page. The current estimate is 200 dead tuples per page.
Per reports from Jeff Amiel, Erik Jones and Marko Kreen, and subsequent
discussion.
CVS: ----------------------------------------------------------------------
CVS: Enter Log. Lines beginning with `CVS:' are removed automatically
CVS:
CVS: Committing in .
CVS:
CVS: Modified Files:
CVS: commands/vacuumlazy.c
CVS: ----------------------------------------------------------------------
* stats_start_collector goes away; we always start the collector process,
unless prevented by a problem with setting up the stats UDP socket.
* stats_reset_on_server_start goes away; it seems useless in view of the
availability of pg_stat_reset().
* stats_block_level and stats_row_level are merged into a single variable
"track_counts", which controls all reports sent to the collector process.
* stats_command_string is renamed to track_activities.
* log_autovacuum is renamed to log_autovacuum_min_duration to better reflect
its meaning.
The log_autovacuum change is not a compatibility issue since it didn't exist
before 8.3 anyway. The other changes need to be release-noted.
later than latestCompletedXid, per Florian Pflug. Also some minor
improvements in the XIDCACHE_DEBUG code --- make sure each call of
TransactionIdIsInProgress is counted one way or another.
out at erratic times, because it is creating a totally unacceptable level
of noise in our buildfarm results. This patch can be reverted when and if
the code is fixed to not issue notices during cache reload events.
(because they are uncorrelated with the immediate parent query). We were
charging the full run cost to the parent node, disregarding the fact that
only one row need be fetched for EXISTS. While this would only be a
cosmetic issue in most cases, it might possibly affect planning outcomes
if the parent query were itself a subquery to some upper query.
Per recent discussion with Steve Crawford.
buildfarm member grebe, I see no reason to revert the 1-byte-header-friendly
changes I made in varlena.c. Instead, tweak the code a little bit to
get more advantage out of that.
character encodings that doesn't involve calling lower(). This should
cure the performance regression in this case complained of by Guillaume
Smet. It still leaves the horrid performance for multi-byte encodings
introduced in 8.2, but there's no obvious solution for that in sight.
demonstrably necessary for text_substring() since regexp_split functions
may pass it such a value; and we might as well convert the whole file
at once. Per buildfarm results (though I wonder why most machines aren't
showing a failure).
to not cause needless copying of text datums that have 1-byte headers.
Greg Stark, in response to performance gripe from Guillaume Smet and
ITAGAKI Takahiro.
unpruned XMAX in its header. At the cost of 4 bytes per page, this keeps us
from performing heap_page_prune when there's no chance of pruning anything.
Seems to be necessary per Heikki's preliminary performance testing.
TransactionIdDidAbort, when handling the case that xmin is one of the current
transaction's XIDs and the tuple has been deleted. xmax must also be one of
the current transaction's XIDs, since no one else can see it yet, and it's
cheaper to look at local state than shared state to find out if xmax aborted.
Per an idea of Heikki's.
For XIDs of our own transaction and subtransactions, it's cheaper to ask
TransactionIdIsCurrentTransactionId() than to look in shared memory.
Also, the xids[] work array is always the same size within any given
process, so malloc it just once instead of doing a palloc/pfree on every
call; aside from being faster this lets us get rid of some goto's, since
we no longer have any end-of-function pfree to do. Both ideas by Heikki.
columns, and the new version can be stored on the same heap page, we no longer
generate extra index entries for the new version. Instead, index searches
follow the HOT-chain links to ensure they find the correct tuple version.
In addition, this patch introduces the ability to "prune" dead tuples on a
per-page basis, without having to do a complete VACUUM pass to recover space.
VACUUM is still needed to clean up dead index entries, however.
Pavan Deolasee, with help from a bunch of other people.
values. The previous coding essentially assumed that x = sqrt(x*x), which
does not hold for x < 0.
Thanks to Jie Zhang at Greenplum and Gavin Sherry for reporting this
issue.
database via builtin functions, as recently discussed on -hackers.
chr() now returns a character in the database encoding. For UTF8 encoded databases
the argument is treated as a Unicode code point. For other multi-byte encodings
the argument must designate a strict ascii character, or an error is raised,
as is also the case if the argument is 0.
ascii() is adjusted so that it remains the inverse of chr().
The two argument form of convert() is gone, and the three argument form now
takes a bytea first argument and returns a bytea. To cover this loss three new
functions are introduced:
. convert_from(bytea, name) returns text - converts the first argument from the
named encoding to the database encoding
. convert_to(text, name) returns bytea - converts the first argument from the
database encoding to the named encoding
. length(bytea, name) returns int - gives the length of the first argument in
characters in the named encoding
we'd dump core anyway immediately afterward if it were null; and it
seems to confuse some versions of icc into generating bad code.
Per report from Sergey Koposov. Patched in HEAD only, for the moment,
since this is only likely to affect developers.
no-longer-needed pages at the end of a table. We thought we could throw away
pages containing HEAPTUPLE_DEAD tuples; but this is not so, because such
tuples very likely have index entries pointing at them, and we wouldn't have
removed the index entries. The problem only emerges in a somewhat unlikely
race condition: the dead tuples have to have been inserted by a transaction
that later aborted, and this has to have happened between VACUUM's initial
scan of the page and then rechecking it for empty in count_nondeletable_pages.
But that timespan will include an index-cleaning pass, so it's not all that
hard to hit. This seems to explain a couple of previously unsolved bug
reports.
than two independent bits (one of which was never used in heap pages anyway,
or at least hadn't been in a very long time). This gives us flexibility to
add the HOT notions of redirected and dead item pointers without requiring
anything so klugy as magic values of lp_off and lp_len. The state values
are chosen so that for the states currently in use (pre-HOT) there is no
change in the physical representation.
recover from elog(ERROR). Problem was created by introduction of hash seq
search tracking awhile back, and affects all branches that have bgwriter;
in HEAD the disease has snuck into autovacuum and walwriter too. (Not sure
that the latter two use hash_seq_search at the moment, but surely they might
someday.) Per report from Sergey Koposov.
Rename synonym.syn.sample and thesaurs.ths.sample to
synonym_sample.syn and thesaurs_sample.ths accordingly to be able to use they
in regression test.
Ispell dictionary uses synthetic simple dictionary files.
* Defined new struct WordEntryPosVector that holds a uint16 length and a
variable size array of WordEntries. This replaces the previous
convention of a variable size uint16 array, with the first element
implying the length. WordEntryPosVector has the same layout in memory,
but is more readable in source code. The POSDATAPTR and POSDATALEN
macros are still used, though it would now be more readable to access
the fields in WordEntryPosVector directly.
* Removed needfree field from DocRepresentation. It was always set to false.
* Miscellaneous other commenting and refactoring
transaction, unless rolled back or overridden by a SET clause for the same
variable attached to a surrounding function call. Per discussion, these
seem the best semantics. Note that this is an INCOMPATIBLE CHANGE: in 8.0
through 8.2, SET LOCAL's effects disappeared at subtransaction commit
(leading to behavior that made little sense at the SQL level).
I took advantage of the opportunity to rewrite and simplify the GUC variable
save/restore logic a little bit. The old idea of a "tentative" value is gone;
it was a hangover from before we had a stack. Also, we no longer need a stack
entry for every nesting level, but only for those in which a variable's value
actually changed.
an exclusive lock on the table at this point, which we want to release as soon
as possible. This is called in the phase of lazy vacuum where we truncate the
empty pages at the end of the table.
An alternative solution would be to lower the vacuum delay settings before
starting the truncating phase, but this doesn't work very well in autovacuum
due to the autobalancing code (which can cause other processes to change our
cost delay settings). This case could be considered in the balancing code, but
it is simpler this way.
Apparently it's a bug I introduced when I refactored spell.c to use the
readline function for reading and recoding the input file. I didn't
notice that some calls to STRNCMP used the non-lowercased version of the
input line.
and in passing, fix some bogosities dating from the custom_variable_classes
patch. Fix guc-file.l to correctly check changes in custom_variable_classes
that are attempted concurrently with additions/removals of custom variables,
and don't allow the new setting to be applied in advance of checking it.
Clean up messy and undocumented situation for string variables with NULL
boot_val. Fix DefineCustomVariable functions to initialize boot_val
correctly. Prevent find_option from inserting bogus placeholders for custom
variables that are simply inquired about rather than being set.
ReadNewTransactionId from GetSnapshotData --- with a "latestCompletedXid"
variable that is updated during transaction commit or abort. Since
latestCompletedXid is written only in places that had to lock ProcArrayLock
exclusively anyway, and is read only in places that had to lock ProcArrayLock
shared anyway, it adds no new locking requirements to the system despite being
cluster-wide. Moreover, removing ReadNewTransactionId from snapshot
acquisition eliminates the need to take both XidGenLock and ProcArrayLock at
the same time. Since XidGenLock is sometimes held across I/O this can be a
significant win. Some preliminary benchmarking suggested that this patch has
no effect on average throughput but can significantly improve the worst-case
transaction times seen in pgbench. Concept by Florian Pflug, implementation
by Tom Lane.
no need for serialization against snapshot-taking because the xact doesn't
affect anyone else's snapshot anyway. Per discussion. Also, move various
info about the interlocking of transactions and snapshots out of code comments
and into a hopefully-more-cohesive discussion in access/transam/README.
Also, remove a couple of now-obsolete comments about having to force some WAL
to be written to persuade RecordTransactionCommit to do its thing.
big misalignement, then it tries to split page basing on distribution
of boxe's centers.
Per report from Dolafi, Tom <dolafit@janelia.hhmi.org>
Backpatch is needed, change doesn't affect on-disk storage.
- change the alignment requirement of lexemes in TSVector slightly.
Lexeme strings were always padded to 2-byte aligned length to make sure
that if there's position array (uint16[]) it has the right alignment.
The patch changes that so that the padding is not done when there's no
positions. That makes the storage of tsvectors without positions
slightly more compact.
- added some #include "miscadmin.h" lines I missed in the earlier when I
added calls to check_stack_depth().
- Reimplement the send/recv functions, and added a comment
above them describing the on-wire format. The CRC is now recalculated in
tsquery as well per previous discussion.
- add code to check that the query tree is well-formed. It was indeed
possible to send malformed queries in binary mode, which produced all
kinds of strange results.
- make the left-field a uint32. There's no reason to
arbitrarily limit it to 16-bits, and it won't increase the disk/memory
footprint either now that QueryOperator and QueryOperand are separate
structs.
- add check_stack_depth() call to all recursive functions I found.
Some of them might have a natural limit so that you can't force
arbitrarily deep recursions, but check_stack_depth() is cheap enough
that seems best to just stick it into anything that might be a problem.
small editorization by me
- Brake the QueryItem struct into QueryOperator and QueryOperand.
Type was really the only common field between them. QueryItem still
exists, and is used in the TSQuery struct as before, but it's now a
union of the two. Many other changes fell from that, like separation
of pushval_asis function into pushValue, pushOperator and pushStop.
- Moved some structs that were for internal use only from header files
to the right .c-files.
- Moved tsvector parser to a new tsvector_parser.c file. Parser code was
about half of the size of tsvector.c, it's also used from tsquery.c, and
it has some data structures of its own, so it seems better to separate
it. Cleaned up the API so that TSVectorParserState is not accessed from
outside tsvector_parser.c.
- Separated enumerations (#defines, really) used for QueryItem.type
field and as return codes from gettoken_query. It was just accidental
code sharing.
- Removed ParseQueryNode struct used internally by makepol and friends.
push*-functions now construct QueryItems directly.
- Changed int4 variables to just ints for variables like "i" or "array
size", where the storage-size was not significant.
null::char(3) to a simple Const node. (It already worked for non-null values,
but not when we skipped evaluation of a strict coercion function.) This
prevents loss of typmod knowledge in situations such as exhibited in bug
#3598. Unfortunately there seems no good way to fix that bug in 8.1 and 8.2,
because they simply don't carry a typmod for a plain Const node.
In passing I made all the other callers of makeNullConst supply "real" typmod
values too, though I think it probably doesn't matter anywhere else.
that examine fields that could change under them. This is just to make
really sure that when we are fetching a value 'only once', that's what
actually happens. Possibly this is a bug that should be back-patched,
but in the absence of solid evidence that it's needed, I won't bother.
rows will normally never obtain an XID at all. We already did things this way
for subtransactions, but this patch extends the concept to top-level
transactions. In applications where there are lots of short read-only
transactions, this should improve performance noticeably; not so much from
removal of the actual XID-assignments, as from reduction of overhead that's
driven by the rate of XID consumption. We add a concept of a "virtual
transaction ID" so that active transactions can be uniquely identified even
if they don't have a regular XID. This is a much lighter-weight concept:
uniqueness of VXIDs is only guaranteed over the short term, and no on-disk
record is made about them.
Florian Pflug, with some editorialization by Tom.
(Actually, it works as a plain statement too, but I didn't document that
because it seems a bit useless.) Unify VariableResetStmt with
VariableSetStmt, and clean up some ancient cruft in the representation of
same.
initcap style --- the vast majority of the existing descriptions do not use
an initial cap. I didn't change places where the first word was all-cap.
initdb not forced because this doesn't change any regression test results.
There are still some loose ends: I didn't do anything about the SET FROM
CURRENT idea yet, and it's not real clear whether we are happy with the
interaction of SET LOCAL with function-local settings. The documentation
is a bit spartan, too.
regardless of the number of tuples involved, it's incorrect to skip it
when memtupcount = 1; the number of cycles saved is minuscule anyway.
An alternative solution would be to pull the state changes out to the
call site in tuplesort_performsort, but keeping them near the corresponding
changes in make_bounded_heap seems marginally cleaner. Noticed by
Greg Stark.
the number of rows likely to be produced by a query such as
SELECT * FROM t1 LEFT JOIN t2 USING (key) WHERE t2.key IS NULL;
What this is doing is selecting for t1 rows with no match in t2, and thus
it may produce a significant number of rows even if the t2.key table column
contains no nulls at all. 8.2 thinks the table column's null fraction is
relevant and thus may estimate no rows out, which results in terrible plans
if there are more joins above this one. A proper fix for this will involve
passing much more information about the context of a clause to the selectivity
estimator functions than we ever have. There's no time left to write such a
patch for 8.3, and it wouldn't be back-patchable into 8.2 anyway. Instead,
put in an ad-hoc test to defeat the normal table-stats-based estimation when
an IS NULL test is evaluated at an outer join, and just use a constant
estimate instead --- I went with 0.5 for lack of a better idea. This won't
catch every case but it will catch the typical ways of writing such queries,
and it seems unlikely to make things worse for other queries.
generating the tuples has resjunk output columns. This is not possible for
simple table scans but can happen when evaluating a whole-row Var for a view.
Per example from Patryk Kordylewski. The problem exists back to 8.0 but
I'm not going to risk back-patching further than 8.2 because of the many
changes in this area.
sets for outer joins, in the light of bug #3588 and additional thought and
experimentation. The original methodology was fatally flawed for nests of
more than two outer joins: it got the relationships between adjacent joins
right, but didn't always come to the right conclusions about whether a join
could be interchanged with one two or more levels below it. This was largely
caused by a mistaken idea that we should use the min_lefthand + min_righthand
sets of a sub-join as the minimum left or right input set of an upper join
when we conclude that the sub-join can't commute with the upper one. If
there's a still-lower join that the sub-join *can* commute with, this method
led us to think that that one could commute with the topmost join; which it
can't. Another problem (not directly connected to bug #3588) was that
make_outerjoininfo's processing-order-dependent method for enforcing outer
join identity #3 didn't work right: if we decided that join A could safely
commute with lower join B, we dropped all information about sub-joins under B
that join A could perhaps not safely commute with, because we removed B's
entire min_righthand from A's.
To fix, make an explicit computation of all inner join combinations that occur
below an outer join, and add to that the full syntactic relsets of any lower
outer joins that we determine it can't commute with. This method gives much
more direct enforcement of the outer join rearrangement identities, and it
turns out not to cost a lot of additional bookkeeping.
Thanks to Richard Harris for the bug report and test case.
checks for individual-table-size functions, since anyone in the database could
get approximate values from pg_class.relpages anyway. Allow database-size to
users with CONNECT privilege for the target database (note that this is
granted by default). Allow tablespace-size if the user has CREATE privilege
on the tablespace (which is *not* granted by default), or if the tablespace is
the default tablespace for the current database (since we treat that as
implicitly allowing use of the tablespace).
even if the "deadlock detected" ERROR message is suppressed by an exception
catcher. Be clearer about the event sequence when a soft deadlock is fixed:
the fixing process might or might not still have to wait, so log that
separately. Fix race condition when someone releases us from the lock partway
through printing all this junk --- we'd not get confused about our state, but
the log message sequence could have been misleading, ie, a "still waiting"
message with no subsequent "acquired" message. Greg Stark and Tom Lane.
namespace isn't necessarily first in the search path (there could be implicit
schemas ahead of it). Examples are
test=# set search_path TO s1;
test=# create view pg_timezone_names as select * from pg_timezone_names();
ERROR: "pg_timezone_names" is already a view
test=# create table pg_class (f1 int primary key);
ERROR: permission denied: "pg_class" is a system catalog
You'd expect these commands to create the requested objects in s1, since
names beginning with pg_ aren't supposed to be reserved anymore. What is
happening is that we create the requested base table and then execute
additional commands (here, CREATE RULE or CREATE INDEX), and that code is
passed the same RangeVar that was in the original command. Since that
RangeVar has schemaname = NULL, the secondary commands think they should do a
path search, and that means they find system catalogs that are implicitly in
front of s1 in the search path.
This is perilously close to being a security hole: if the secondary command
failed to apply a permission check then it'd be possible for unprivileged
users to make schema modifications to system catalogs. But as far as I can
find, there is no code path in which a check doesn't occur. Which makes it
just a weird corner-case bug for people who are silly enough to want to
name their tables the same as a system catalog.
The relevant code has changed quite a bit since 8.2, which means this patch
wouldn't work as-is in the back branches. Since it's a corner case no one
has reported from the field, I'm not going to bother trying to back-patch.
days that was obsolete the moment we had IN (SELECT ...) capability.
It's arguably a security hole since it applied no permissions check to
the table it searched, and since it was never documented anywhere,
removing it seems more appropriate than fixing it.
and pg_tablespace_size to superusers. Perhaps we could weaken the first
case to just require SELECT privilege, but that doesn't work for the
other cases, so use ownership as the common concept.
While it's not clear that TID linkage info is of any great use to a
nefarious user, it's certainly unexpected that these functions wouldn't
insist on read privileges.
relcache entry after having heap_close'd it. This could lead to misbehavior
if a relcache flush wiped out the cache entry meanwhile. In 8.2 there is a
very real risk of CREATE INDEX CONCURRENTLY using the wrong relid for locking
and waiting purposes. I think the bug is only cosmetic in 8.0 and 8.1,
because their transgression is limited to using RelationGetRelationName(rel)
in an ereport message immediately after heap_close, and there's no way (except
with special debugging options) for a cache flush to occur in that interval.
Not quite sure that it's cosmetic in 7.4, but seems best to patch anyway.
Found by trying to run the regression tests with CLOBBER_CACHE_ALWAYS enabled.
Maybe we should try to do that on a regular basis --- it's awfully slow,
but perhaps some fast buildfarm machine could do it once in awhile.
- ispell initialization crashed on empty dictionary file
- ispell initialization crashed on affix file with prefixes but no suffixes
- stop words file was run through pg_verify_mbstr, with database
encoding, but it's supposed to be UTF-8; similar bug for synonym files
- bunch of comments added, typos fixed, and other cleanup
Introduced consistent encoding checking/conversion of data read from tsearch
configuration files, by doing this in a single t_readline() subroutine
(replacing direct usages of fgets). Cleaned up API for readstopwords too.
Heikki Linnakangas
This prevents needing to do complex and poorly-defined updates of the
mapping table if the new parser has different token types than the old.
Per discussion.
init options of the template as top-level options in the syntax. This also
makes ALTER a bit easier to use, since options can be replaced individually.
I also made these statements verify that the tmplinit method will accept
the new settings before they get stored; in the original coding you didn't
find out about mistakes until the dictionary got invoked.
Under the hood, init methods now get options as a List of DefElem instead
of a raw text string --- that lets tsearch use existing options-pushing code
instead of duplicating functionality.
'with map' parameter; as things now stand there's really not much point
in specifying a config-to-copy if you don't copy its map. Also, use
COPY instead of TEMPLATE as the key word for a config-to-copy, so as
to avoid confusion with text search templates. Per discussion; the
just-committed reference page for the command already describes it
this way.
byte after the last full byte of the bit array, regardless of whether that
byte was part of the valid data or not. Found by buildfarm testing.
Thanks to Stefan Kaltenbrunner for nailing down the cause.
Oleg Bartunov and Teodor Sigaev, but I did a lot of editorializing,
so anything that's broken is probably my fault.
Documentation is nonexistent as yet, but let's land the patch so we can
get some portability testing done.
are not one of the query's defined result relations, but nonetheless have
triggers fired against them while the query is active. This was formerly
impossible but can now occur because of my recent patch to fix the firing
order for RI triggers. Caching a ResultRelInfo avoids duplicating work by
repeatedly opening and closing the same relation, and also allows EXPLAIN
ANALYZE to "see" and report on these extra triggers. Use the same mechanism
to cache open relations when firing deferred triggers at transaction shutdown;
this replaces the former one-element-cache strategy used in that case, and
should improve performance a bit when there are deferred triggers on a number
of relations.
row within one query: we were firing check triggers before all the updates
were done, leading to bogus failures. Fix by making the triggers queued by
an RI update go at the end of the outer query's trigger event list, thereby
effectively making the processing "breadth-first". This was indeed how it
worked pre-8.0, so the bug does not occur in the 7.x branches.
Per report from Pavel Stehule.
that still thought they could set HEAP_XMAX_COMMITTED immediately after
seeing the other transaction commit. Make them use the same logic as
tqual.c does to determine if the hint bit can be set yet.
First, we cannot assume that XLogAsyncCommitFlush guarantees hint bits will be
settable, because clog.c's inexact LSN bookkeeping results in windows where a
previously flushed transaction is considered unhintable because it shares an
LSN slot with a later unflushed transaction. But repair_frag requires
XMIN_COMMITTED to be correct so that it can distinguish tuples moved by the
current vacuum. Since not being able to set the bit is an uncommon corner
case, the most practical way of dealing with it seems to be to abandon
shrinking (ie, don't invoke repair_frag) when we find a non-dead tuple whose
XMIN_COMMITTED bit couldn't be set.
Second, it is possible for the same reason that a RECENTLY_DEAD tuple does not
get its XMAX_COMMITTED bit set during scan_heap. But by the time repair_frag
examines the tuple it might be possible to set the bit. We therefore must
take buffer content lock when calling HeapTupleSatisfiesVacuum a second time,
else we can get an Assert failure in SetBufferCommitInfoNeedsSave. This
latter bug is latent in existing releases, but I think it cannot actually
occur without async commit, since the first HeapTupleSatisfiesVacuum call
should always have set the bit. So I'm not going to back-patch it.
In passing, reduce the existing "cannot shrink relation" messages from NOTICE
to LOG level. The new message must be no higher than LOG if we don't want
unpredictable regression test failures, and consistency seems like a good
idea. Also arrange that only one such message is reported per VACUUM FULL;
in typical scenarios you could get spammed with many such messages, which
seems a bit useless.
enlarge the memory chunk in-place when it was feasible to do so. This turns
out to not work well at all for scenarios involving repeated cycles of
palloc/repalloc/pfree: the eventually freed chunks go into the wrong freelist
for the next initial palloc request, and so we consume memory indefinitely.
While that could be defended against, the number of cases where the
optimization can still be applied drops significantly, and adjusting the
initial sizes of StringInfo buffers makes it drop to almost nothing.
Seems better to just remove the extra complexity.
Per recent discussion and testing.
likewise increase the initial size of the scanner's literal buffer to 1024
(from 128). Instrumentation of the regression tests suggests that this
saves a useful amount of repalloc() traffic --- the number of calls occurring
during one set of tests drops from about 6900 to about 3900. The old sizes
were chosen in the late 90's with an eye to machines much smaller than
are common today.
regexp_split_to_table() within a single query. This is only a partial
solution, as it turns out that with enough matches per string these
functions can also tickle a repalloc() misbehavior. But fixing that
is a topic for a separate patch.
that cached compiled patterns will still be there when the function is next
called. Clean up looping logic, thereby fixing bug identified by Pavel
Stehule. Share setup code between the two functions, add some comments, and
avoid risky mixing of int and size_t variables. Clean up the documentation a
tad, and accept all the flag characters mentioned in table 9-19 rather than
just a subset.
constant flow of new connection requests could prevent the postmaster from
completing a shutdown or crash restart. This is done by labeling child
processes that are "dead ends", that is, we know that they were launched only
to tell a client that it can't connect. These processes are managed
separately so that they don't confuse us into thinking that we can't advance
to the next stage of a shutdown or restart sequence, until the very end
where we must wait for them to drain out so we can delete the shmem segment.
Per discussion of a misbehavior reported by Keaton Adams.
Since this code was baroque already, and my first attempt at fixing the
problem made it entirely impenetrable, I took the opportunity to rewrite it
in a state-machine style. That eliminates some duplicated code sections and
hopefully makes everything a bit clearer.
hash table is allocated in a child context of the agg node's memory
context, MemoryContextReset() will reset but *not* delete the child
context. Since ExecReScanAgg() proceeds to build a new hash table
from scratch (in a new sub-context), this results in leaking the
header for the previous memory context. Therefore, use
MemoryContextResetAndDeleteChildren() instead.
Credit: My colleague Sailesh Krishnamurthy at Truviso for isolating
the cause of the leak.
child memory contexts is indented two spaces to the right of its
parent context. This should make it easier to deduce the memory
context hierarchy from the output of MemoryContextStats().
between the setting of log_line_prefix and the setting of log_timezone. We
can't realistically set log_timezone any earlier than we do now, so the best
behavior seems to be to use GMT zone if any timestamps are to be logged during
early startup. Create a dummy zone variable with a minimal definition of GMT
(in particular it will never know about leap seconds), so that we can set it
up without reference to any external files.
as well as regular backends: if no regular backend launches before the autovac
launcher tries to start an autovac worker, the postmaster would get an Assert
fault due to calling PostmasterRandom before random_seed was initialized.
Cleanest solution seems to be to take the initialization of random_seed out
of ServerLoop and let PostmasterRandom do it for itself.
displayed in the postmaster log. This avoids Windows-specific problems with
localized time zone names that are in the wrong encoding, and generally seems
like a good idea to forestall other potential platform-dependent issues.
To preserve the existing behavior that all backends will log in the same time
zone, create a new GUC variable log_timezone that can only be changed on a
system-wide basis, and reference log-related calculations to that zone instead
of the TimeZone variable.
This fixes the issue reported by Hiroshi Saito that timestamps printed by
xlog.c startup could be improperly localized on Windows. We still need a
simpler patch for that problem in the back branches, however.
not bothering to initialize is_autovacuum for regular backends, meaning there
was a significant chance of the postmaster prematurely sending them SIGTERM
during database shutdown. Also, leaving the cancel key unset for an autovac
worker meant that any client could send it SIGINT, which doesn't sound
especially good either.
so that we will be able to create a cookie for all processes for CSVlogs.
It is set wherever MyProcPid is set. Take the opportunity to remove the now
unnecessary session-only restriction on the %s and %c escapes in log_line_prefix.
before reporting a transaction committed. Data consistency is still
guaranteed (unlike setting fsync = off), but a crash may lose the effects
of the last few transactions. Patch by Simon, some editorialization by Tom.
with the recent patch to log temp file sizes at removal time. Doesn't seem
worth fixing since it's unused.
In passing, make a few elog messages conform to the message style guide.
named pg_toast_temp_nnn, alongside the pg_temp_nnn schemas used for the temp
tables themselves. This allows low-level code such as the relcache to
recognize that these tables are indeed temporary, which enables various
optimizations such as not WAL-logging changes and using local rather than
shared buffers for access. Aside from obvious performance benefits, this
provides a solution to bug #3483, in which other backends unexpectedly held
open file references to temporary tables. The scheme preserves the property
that TOAST tables are not in any schema that's normally in the search path,
so they don't conflict with user table names.
initdb forced because of changes in system view definitions.
checking whether an IS NULL/IS NOT NULL clause is implied or refuted by
a strict function. Per example from Dawid Kuroczko.
Backpatch to 8.2 since this is arguably a performance bug.
and fsync WAL at convenient intervals. For the moment it just tries to
offload this work from backends, but soon it will be responsible for
guaranteeing a maximum delay before asynchronously-committed transactions
will be flushed to disk.
This is a portion of Simon Riggs' async-commit patch, committed to CVS
separately because a background WAL writer seems like it might be a good idea
independently of the async-commit feature. I rebased walwriter.c on
bgwriter.c because it seemed like a more appropriate way of handling signals;
while the startup/shutdown logic in postmaster.c is more like autovac because
we want walwriter to quit before we start the shutdown checkpoint.
I/O utilization, per discussion.
While at it, lower the autovacuum vacuum and analyze threshold values to 50
tuples. It is a bit higher (i.e. more conservative) than what I originally
proposed but much better than the old values for small tables.
against a Unix server, and Windows-specific server-side authentication
using SSPI "negotiate" method (Kerberos or NTLM).
Only builds properly with MSVC for now.
log_min_error_statement is active and there is some problem in logging the
current query string; for example, that it's too long to include in the log
message without running out of memory. This problem has existed since the
log_min_error_statement feature was introduced. No doubt the reason it
wasn't detected long ago is that 8.2 is the first release that defaults
log_min_error_statement to less than PANIC level.
Per report from Bill Moran.
truncated relation was deleted later in the WAL sequence. Since replay
normally auto-creates a relation upon its first reference by a WAL log entry,
failure is seen only if the truncate entry happens to be the first reference
after the checkpoint we're restarting from; which is a pretty unusual case but
of course not impossible. Fix by making truncate entries auto-create like
the other ones do. Per report and test case from Dharmendra Goyal.
when handed an invalidly-encoded pattern. The previous coding could get
into an infinite loop if pg_mb2wchar_with_len() returned a zero-length
string after we'd tested for nonempty pattern; which is exactly what it
will do if the string consists only of an incomplete multibyte character.
This led to either an out-of-memory error or a backend crash depending
on platform. Per report from Wiktor Wodecki.
a MIN or MAX aggregate call into an indexscan: the initplan is being made at
the current query nesting level and so we shouldn't increment query_level.
Though usually harmless, this mistake could lead to bogus "plan should not
reference subplan's variable" failures on complex queries. Per bug report
from David Sanchez i Gregori.
referencing table does not change the tuple's FK column(s), we don't bother
to check the PK table since the constraint was presumably already valid.
However, the check is still necessary if the tuple was inserted by our own
transaction, since in that case the INSERT trigger will conclude it need not
make the check (since its version of the tuple has been deleted). We got this
right for simple cases, but not when the insert and update are in different
subtransactions of the current top-level transaction; in such cases the FK
check would never be made at all. (Hence, problem dates back to 8.0 when
subtransactions were added --- it's actually the subtransaction version of a
bug fixed in 7.3.5.) Fix, and add regression test cases. Report and fix by
Affan Salman.
been broken since forever, but was not noticed because people seldom look
at raw parse trees. AFAIK, no impact on users except that debug_print_parse
might fail; but patch it all the way back anyway. Per report from Jeff Ross.
define pg_dlsym() as returning a PGFunction pointer, not just any
pointer-to-function. But many are not. Suppress compiler warnings
on platforms that aren't careful by inserting explicit casts at the
two call sites that didn't have a cast already. Per Stefan.
SIGQUIT) will be recognized and processed while waiting for input,
rather than only after something has been typed. Also make SIGQUIT
do the same thing as SIGTERM in single-user mode, ie, do a normal
shutdown and exit. Since it's relatively easy to provoke SIGQUIT
from the keyboard, people may try that instead of control-D, and we'd
rather this leads to orderly shutdown. Per report from Leon Mergen
and subsequent discussion.
we don't know at that point which relation OID to tell pgstat to forget.
The code was passing the relfilenode, which is incorrect, and could possibly
cause some other relation's stats to be zeroed out. While we could try to
clean this up, it seems much simpler and more reliable to let the next
invocation of pgstat_vacuum_tabstat() fix things; which indeed is how it
worked before I introduced the buggy code into 8.1.3 and later :-(.
Problem noticed by Itagaki Takahiro, fix is per subsequent discussion.
ORDER BY <constant> as redundant. One is that this means query_planner()
has to canonicalize pathkeys even when the query jointree is empty;
the canonicalization was always a no-op in such cases before, but no more.
Also, we have to guard against thinking that a set-returning function is
"constant" for this purpose. Add a couple of regression tests for these
evidently under-tested cases. Per report from Greg Stark and subsequent
experimentation.
unwarranted liberties with int8 vs float8 values for these types.
Specifically, be sure to apply either hashint8 or hashfloat8 depending
on HAVE_INT64_TIMESTAMP. Per my gripe of even date.
checkpoint. The comment claimed that we could do this anytime after
setting the checkpoint REDO point, but actually BufferSync is relying
on the assumption that buffers dumped by other backends will be fsync'd
too. So we really could not do it any sooner than we are doing it.
Sequences and views could previously be renamed using ALTER TABLE, but
this was a repeated source of confusion for users. Update the docs,
and psql tab completion. Patch from David Fetter; various minor fixes
by myself.
This is a Linux kernel bug that apparently exists in every extant kernel
version: sometimes shmctl() will fail with EIDRM when EINVAL is correct.
We were assuming that EIDRM indicates a possible conflict with pre-existing
backends, and refusing to start the postmaster when this happens. Fortunately,
there does not seem to be any case where Linux can legitimately return EIDRM
(it doesn't track shmem segments in a way that would allow that), so we can
get away with just assuming that EIDRM means EINVAL on this platform.
Per reports from Michael Fuhr and Jon Lapham --- it's a bit surprising
we have not seen more reports, actually.
so that it responds to SIGQUIT reasonably promptly even on machines where
SA_RESTART signals restart a sleep from scratch. (This whole area could
stand some rethinking, but for now make it work like the other processes
do.) Also some marginal stylistic cleanups.
for it to die before telling the bgwriter to initiate shutdown checkpoint.
Since it's connected to shared memory, this seems more prudent than the
alternative of letting it quit asynchronously. Resolves my complaint
of yesterday about repeated shutdown checkpoints in CVS HEAD.
that are fired at end-of-statement (as is the normal case for foreign keys,
for example). In this situation the per-subxact deferred trigger context
is always empty when subtransaction exit is reached; so we could free it,
but were not doing so, leading to an intratransaction leak of 8K or more
per subtransaction. Per off-list example from Viatcheslav Kalinin
subsequent to bug #3418 (his original bug report omitted a foreign key
constraint needed to cause this leak).
Back-patch to 8.2; prior versions were not using per-subxact contexts
for deferred triggers, so did not have this leak.
memory context pointing at a context not long lived enough.
Also, create a fake PortalContext where to store the vac_context, if only
to avoid having it be a top-level memory context.
continue with the schedule. Change current uses of SIGINT to abort a worker
into SIGTERM, which keeps the old behaviour of terminating the process.
Patch from ITAGAKI Takahiro, with some editorializing of my own.
overruns (neither of which seem likely to be exploitable as security holes,
fortunately, since the provoker can't control the data written). One of
these is due to choosing to stomp on the output of a called function, which
is bad news in any case; make it treat the called functions' results as
read-only. Avoid some unnecessary palloc/pfree traffic too; it's not
really helpful to free small temporary objects, and again this is presuming
more than it ought to about the nature of the results of called functions.
Per report from Patrick Welche and additional code-reading by Imad.
over a fairly long period of time, rather than being spat out in a burst.
This happens only for background checkpoints carried out by the bgwriter;
other cases, such as a shutdown checkpoint, are still done at full speed.
Remove the "all buffers" scan in the bgwriter, and associated stats
infrastructure, since this seems no longer very useful when the checkpoint
itself is properly throttled.
Original patch by Itagaki Takahiro, reworked by Heikki Linnakangas,
and some minor API editorialization by me.
by having the postmaster signal it when certain failures occur. This requires
the postmaster setting a flag in shared memory, but should be as safe as the
pmsignal.c code is.
Also make sure the launcher honor's a postgresql.conf change turning it off
on SIGHUP.
(which now deals only in optimizable statements), and put that code
into a new file parser/parse_utilcmd.c. This helps clarify and enforce
the design rule that utility statements shouldn't be processed during
the regular parse analysis phase; all interpretation of their meaning
should happen after they are given to ProcessUtility to execute.
(We need this because we don't retain any locks for a utility statement
that's in a plan cache, nor have any way to detect that it's stale.)
We are also able to simplify the API for parse_analyze() and related
routines, because they will now always return exactly one Query structure.
In passing, fix bug #3403 concerning trying to add a serial column to
an existing temp table (this is largely Heikki's work, but we needed
all that restructuring to make it safe).
parse_int() and with itself (strtod allows leading whitespace, so it
seems odd not to allow trailing whitespace). parse_bool remains
not-whitespace-friendly, but this is generically true for non-numeric
GUC variables, so I'll desist from changing it.
contain a wrong unit specification, per discussion.
In passing, fix the code to avoid unnecessary integer overflows when
converting units, and to detect overflows when they do occur.
actually works sanely, viz not 0 and not more than INT_MAX/1000
(else TimestampTzPlusMilliseconds can overflow). Per discussion with
Greg Stark. Since this is a superuser-only setting and there was not
previously any big reason to change it, not worth back-patching.
create table foo (bar int default null default 3);
due to not thinking about the special-case handling of DEFAULT NULL.
Problem noticed while investigating bug #3396.
test seems inessential right now since the only control path for not
getting the lock is via CHECK_FOR_INTERRUPTS which won't return control
to ProcSleep, but it would be important if we ever allow the deadlock
code to kill someone else's transaction instead of our own.
within a signal handler (this might be safe given the relatively narrow code
range in which the interrupt is enabled, but it seems awfully risky); do issue
more informative log messages that tell what is being waited for and the exact
length of the wait; minor other code cleanup. Greg Stark and Tom Lane
unreserved according to the grammar. The list of unreserved words has gotten
extensive enough that the unnecessary quoting is becoming a bit of an eyesore.
To do this, add knowledge of the keyword category to keywords.c's table.
(Someday we might be able to generate keywords.c's table and the keyword lists
in gram.y from a common source.) For the moment, lie about WITH's status in
the table so it will still get quoted --- this is because of the expectation
that WITH will become reserved when the SQL recursive-queries patch gets done.
I didn't force initdb because this affects nothing on-disk; but note that a
few regression tests have changed expected output.
profiling that CopyAttributeOutText was taking an unreasonable fraction of
the backend run time (like 66%!) on the following trivial test case:
$ time psql -c "copy (select repeat('xyzzy',50) from generate_series(1,10000000)) to stdout" regression >/dev/null
The time is all being spent on scanning the string for characters to be
escaped, which most of the time there aren't any of. Some tweaking to take
as many tests as possible out of the inner loop reduced the runtime of this
example by more than 10%. In a real-world case it wouldn't be as useful
a speedup, but it still seems worth adding a few lines here.
few lines in sql_exec_error_callback() by using the function source string
field that the patch added to SQL function cache entries. This doesn't work
because the fn_extra field isn't filled in yet during init_sql_fcache().
Probably it could be made to work, but it doesn't seem appropriate to contort
the main code paths to make an error-reporting path a tad faster. Per report
from Pavel Stehule.
an array of strings rather than an array of integers, and allow any simple
constant or identifier to be used in typmods; for example
create table foo (f1 widget(42,'23skidoo',point));
Of course the typmodin function has still got to pack this info into a
non-negative int32 for storage, but it's still a useful improvement in
flexibility, especially considering that you can do nearly anything if you
are willing to keep the info in a side table. We can get away with this
change since we have not yet released a version providing user-definable
typmods. Per discussion.
reassembled in the syslogger before writing to the log file. This prevents
partial messages from being written, which mucks up log rotation, and
messages from different backends being interleaved, which causes garbled
logs. Backport as far as 8.0, where the syslogger was introduced.
Tom Lane and Andrew Dunstan
historically worked in some but not all cases, but as of 8.2 it failed for all
timezone formats. Fix, and add regression test cases to catch future
regressions in this area. Per gripe from Adam Witney.
with a plpgsql-defined cursor. The underlying mechanism for this is that the
main SQL engine will now take "WHERE CURRENT OF $n" where $n is a refcursor
parameter. Not sure if we should document that fact or consider it an
implementation detail. Per discussion with Pavel Stehule.
Along the way, allow FOR UPDATE in non-WITH-HOLD cursors; there may once
have been a reason to disallow that, but it seems to work now, and it's
really rather necessary if you want to select a row via a cursor and then
update it in a concurrent-safe fashion.
Original patch by Arul Shaji, rather heavily editorialized by Tom Lane.
pseudo HeapScanDesc created for a bitmap heap scan. This avoids some useless
overhead during a bitmap scan startup, in particular invoking the syncscan
code. (We might someday want to do that, but right now it's merely useless
contention for shared memory, to say nothing of possibly pushing useful
entries out of syncscan's small LRU list.) This also allows elimination of
ugly pgstat_discount_heap_scan() kluge.
large inputs. Also cause it to error out immediately if the result will
overflow, instead of grinding through a lot of calculation first.
Per gripe from Jim Nasby.
causes a division-by-zero error in the vacuum code. This can happen when there
are more workers than cost limit units.
Per report from Galy Lee in
<200705310914.l4V9E6JA094603@wwwmaster.postgresql.org>.
value for the vacuum code. Instead, make zero signify getting the value from a
higher level configuration facility, just like -1 in the original coding. We
still document that -1 is the value that disables the feature, to avoid
confusing the user unnecessarily.
Reported by Galy Lee in <200705310914.l4V9E6JA094603@wwwmaster.postgresql.org>;
per subsequent discussion.
which is the only state in which it's safe to initiate database queries.
It turns out that all but two of the callers thought that's what it meant;
and the other two were using it as a proxy for "will GetTopTransactionId()
return a nonzero XID"? Since it was in fact an unreliable guide to that,
make those two just invoke GetTopTransactionId() always, then deal with a
zero result if they get one.
for each temp file, rather than once per sort or hashjoin; this allows
spreading the data of a large sort or join across multiple tablespaces.
(I remain dubious that this will make any difference in practice, but certain
people insisted.) Arrange to cache the results of parsing the GUC variable
instead of recomputing from scratch on every demand, and push usage of the
cache down to the bottommost fd.c level.
were accepted by prior Postgres releases. This takes care of the loose end
left by the preceding patch to downgrade implicit casts-to-text. To avoid
breaking desirable behavior for array concatenation, introduce a new
polymorphic pseudo-type "anynonarray" --- the added concatenation operators
are actually text || anynonarray and anynonarray || text.
from the other string-category types; this eliminates a lot of surprising
interpretations that the parser could formerly make when there was no directly
applicable operator.
Create a general mechanism that supports casts to and from the standard string
types (text,varchar,bpchar) for *every* datatype, by invoking the datatype's
I/O functions. These new casts are assignment-only in the to-string direction,
explicit-only in the other, and therefore should create no surprising behavior.
Remove a bunch of thereby-obsoleted datatype-specific casting functions.
The "general mechanism" is a new expression node type CoerceViaIO that can
actually convert between *any* two datatypes if their external text
representations are compatible. This is more general than needed for the
immediate feature, but might be useful in plpgsql or other places in future.
This commit does nothing about the issue that applying the concatenation
operator || to non-text types will now fail, often with strange error messages
due to misinterpreting the operator as array concatenation. Since it often
(not always) worked before, we should either make it succeed or at least give
a more user-friendly error; but details are still under debate.
Peter Eisentraut and Tom Lane
a session regardless of the existence of cached plans. The plancache
only needs to be invalidated so that rules affected by the new setting
will be reflected in the new query plans.
Jan
- Fix possible deadlock between UPDATE and VACUUM queries. Bug never was
observed in 8.2, but it still exist there. HEAD is more sensitive to
bug after recent "ring" of buffer improvements.
- Fix WAL creation: if parent page is stored as is after split then
incomplete split isn't removed during replay. This happens rather rare, only
on large tables with a lot of updates/inserts.
- Fix WAL replay: there was wrong test of XLR_BKP_BLOCK_* for left
page after deletion of page. That causes wrong rightlink field: it pointed
to deleted page.
- add checking of match of clearing incomplete split
- cleanup incomplete split list after proceeding
All of this chages doesn't change on-disk storage, so backpatch...
But second point may be an issue for replaying logs from previous version.
to be cases when at least Windows 2000 can do this even though select
just indicated that the socket is readable.
Per report and analysis from Cyril VELTER.
tablespace(s) in which to store temp tables and temporary files. This is a
list to allow spreading the load across multiple tablespaces (a random list
element is chosen each time a temp object is to be created). Temp files are
not stored in per-database pgsql_tmp/ directories anymore, but per-tablespace
directories.
Jaime Casanova and Albert Cervera, with review by Bernd Helmle and Tom Lane.
and most especially for UTF8. Remove unnecessary special cases for bytea
processing and single-byte charset ILIKE. a ILIKE b is now processed as
lower(a) LIKE lower(b) in all cases. The code is now considerably simpler. All
comparisons are now performed byte-wise, and the text and pattern are also
advanced byte-wise where it is safe to do so - essentially where a wildcard is
not being matched.
Andrew Dunstan, from an original patch by ITAGAKI Takahiro, with ideas from
Tom Lane and Mark Mielke.
wrong data when dumping a bufferload that crosses a component-file boundary.
This probably has not been seen in the wild because (a) component files are
normally 1GB apiece and (b) non-block-aligned buffer usage is relatively
rare. But it's fairly easy to reproduce a problem if one reduces RELSEG_SIZE
in a test build. Kudos to Kurt Harriman for spotting the bug.
type. Also, add explicit casts between boolean and text/varchar. Both
of these changes are for conformance with SQL:2003.
Update the regression tests, bump the catversion.
will exit before failing because of conflicting DB usage. Per discussion,
this seems a good idea to help mask the fact that backend exit takes nonzero
time. Remove a couple of thereby-obsoleted sleeps in contrib and PL
regression test sequences.
selecting power-of-2, rather than prime, numbers of buckets in hash joins.
If the hash functions are doing their jobs properly by making all hash bits
equally random, this is good enough, and it saves expensive integer division
and modulus operations.
delivering a well-randomized hash value. I got religion on this after
observing that performance of multi-batch hash join degrades terribly if the
higher-order bits of hash values aren't random, as indeed was true for say
hashes of small integer values. It's now expected and documented that hash
functions should use hash_any or some comparable method to ensure that all
bits of their output are about equally random.
initdb forced because this change invalidates existing hash indexes. For the
same reason, this isn't back-patchable; the hash join performance problem
will get a band-aid fix in the back branches.
EXPLAIN-only operation was a little too short; it skipped initializing the
node's result tuple type, which may be needed depending on what's above the
indexscan node. Call ExecAssignResultTypeFromTL before exiting. (For good
luck I moved up the ExecAssignScanProjectionInfo call as well, so that
everything except indexscan-specific initialization will still be done.)
Per example from Grant Finnemore.
index key columns always have the type expected by the index's associated
operators, ie, we add RelabelType nodes when dealing with binary-compatible
index opclasses. This is needed to get varchar indexes to play nicely with
the new EquivalenceClass machinery, as per recent gripe from Josh Berkus that
CVS HEAD was failing to match a varchar index column to a constant restriction
in the query.
It seems likely that this change will allow removal of a lot of ugly ad-hoc
RelabelType-stripping that the planner has traditionally done while matching
expressions to other expressions, but I'll worry about that some other day.
fail when used in a deferred trigger. Bug goes back to 8.0; no doubt the
reason it hadn't been noticed is that we've been discouraging use of
user-defined constraint triggers. Per report from Frank van Vugt.
buffers, rather than blowing out the whole shared-buffer arena. Aside from
avoiding cache spoliation, this fixes the problem that VACUUM formerly tended
to cause a WAL flush for every page it modified, because we had it hacked to
use only a single buffer. Those flushes will now occur only once per
ring-ful. The exact ring size, and the threshold for seqscans to switch into
the ring usage pattern, remain under debate; but the infrastructure seems
done. The key bit of infrastructure is a new optional BufferAccessStrategy
object that can be passed to ReadBuffer operations; this replaces the former
StrategyHintVacuum API.
This patch also changes the buffer usage-count methodology a bit: we now
advance usage_count when first pinning a buffer, rather than when last
unpinning it. To preserve the behavior that a buffer's lifetime starts to
decrease when it's released, the clock sweep code is modified to not decrement
usage_count of pinned buffers.
Work not done in this commit: teach GiST and GIN indexes to use the vacuum
BufferAccessStrategy for vacuum-driven fetches.
Original patch by Simon, reworked by Heikki and again by Tom.
"microsecond" and "millisecond" units were not considered valid input
by themselves, which caused inputs like "1 millisecond" to be rejected
erroneously.
Update the docs, add regression tests, and backport to 8.2 and 8.1
or abort within a backend; rearrange InitPostgres processing to make it so.
Revealed by just-added Asserts along with ECPG regression tests (hm, I wonder
why the core regression tests didn't expose it?). This possibly is another
reason for missing stats updates ...
and aborted transactions have different effects; also teach it not to assume
that prepared transactions are always committed.
Along the way, simplify the pgstats API by tying counting directly to
Relations; I cannot detect any redeeming social value in having stats
pointers in HeapScanDesc and IndexScanDesc structures. And fix a few
corner cases in which counts might be missed because the relation's
pgstat_info pointer hadn't been set.
inheritance child of an UPDATE/DELETE target relation can be excluded by
constraints. I had rearranged some code in set_append_rel_pathlist() to
avoid "useless" work when a child is excluded, but overdid it and left
the child with no cheapest_path entry, causing possible failure later
if the appendrel was involved in a join. Also, it seems that the dummy
plan generated by inheritance_planner() when all branches are excluded
has to be a bit less dummy now than was required in 8.2.
Per report from Jan Wieck. Add his test case to the regression tests.
and/or create plans for hypothetical situations; in particular, investigate
plans that would be generated using hypothetical indexes. This is a
heavily-rewritten version of the hooks proposed by Gurjeet Singh for his
Index Advisor project. In this formulation, the index advisor can be
entirely a loadable module instead of requiring a significant part to be
in the core backend, and plans can be generated for hypothetical indexes
without requiring the creation and rolling-back of system catalog entries.
The index advisor patch as-submitted is not compatible with these hooks,
but it needs significant work anyway due to other 8.2-to-8.3 planner
changes. With these hooks in the core backend, development of the advisor
can proceed as a pgfoundry project.
what a Var node refers to. This is no longer necessary because the new
flat-range-table representation of plan trees makes it relatively easy to dig
down through child plan levels to find the original reference; and to keep
doing it that way, we'd have to store joinaliasvars lists in flattened RTEs,
as demonstrated by bug report from Leszek Trenkner. This change makes
varnoold/varoattno truly just debug aids, which wasn't quite the case before.
Perhaps we should drop them, or only have them in assert-enabled builds?
in cases where a sub-SELECT inserts a WHERE clause between two outer joins,
that clause may prevent us from re-ordering the two outer joins. The code
was considering only the joins' own ON-conditions in determining reordering
safety, which is not good enough. Add a "delay_upper_joins" flag to
OuterJoinInfo to flag that we have detected such a clause and higher-level
outer joins shouldn't be permitted to commute with this one. (This might
seem overly coarse, but given the current rules for OJ reordering, it's
sufficient AFAICT.)
The failure case is actually pretty narrow: it needs a WHERE clause within
the RHS of a left join that checks the RHS of a lower left join, but is not
strict for that RHS (else we'd have simplified the lower join to a plain
join). Even then no failure will be manifest unless the planner chooses to
rearrange the join order.
Per bug report from Adam Terrey.
cheapest-startup-cost innerjoin indexscans, and make joinpath.c consider
both of these (when different) as the inside of a nestloop join. The
original design was based on the assumption that indexscan paths always
have negligible startup cost, and so total cost is the only important
figure of merit; an assumption that's obviously broken by bitmap
indexscans. This oversight could lead to choosing poor plans in cases
where fast-start behavior is more important than total cost, such as
LIMIT and IN queries. 8.1-vintage brain fade exposed by an example from
Chuck D.
is using mark/restore but not rewind or backward-scan capability. Insert a
materialize plan node between a mergejoin and its inner child if the inner
child is a sort that is expected to spill to disk. The materialize shields
the sort from the need to do mark/restore and thereby allows it to perform
its final merge pass on-the-fly; while the materialize itself is normally
cheap since it won't spill to disk unless the number of tuples with equal
key values exceeds work_mem.
Greg Stark, with some kibitzing from Tom Lane.
- Function renamed to "xpath".
- Function is now strict, per discussion.
- Return empty array in case when XPath expression detects nothing
(previously, NULL was returned in such case), per discussion.
- (bugfix) Work with fragments with prologue: select xpath('/a',
'<?xml version="1.0"?><a /><b />'); // now XML datum is always wrapped
with dummy <x>...</x>, XML prologue simply goes away (if any).
- Some cleanup.
Nikolay Samokhvalov
Some code cleanup and documentation work by myself.
WAL records that shows whether it is safe to remove full-page images
(ie, whether or not an on-line backup was in progress when the WAL entry
was made). Also make provision for an XLOG_NOOP record type that can be
used to fill in the extra space when decompressing the data for restore.
This is the portion of Koichi Suzuki's "full page writes" patch that
has to go into the core database. The remainder of that work is two
external compression and decompression programs, which for the time being
will undergo separate development on pgfoundry. Per discussion.
Also, twiddle the handling of BTREE_SPLIT records to ensure it'll be
possible to compress them (the previous coding caused essential info
to be omitted). The other commonly-used record types seem OK already,
with the possible exception of GIN and GIST WAL records, which I don't
understand well enough to opine on.
FreezeXid introduced in a recent commit, so there isn't any data loss in this
approach.
Doing it causes ALTER TABLE (or rather, the forms of it that cause a full table
rewrite) to be affected as well. In this case, the frozen point is RecentXmin,
because after the rewrite all the tuples are relabeled with the rewriting
transaction's Xid.
TOAST tables are fixed automatically as well, as fallout of the way they were
already being handled in the respective code paths.
With this patch, there is no longer need to VACUUM tables for Xid wraparound
purposes that have been cleaned up via TRUNCATE or CLUSTER.
had been taught not to do that ages ago, the SSL code was helpfully bleating
anyway. Resolves some recent reports such as bug #3266; however the
underlying cause of the related bug #2829 is still unclear.
and inet_server_addr() fail if the client connected over a "scoped" IPv6
address. In this case getnameinfo() will return a string ending with
a poorly-standardized "%something" zone specifier, which these functions
try to feed to network_in(), which won't take it. So that we don't lose
functionality altogether, suppress the zone specifier before giving the
string to network_in(). Per report from Brian Hirt.
TODO: probably someday the inet type should support scoped IPv6 addresses,
and then this patch should be reverted.
Backpatch to 8.2 ... is it worth going further?
recompute the limit/offset immediately, so that the updated values are
available when the child's ReScan function is invoked. Add a regression
test for this, too. Bug is new in HEAD (due to the bounded-sorting patch)
so no need for back-patch.
I did not do anything about merging this signaling with chgParam processing,
but if we were to do that we'd still need to compute the updated values
at this point rather than during the first ProcNode call.
Per observation and test case from Greg Stark, though I didn't use his patch.
to avoid losing useful Xid information in not-so-old tuples. This makes
CLUSTER behave the same as VACUUM as far a tuple-freezing behavior goes
(though CLUSTER does not yet advance the table's relfrozenxid).
While at it, move the actual freezing operation in rewriteheap.c to a more
appropriate place, and document it thoroughly. This part of the patch from
Tom Lane.
avoid a later needless VACUUM for Xid-wraparound purposes. We can do this
since the table is known to be left empty, so no Xid remains on it.
Per discussion.
applied to live tuples older than a recent Xmin, not to tuples that may be part
of an update chain. Those still keep their original markings.
This patch makes it possible for CLUSTER to advance relfrozenxid, thus avoiding
the need of vacuuming the table for Xid wraparound purposes. That will be
patched separately.
Patch from Heikki Linnakangas.
there's an indirect dependency on the owner via the parent table. We were
already handling indexes that way, but not toast tables for some reason.
Saves a little catalog space and cuts down the verbosity of checkSharedDependencies
reports.
ActiveSnapshot. Having it affect ActiveSnapshot only in the unusual
case of needing to replan seems a bad idea, and there's also the problem
that the created snap might be in a relatively short-lived context, as
noted by Jan Wieck. Also, there's no need to force a new snap at all
unless we are called with no snap currently set, which is an unusual
case in itself.
and only a truncated log of the objects in the current database to the client.
Also, instead of reporting object counts for all databases on which the user
might own objects, report only as many as fit in the predefined line count.
This is to avoid flooding the client when the user owns too many objects,
which could cause problems.
Per report from Ed L. on April 4th and subsequent discussion.
more completely. The motivation for having it understand IS NULL at all was
to allow use of "foo IS NULL" as one of the subsets of a partitioning on
"foo", but as reported by Aleksander Kmetec, it wasn't really getting the job
done. Backpatch to 8.2 since this is arguably a performance bug.
named foo, would work but the other ordering would not. If a user-specified
type or table name collides with an existing auto-generated array name, just
rename the array type out of the way by prepending more underscores. This
should not create any backward-compatibility issues, since the cases in which
this will happen would have failed outright in prior releases.
Also fix an oversight in the arrays-of-composites patch: ALTER TABLE RENAME
renamed the table's rowtype but not its array type.
needs to check the new constraint against columns of derived domains too.
Also, make it error out if the domain to be modified is used within any
composite-type columns. Eventually we should support that case, but it seems
a bit painful, and not suitable for a back-patch. For the moment just let the
user know we can't do it.
Backpatch to 8.2, which is the only released version that allows nested
domains. Possibly the other part should be back-patched further.
and views (but not system catalogs, nor sequences or toast tables). Get rid
of the hardwired convention that a type's array type is named exactly "_type",
instead using a new column pg_type.typarray to provide the linkage. (It still
will be named "_type", though, except in odd corner cases such as
maximum-length type names.)
Along the way, make tracking of owner and schema dependencies for types more
uniform: a type directly created by the user has these dependencies, while a
table rowtype or auto-generated array type does not have them, but depends on
its parent object instead.
David Fetter, Andrew Dunstan, Tom Lane
numerics as "oprcanhash", and make the corresponding system catalog
updates. As a result, hash indexes, hashed aggregation, and hash
joins can now be used with the numeric type. Bump the catversion.
The only tricky aspect to doing this is writing a correct hash
function: it's possible for two Numerics to be equal according to
their equality operator, but have different in-memory bit patterns.
To cope with this, the hash function doesn't consider the Numeric's
"scale" or "sign", and explictly skips any leading or trailing
zeros in the Numeric's digit buffer (the current implementation
should suppress any such zeros, but it seems unwise to rely upon
this). See discussion on pgsql-patches for more details.
and for other compilers, insert a dummy exit() call so that they understand
PG_RE_THROW() doesn't return. Insert fflush(stderr) in ExceptionalCondition,
per recent buildfarm evidence that that might not happen automatically on some
platforms. And const-ify ExceptionalCondition's declaration while at it.
need be returned. We keep a heap of the current best N tuples and sift-up
new tuples into it as we scan the input. For M input tuples this means
only about M*log(N) comparisons instead of M*log(M), not to mention a lot
less workspace when N is small --- avoiding spill-to-disk for large M
is actually the most attractive thing about it. Patch includes planner
and executor support for invoking this facility in ORDER BY ... LIMIT
queries. Greg Stark, with some editorialization by moi.
pages it intends to zero immediately. Just to show there is some use for that
function besides WAL recovery :-).
Along the way, fold _hash_checkpage and _hash_pageinit calls into _hash_getbuf
and friends, instead of expecting callers to do that separately.
ReadOrZeroBuffer to fetch pages from beyond physical EOF. This would
usually work, but would cause problems for md.c if writes occurred
beyond a segment boundary when the previous segment file hadn't been
fully extended.
from the WAL data, don't bother to physically read it; just have bufmgr.c
return a zeroed-out buffer instead. This speeds recovery significantly,
and also avoids unnecessary failures when a page-to-be-overwritten has corrupt
page headers on disk. This replaces a former kluge that accomplished the
latter by pretending zero_damaged_pages was always ON during WAL recovery;
which was OK when the kluge was put in, but is unsafe when restoring a WAL
log that was written with full_page_writes off.
Heikki Linnakangas
true at the very end of its processing, the update is broadcast via a
shared-cache-inval message for the index; without this, existing backends that
already have relcache entries for the index might never see it become valid.
Also, force a relcache inval on the index's parent table at the same time,
so that any cached plans for that table are re-planned; this ensures that
the newly valid index will be used if appropriate. Aside from making
C.I.C. behave more reasonably, this is necessary infrastructure for some
aspects of the HOT patch. Pavan Deolasee, with a little further stuff from
me.
and TimestampDifference, to make coding clearer. I think this should also fix
the failure to start workers in platforms with low resolution timers, as
reported by Itagaki Takahiro.
isn't any place to throw the error to. If so, we should treat the error
as FATAL, just as we would have if it'd been thrown outside the PG_TRY
block to begin with.
Although this is clearly a *potential* source of bugs, it is not clear
at the moment whether it is an *actual* source of bugs; there may not
presently be any PG_TRY blocks in code that can be reached with no outer
longjmp catcher. So for the moment I'm going to be conservative and not
back-patch this. The change breaks ABI for users of PG_RE_THROW and hence
might create compatibility problems for loadable modules, so we should not
put it into released branches without proof that it's needed.
wrong thing when inlining polymorphic SQL functions, because it was using the
function's declared return type where it should have used the actual result
type of the current call. In 8.1 and 8.2 this causes obvious failures even if
you don't have assertions turned on; in 8.0 and 7.4 it would only be a problem
if the inlined expression were used as an input to a function that did
run-time type determination on its inputs. Add a regression test, since this
is evidently an under-tested area.
from time_t to TimestampTz representation. This provides full gettimeofday()
resolution of the timestamps, which might be useful when attempting to
do point-in-time recovery --- previously it was not possible to specify
the stop point with sub-second resolution. But mostly this is to get
rid of TimestampTz-to-time_t conversion overhead during commit. Per my
proposal of a day or two back.
messages to the stats collector. This avoids the problem that enabling
stats_row_level for autovacuum has a significant overhead for short
read-only transactions, as noted by Arjen van der Meijden. We can avoid
an extra gettimeofday call by piggybacking on the one done for WAL-logging
xact commit or abort (although that doesn't help read-only transactions,
since they don't WAL-log anything).
In my proposal for this, I noted that we could change the WAL log entries
for commit/abort to record full TimestampTz precision, instead of only
time_t as at present. That's not done in this patch, but will be committed
separately.
We can just palloc, instead of using makeNode, when we are going to
overwrite the whole node anyway in the FLATCOPY macro. Also, use
FLATCOPY instead of copyObject for common node types Var and Const.
look through a freelist for a chunk of adequate size. For a long time
now, all elements of a given freelist have been exactly the same
allocated size, so we don't need a loop. Since the loop never iterated
more than once, you'd think this wouldn't matter much, but it makes a
noticeable savings in a simple test --- perhaps because the compiler
isn't optimizing on a mistaken assumption that the loop would repeat.
AllocSetAlloc is called often enough that saving even a couple of
instructions is worthwhile.
types of unspecified parameters when submitted via extended query protocol.
This worked in 8.2 but I had broken it during plancache changes. DECLARE
CURSOR is now treated almost exactly like a plain SELECT through parse
analysis, rewrite, and planning; only just before sending to the executor
do we divert it away to ProcessUtility. This requires a special-case check
in a number of places, but practically all of them were already special-casing
SELECT INTO, so it's not too ugly. (Maybe it would be a good idea to merge
the two by treating IntoClause as a form of utility statement? Not going to
worry about that now, though.) That approach doesn't work for EXPLAIN,
however, so for that I punted and used a klugy solution of running parse
analysis an extra time if under extended query protocol.
is in progress on the same hashtable. This seems the least invasive way to
fix the recently-recognized problem that a split could cause the scan to
visit entries twice or (with much lower probability) miss them entirely.
The only field-reported problem caused by this is the "failed to re-find
shared lock object" PANIC in COMMIT PREPARED reported by Michel Dorochevsky,
which was caused by multiply visited entries. However, it seems certain
that mdsync() is vulnerable to missing required fsync's due to missed
entries, and I am fearful that RelationCacheInitializePhase2() might be at
risk as well. Because of that and the generalized hazard presented by this
bug, back-patch all the supported branches.
Along the way, fix pg_prepared_statement() and pg_cursor() to not assume
that the hashtables they are examining will stay static between calls.
This is risky regardless of the newly noted dynahash problem, because
hash_seq_search() has never promised to cope with deletion of table entries
other than the just-returned one. There may be no bug here because the only
supported way to call these functions is via ExecMakeTableFunctionResult()
which will cycle them to completion before doing anything very interesting,
but it seems best to get rid of the assumption. This affects 8.2 and HEAD
only, since those functions weren't there earlier.
RESET SESSION, RESET PLANS, and RESET TEMP are now DISCARD ALL,
DISCARD PLANS, and DISCARD TEMP, respectively. This is to avoid
confusion with the pre-existing RESET variants: the DISCARD
commands are not actually similar to RESET. Patch from Marko
Kreen, with some minor editorialization.
(it's so nice to have a buildfarm member that actively rejects naked
uses of strcasecmp). This coding is still pretty awful, though, since
it's going to be O(N^2) in the number of guc variables. May I direct
your attention to bsearch?
are mostly excluded by constraints: do the CE test a bit earlier to save
some adjust_appendrel_attrs() work on excluded children, and arrange to
use array indexing rather than rt_fetch() to fetch RTEs in the main body
of the planner. The latter is something I'd wanted to do for awhile anyway,
but seeing list_nth_cell() as 35% of the runtime gets one's attention.
child attnums are the same, before it grovels through each and every child
column looking for a name match. Saves some time in large inheritance trees,
per example from Greg.
values: don't throw away perfectly good hash bits, and increase the shift
distances so as to provide more separation in the common case where some of
the key values are small integers (and so their hashes are too, because
hashfunc.c doesn't try all that hard). This reduces the runtime of
SearchCatCache by a factor of 4 in an example provided by Greg Stark,
in which the planner spends a whole lot of time searching the two-key
STATRELATT cache. It seems unlikely to hurt in other cases, but maybe
we could do even better?
when a relation is opened multiple times in the same transaction. This is
particularly useful for system catalogs, which we may heap_open or index_open
many times in a transaction, and it doesn't really cost anything extra even
if the rel is touched but once. Motivated by study of an example from Greg
Stark, in which pgstat_initstats() accounted for an unreasonably large
fraction of the runtime.
This is needed to allow a security-definer function to set a truly secure
value of search_path. Without it, a malicious user can use temporary objects
to execute code with the privileges of the security-definer function. Even
pushing the temp schema to the back of the search path is not quite good
enough, because a function or operator at the back of the path might still
capture control from one nearer the front due to having a more exact datatype
match. Hence, disable searching the temp schema altogether for functions and
operators.
Security: CVE-2007-2138
failed (due to lock conflicts or out-of-space). We might have already
extended the index's filesystem EOF before failing, causing the EOF to be
beyond what the metapage says is the last used page. Hence the invariant
maintained by the code needs to be "EOF is at or beyond last used page",
not "EOF is exactly the last used page". Problem was created by my patch
of 2006-11-19 that attempted to repair bug #2737. Since that was
back-patched to 7.4, this needs to be as well. Per report and test case
from Vlastimil Krejcir.
competing alternatives for indexes to use in a bitmap scan. The former
coding took estimated selectivity as an overriding factor, causing it to
sometimes choose indexes that were much slower to scan than ones with a
slightly worse selectivity. It was also too narrow-minded about which
combinations of indexes to consider ANDing. The rewrite makes it pay more
attention to index scan cost than selectivity; this seems sane since it's
impossible to have very bad selectivity with low cost, whereas the reverse
isn't true. Also, we now consider each index alone, as well as adding
each index to an AND-group led by each prior index, for a total of about
O(N^2) rather than O(N) combinations considered. This makes the results
much less dependent on the exact order in which the indexes are
considered. It's still a lot cheaper than an O(2^N) exhaustive search.
A prefilter step eliminates all but the cheapest of those indexes using
the same set of WHERE conditions, to keep the effective value of N down in
scenarios where the DBA has created lots of partially-redundant indexes.
processes to be running simultaneously. Also, now autovacuum processes do not
count towards the max_connections limit; they are counted separately from
regular processes, and are limited by the new GUC variable
autovacuum_max_workers.
The launcher now has intelligence to launch workers on each database every
autovacuum_naptime seconds, limited only on the max amount of worker slots
available.
Also, the global worker I/O utilization is limited by the vacuum cost-based
delay feature. Workers are "balanced" so that the total I/O consumption does
not exceed the established limit. This part of the patch was contributed by
ITAGAKI Takahiro.
Per discussion.
a replan. I had originally thought this was not necessary, but the new
SPI facilities create a path whereby queries planned with non-default
options can get into the cache, so it is necessary.
access to the planner's cursor-related planning options, and provide new
FETCH/MOVE routines that allow access to the full power of those commands.
Small refactoring of planner(), pg_plan_query(), and pg_plan_queries()
APIs to make it convenient to pass the planning options down from SPI.
This is the core-code portion of Pavel Stehule's patch for scrollable
cursor support in plpgsql; I'll review and apply the plpgsql changes
separately.
possibly be any useful pathkeys --- to wit, queries with neither any
join clauses nor any ORDER BY request. It's nearly free to check for
this case and it saves a useful fraction of the planning time for simple
queries.
fast flow of new fsync requests can prevent mdsync() from ever completing.
This was an unforeseen consequence of a patch added in Mar 2006 to prevent
the fsync request queue from overflowing. Problem identified by Heikki
Linnakangas and independently by ITAGAKI Takahiro; fix based on ideas from
Takahiro-san, Heikki, and Tom.
Back-patch as far as 8.1 because a previous back-patch introduced the problem
into 8.1 ...
reviewed by Neil Conway. This patch adds the following DDL command
variants: RESET SESSION, RESET TEMP, RESET PLANS, CLOSE ALL, and
DEALLOCATE ALL. RESET SESSION is intended for use by connection
pool software and the like, in order to reset a client session
to something close to its initial state.
Note that while most of these command variants can be executed
inside a transaction block (but are not transaction-aware!),
RESET SESSION cannot. While this is inconsistent, it is intended
to catch programmer mistakes: RESET SESSION in an open transaction
block is probably unintended.
(original code *always* created a full-page image for the left page, thus
leaving the intended savings unrealized), avoid risk of not having enough room
on the page during xlog restore, squeeze out another couple bytes in the xlog
record, clean up neglected comments.
index types can be reliably distinguished by examining the special space
on an index page. Per my earlier proposal, plus the realization that
there's no need for btree's vacuum cycle ID to cycle through every possible
16-bit value. Restricting its range a little costs nearly nothing and
eliminates the possibility of collisions.
Memo to self: remember to make bitmap indexes play along with this scheme,
assuming that patch ever gets accepted.
right, there seems precious little reason to have a pile of hand-maintained
endianness definitions in src/include/port/*.h. Get rid of those, and make
the couple of places that used them depend on WORDS_BIGENDIAN instead.
This commit breaks any code that assumes that the mere act of forming a tuple
(without writing it to disk) does not "toast" any fields. While all available
regression tests pass, I'm not totally sure that we've fixed every nook and
cranny, especially in contrib.
Greg Stark with some help from Tom Lane
are in their commit critical sections via flags in the ProcArray. Checkpoint
can watch the ProcArray to determine when it's safe to proceed. This is
a considerably better solution to the original problem of race conditions
between checkpoint and transaction commit: it speeds up commit, since there's
one less lock to fool with, and it prevents the problem of checkpoint being
delayed indefinitely when there's a constant flow of commits. Heikki, with
some kibitzing from Tom.
Add the latter to the values checked in pg_control, since it can't be changed
without invalidating toast table content. This commit in itself shouldn't
change any behavior, but it lays some necessary groundwork for experimentation
with these toast-control numbers.
Note: while TOAST_TUPLE_THRESHOLD can now be changed without initdb, some
thought still needs to be given to needs_toast_table() in toasting.c before
unleashing random changes.
return void ends with a SELECT, if that SELECT has a single result that is
also of type void. Without this, it's hard to write a void function that
calls another void function. Per gripe from Peter.
Back-patch as far as 8.0.
will be released by transaction abort before _bt_end_vacuum gets called.
If either of these "can't happen" errors actually happened, we'd freeze up
trying to acquire an already-held lock. Latest word is that this does
not explain Martin Pitt's trouble report, but it still looks like a bug.
its table list and then rechecks pgstat before vacuuming each table to
verify that no one has vacuumed the table in the meantime.
In the current autovacuum world this only means that a worker will not
vacuum a table that a user has vacuumed manually after the worker started.
When support for multiple autovacuum workers is introduced, this will reduce
the probability of simultaneous workers on the same database doing redundant
work.
seen by code inspecting the expression. The best way to do this seems
to be to drop the original representation as a function invocation, and
instead make a special expression node type that represents applying
the element-type coercion function to each array element. In this way
the element function is exposed and will be checked for volatility.
Per report from Guillaume Smet.
A DBA is allowed to create a language in his database if it's marked
"tmpldbacreate" in pg_pltemplate. The factory default is that this is set
for all standard trusted languages, but of course a superuser may adjust
the settings. In service of this, add the long-foreseen owner column to
pg_language; renaming, dropping, and altering owner of a PL now follow
normal ownership rules instead of being superuser-only.
Jeremy Drake, with some editorialization by Tom Lane.
reset event, namely invalidate everything. This oversight probably
explains the rare failures that some buildfarm machines have been
showing for the plancache regression test.
if possible. I had left this undone in the first pass at the API change
for ProcessUtility, but forgot to revisit it after the plancache changes
made it possible to do it.
Vadim had included this restriction in the original design of the SPI code,
but I'm darned if I can see a reason for it.
I left the macro definition of SPI_ERROR_CURSOR in place, so as not to
needlessly break any SPI callers that are checking for it, but that code
will never actually be returned anymore.
pointer" in every Snapshot struct. This allows removal of the case-by-case
tests in HeapTupleSatisfiesVisibility, which should make it a bit faster
(I didn't try any performance tests though). More importantly, we are no
longer violating portable C practices by assuming that small integers are
distinct from all pointer values, and HeapTupleSatisfiesDirty no longer
has a non-reentrant API involving side-effects on a global variable.
There were a couple of places calling HeapTupleSatisfiesXXX routines
directly rather than through the HeapTupleSatisfiesVisibility macro.
Since these places had to be changed anyway, I chose to make them go
through the macro for uniformity.
Along the way I renamed HeapTupleSatisfiesSnapshot to HeapTupleSatisfiesMVCC
to emphasize that it's only used with MVCC-type snapshots. I was sorely
tempted to rename HeapTupleSatisfiesVisibility to HeapTupleSatisfiesSnapshot,
but forebore for the moment to avoid confusion and reduce the likelihood that
this patch breaks some of the pending patches. Might want to reconsider
doing that later.
search_path that was active when the plan was first made. To do this,
improve namespace.c to support a stack of "override" search path settings
(we must have a stack since nested replan events are entirely possible).
This facility replaces the "special namespace" hack formerly used by
CREATE SCHEMA, and should be able to support per-function search path
settings as well.
of a multi-statement simple-Query message. This bug goes all the way
back, but unfortunately is not nearly so easy to fix in existing releases;
it is only the recent ProcessUtility API change that makes it fixable in
HEAD. Per report from William Garrison.
doesn't exist. This allows DROP to be used to clean out the pg_tablespace
catalog entry in a situation where a previous DROP attempt failed before
committing but after having removed the directories and symlink.
Per report from William Garrison. Even though his test case depends on an
unrelated bug in PreventTransactionChain, it's certainly possible for this
situation to arise due to other problems, eg a system crash at just the
right time.
First, genericcostestimate() was being way too liberal about including
partial-index conditions in its selectivity estimate, resulting in
substantial underestimates for situations such as an indexqual "x = 42"
used with an index on x "WHERE x >= 40 AND x < 50". While the code is
intentionally set up to favor selecting partial indexes when available,
this was too much...
Second, choose_bitmap_and() was likewise easily fooled by cases of this
type, since it would similarly think that the partial index had selectivity
independent of the indexqual.
Fixed by using predicate_implied_by() rather than simple equality checks
to determine redundancy. This is a good deal more expensive but I don't
see much alternative. At least the extra cost is only paid when there's
actually a partial index under consideration.
Per report from Jeff Davis. I'm not going to risk back-patching this,
though.
and regexp_split_to_table. These functions provide access to the
capture groups resulting from a POSIX regular expression match,
and provide the ability to split a string on a POSIX regular
expression, respectively. Patch from Jeremy Drake; code review by
Neil Conway, additional comments and suggestions from Tom and
Peter E.
This patch bumps the catversion, adds some regression tests,
and updates the docs.
rules to be defined with different, per session controllable, behaviors
for replication purposes.
This will allow replication systems like Slony-I and, as has been stated
on pgsql-hackers, other products to control the firing mechanism of
triggers and rewrite rules without modifying the system catalog directly.
The firing mechanisms are controlled by a new superuser-only GUC
variable, session_replication_role, together with a change to
pg_trigger.tgenabled and a new column pg_rewrite.ev_enabled. Both
columns are a single char data type now (tgenabled was a bool before).
The possible values in these attributes are:
'O' - Trigger/Rule fires when session_replication_role is "origin"
(default) or "local". This is the default behavior.
'D' - Trigger/Rule is disabled and fires never
'A' - Trigger/Rule fires always regardless of the setting of
session_replication_role
'R' - Trigger/Rule fires when session_replication_role is "replica"
The GUC variable can only be changed as long as the system does not have
any cached query plans. This will prevent changing the session role and
accidentally executing stored procedures or functions that have plans
cached that expand to the wrong query set due to differences in the rule
firing semantics.
The SQL syntax for changing a triggers/rules firing semantics is
ALTER TABLE <tabname> <when> TRIGGER|RULE <name>;
<when> ::= ENABLE | ENABLE ALWAYS | ENABLE REPLICA | DISABLE
psql's \d command as well as pg_dump are extended in a backward
compatible fashion.
Jan
executed in read_only mode. This could lead to various relatively-subtle
failures, such as an allegedly stable function returning non-stable results.
Bug goes all the way back to the introduction of read-only mode in 8.0.
Per report from Gaetano Mendola.
available information about the typmod of an expression; namely, Const,
ArrayRef, ArrayExpr, and EXPR and ARRAY SubLinks. In the ArrayExpr and
SubLink cases it wasn't really the data structure's fault, but exprTypmod()
being lazy. This seems like a good idea in view of the expected increase in
typmod usage from Teodor's work to allow user-defined types to have typmods.
In particular this responds to the concerns we had about eliminating the
special-purpose hack that exprTypmod() used to have for BPCHAR Consts.
We can now tell whether or not such a Const has been cast to a specific
length, and report or display properly if so.
initdb forced due to changes in stored rules.
uses SPI plans, this finally fixes the ancient gotcha that you can't
drop and recreate a temp table used by a plpgsql function.
Along the way, clean up SPI's API a little bit by declaring SPI plan
pointers as "SPIPlanPtr" instead of "void *". This is cosmetic but
helps to forestall simple programming mistakes. (I have changed some
but not all of the callers to match; there are still some "void *"'s
in contrib and the PL's. This is intentional so that we can see if
anyone's compiler complains about it.)
did not expect that a DEAD tuple could follow a RECENTLY_DEAD tuple in an
update chain, but because the OldestXmin rule for determining deadness is a
simplification of reality, it is possible for this situation to occur
(implying that the RECENTLY_DEAD tuple is in fact dead to all observers,
but this patch does not attempt to exploit that). The code would follow a
chain forward all the way, but then stop before a DEAD tuple when backing
up, meaning that not all of the chain got moved. This could lead to copying
the chain multiple times (resulting in duplicate copies of the live tuple at
its end), or leaving dangling index entries behind (which, aside from
generating warnings from later vacuums, creates a risk of wrong query
results or bogus duplicate-key errors once the heap slot the index entry
points to is repopulated).
The fix is to recheck HeapTupleSatisfiesVacuum while following a chain
forward, and to stop if a DEAD tuple is reached. Each contiguous group
of RECENTLY_DEAD tuples will therefore be copied as a separate chain.
The patch also adds a couple of extra sanity checks to verify correct
behavior.
Per report and test case from Pavan Deolasee.
module and teach PREPARE and protocol-level prepared statements to use it.
In service of this, rearrange utility-statement processing so that parse
analysis does not assume table schemas can't change before execution for
utility statements (necessary because we don't attempt to re-acquire locks
for utility statements when reusing a stored plan). This requires some
refactoring of the ProcessUtility API, but it ends up cleaner anyway,
for instance we can get rid of the QueryContext global.
Still to do: fix up SPI and related code to use the plan cache; I'm tempted to
try to make SQL functions use it too. Also, there are at least some aspects
of system state that we want to ensure remain the same during a replan as in
the original processing; search_path certainly ought to behave that way for
instance, and perhaps there are others.
even if none of the fields in the pg_class row change. This behavior is
necessary to ensure other backends flush rd_targblock values that might
point to truncated-away pages. We got this right pre-8.2 but it was broken
by overoptimistic change to not write out the pg_class row if unchanged.
Per report from Pavan Deolasee.
check_sql_fn_retval allows binary-compatibility cases, the expression
extracted from an inline-able SQL function might have a type that is only
binary-compatible with the declared function result type. To avoid possibly
changing the semantics of the expression, we should insert a RelabelType node
in such cases. This has only been shown to have bad consequences in recent
8.1 and up releases, but I suspect there may be failure cases in the older
branches too, so patch it all the way back. Per bug #3116 from Greg Mullane.
Along the way, fix an omission in eval_const_expressions_mutator: it failed
to copy the relabelformat field when processing a RelabelType. No known
observable failures from this, but it definitely isn't intended behavior.
fixup various places in the tree that were clearing a StringInfo by hand.
Making this function a part of the API simplifies client code slightly,
and avoids needlessly peeking inside the StringInfo interface.
log_min_messages does; and arrange to suppress the duplicative output
that would otherwise result from log_statement and log_duration messages.
Bruce Momjian and Tom Lane.
this, add a 16-bit "flags" field to page headers by stealing some bits from
pd_tli. We use one flag bit as a hint to indicate whether there are any
unused line pointers; the remaining 15 are available for future use.
This is a cut-down form of an idea proposed by Hiroki Kataoka in July 2005.
At the time it was rejected because the original patch increased the size of
page headers and it wasn't clear that the benefit outweighed the distributed
cost. The flag-bit approach gets most of the benefit without requiring an
increase in the page header size.
Heikki Linnakangas and Tom Lane
Get rid of VARATT_SIZE and VARATT_DATA, which were simply redundant with
VARSIZE and VARDATA, and as a consequence almost no code was using the
longer names. Rename the length fields of struct varlena and various
derived structures to catch anyplace that was accessing them directly;
and clean up various places so caught. In itself this patch doesn't
change any behavior at all, but it is necessary infrastructure if we hope
to play any games with the representation of varlena headers.
Greg Stark and Tom Lane
parent query's EState. Now that there's a single flat rangetable for both
the main plan and subplans, there's no need anymore for a separate EState,
and removing it allows cleaning up some crufty code in nodeSubplan.c and
nodeSubqueryscan.c. Should be a tad faster too, although any difference
will probably be hard to measure. This is the last bit of subsidiary
mop-up work from changing to a flat rangetable.
is still needed despite cleanups in setrefs.c, because the point is to
let the inserted Result node compute a different tlist than its input
node does. Per example from Jeremy Drake.
drill down into subplan targetlists to print the referent expression for an
OUTER or INNER var in an upper plan node. Hence, make it do that always, and
banish the old hack of showing "?columnN?" when things got too complicated.
Along the way, fix an EXPLAIN bug I introduced by suppressing subqueries from
execution-time range tables: get_name_for_var_field() assumed it could look at
rte->subquery to find out the real type of a RECORD var. That doesn't work
anymore, but instead we can look at the input plan of the SubqueryScan plan
node.
and quals have varno OUTER, rather than zero, to indicate a reference to
an output of their lefttree subplan. This is consistent with the way
that every other upper-level node type does it, and allows some simplifications
in setrefs.c and EXPLAIN.
useless substructure for its RangeTblEntry nodes. (I chose to keep using the
same struct node type and just zero out the link fields for unneeded info,
rather than making a separate ExecRangeTblEntry type --- it seemed too
fragile to have two different rangetable representations.)
Along the way, put subplans into a list in the toplevel PlannedStmt node,
and have SubPlan nodes refer to them by list index instead of direct pointers.
Vadim wanted to do that years ago, but I never understood what he was on about
until now. It makes things a *whole* lot more robust, because we can stop
worrying about duplicate processing of subplans during expression tree
traversals. That's been a constant source of bugs, and it's finally gone.
There are some consequent simplifications yet to be made, like not using
a separate EState for subplans in the executor, but I'll tackle that later.
I refactored findsplitloc and checksplitloc so that the division of
labor is more clear IMO. I pushed all the space calculation inside the
loop to checksplitloc.
I also fixed the off by 4 in free space calculation caused by
PageGetFreeSpace subtracting sizeof(ItemIdData), even though it was
harmless, because it was distracting and I felt it might come back to
bite us in the future if we change the page layout or alignments.
There's now a new function PageGetExactFreeSpace that doesn't do the
subtraction.
findsplitloc now tries the "just the new item to right page" split as
well. If people don't like the refactoring, I can write a patch to just
add that.
Heikki Linnakangas
storing mostly-redundant Query trees in prepared statements, portals, etc.
To replace Query, a new node type called PlannedStmt is inserted by the
planner at the top of a completed plan tree; this carries just the fields of
Query that are still needed at runtime. The statement lists kept in portals
etc. now consist of intermixed PlannedStmt and bare utility-statement nodes
--- no Query. This incidentally allows us to remove some fields from Query
and Plan nodes that shouldn't have been there in the first place.
Still to do: simplify the execution-time range table; at the moment the
range table passed to the executor still contains Query trees for subqueries.
initdb forced due to change of stored rules.
this code was last gone over, there wasn't really any alternative to
globals because we didn't have the PlannerInfo struct being passed all
through the planner code. Now that we do, we can restructure things
to avoid non-reentrancy. I'm fooling with this because otherwise I'd
have had to add another global variable for the planned compact
range table list.
plan nodes, so that the executor does not need to get these items from
the range table at runtime. This will avoid needing to include these
fields in the compact range table I'm expecting to make the executor use.
portals using PORTAL_UTIL_SELECT strategy. This is currently significant only
for FETCH queries, which are supposed to include a count in the tag. Seems
it's been broken since 7.4, but nobody noticed before Knut Lehre.
equal functions are checked for raw parse trees as well as post-analysis
trees. This was never very important before, but the upcoming plan cache
control module will need to be able to do copyObject() on raw parse trees.
forces a particular relation nonnullable, then we can say that the OR does.
This is worth a little extra trouble since it may allow reduction of
outer joins to plain joins.
an opclass for a generic type such as ANYARRAY. The original coding failed
to check that PK and FK columns were of the same array type. Per discussion
with Tom Dunstan. Also, make the code a shade more readable by not trying
to economize on variables.
JOIN quals, just like WHERE quals, even if they reference every one of the
join's relations. Now that we can reorder outer and inner joins, it's
possible for such a qual to end up being assigned to an outer join plan node,
and we mustn't have it treated as a join qual rather than a filter qual for
the node. (If it were, the join could produce null-extended rows that it
shouldn't.) Per bug report from Pelle Johansson.
be checked at plan levels below the top; namely, we have to allow for Result
nodes inserted just above a nestloop inner indexscan. Should think about
using the general Param mechanism to pass down outer-relation variables, but
for the moment we need a back-patchable solution. Per report from Phil Frost.
to_timestamp():
- ID for day-of-week
- IDDD for day-of-year
This makes it possible to convert ISO week dates to and from text
fully represented in either week ('IYYY-IW-ID') or day-of-year
('IYYY-IDDD') format.
I have also added an 'isoyear' field for use with extract / date_part.
Brendan Jurd
o read global SSL configuration file
o add GUC "ssl_ciphers" to control allowed ciphers
o add libpq environment variable PGSSLKEY to control SSL hardware keys
Victor B. Wagner
considered when it is necessary to do so because of a join-order restriction
(that is, an outer-join or IN-subselect construct). The former coding was a
bit ad-hoc and inconsistent, and it missed some cases, as exposed by Mario
Weilguni's recent bug report. His specific problem was that an IN could be
turned into a "clauseless" join due to constant-propagation removing the IN's
joinclause, and if the IN's subselect involved more than one relation and
there was more than one such IN linking to the same upper relation, then the
only valid join orders involve "bushy" plans but we would fail to consider the
specific paths needed to get there. (See the example case added to the join
regression test.) On examining the code I wonder if there weren't some other
problem cases too; in particular it seems that GEQO was defending against a
different set of corner cases than the main planner was. There was also an
efficiency problem, in that when we did realize we needed a clauseless join
because of an IN, we'd consider clauseless joins against every other relation
whether this was sensible or not. It seems a better design is to use the
outer-join and in-clause lists as a backup heuristic, just as the rule of
joining only where there are joinclauses is a heuristic: we'll join two
relations if they have a usable joinclause *or* this might be necessary to
satisfy an outer-join or IN-clause join order restriction. I refactored the
code to have just one place considering this instead of three, and made sure
that it covered all the cases that any of them had been considering.
Backpatch as far as 8.1 (which has only the IN-clause form of the disease).
By rights 8.0 and 7.4 should have the bug too, but they accidentally fail
to fail, because the joininfo structure used in those releases preserves some
memory of there having once been a joinclause between the inner and outer
sides of an IN, and so it leads the code in the right direction anyway.
I'll be conservative and not touch them.
continuously, and requests vacuum runs of "autovacuum workers" to postmaster.
The workers do the actual vacuum work. This allows for future improvements,
like allowing multiple autovacuum jobs running in parallel.
For now, the code keeps the original behavior of having a single autovac
process at any time by sleeping until the previous worker has finished.
WHERE clauses. createplan.c is now willing to stick a gating Result node
almost anywhere in the plan tree, and in particular one can wind up directly
underneath a MergeJoin node. This means it had better be willing to handle
Mark/Restore. Fortunately, that's trivial in such cases, since we can just
pass off the call to the input node (which the planner has previously ensured
can handle Mark/Restore). Per report from Phil Frost.
equality checks it applies, instead of a random dependence on whatever
operators might be named "=". The equality operators will now be selected
from the opfamily of the unique index that the FK constraint depends on to
enforce uniqueness of the referenced columns; therefore they are certain to be
consistent with that index's notion of equality. Among other things this
should fix the problem noted awhile back that pg_dump may fail for foreign-key
constraints on user-defined types when the required operators aren't in the
search path. This also means that the former warning condition about "foreign
key constraint will require costly sequential scans" is gone: if the
comparison condition isn't indexable then we'll reject the constraint
entirely. All per past discussions.
Along the way, make the RI triggers look into pg_constraint for their
information, instead of using pg_trigger.tgargs; and get rid of the always
error-prone fixed-size string buffers in ri_triggers.c in favor of building up
the RI queries in StringInfo buffers.
initdb forced due to columns added to pg_constraint and pg_trigger.
socket is still read-ready, the code was a tight loop, wasting lots of CPU.
We can't do anything to clear the failure, other than wait, but we should give
other processes more chance to finish and release FDs; so insert a small sleep.
Also, avoid bogus "close(-1)" in this case. Per report from Jim Nasby.
that overlap an outer join's min_righthand but aren't fully contained in it,
to support joining within the RHS after having performed an outer join that
can commute with this one. Aside from the direct fix in make_join_rel(),
fix has_join_restriction() and GEQO's desirable_join() to consider this
possibility. Per report from Ian Harding.