heap_form_tuple. Since this removes the last remaining caller of
heap_addheader, remove it.
Extracted from the column privileges patch from Stephen Frost, with further
code cleanups by me.
anyelement. This lacks the WITH ORDINALITY option, as well as the multiple
input arrays option added in the most recent SQL specs. But it's still a
pretty useful subset of the spec's functionality, and it is enough to
allow obsoleting contrib/intagg.
and thereby in the pg_timezone_names view. Although we allow such zones
to be used in certain limited contexts like AT TIME ZONE, we don't allow
them in SET TIME ZONE, and bug #4528 shows that they're more likely to
confuse users than do anything useful. So hide 'em. (Note that we don't
even generate these zones when installing our own timezone database.
But they are likely to be present when using a system-provided database.)
for inserting tuples in increasing TID order. It's not clear whether this
fully explains Ivan Sergio Borgonovo's complaint, but simple testing
confirms that a scan that doesn't start at block 0 can slow GIN build by
a factor of three or four.
Backpatch to 8.3. Sync scan didn't exist before that.
- FloatOnly: only used by NumericOnly, instead put the FloatOnly production into NumericOnly
- IntegerOnly: only used by NumericOnly and one ALTER TABLE rule, replacement SignedIconst is already used in several other places
operator. The result depends only on the two input operators and the proof
direction (imply or refute), so it's easy to cache. This provides a very
large savings in cases such as Sergey Konoplev's long NOT-IN-list example,
where predtest spends all its time repeatedly figuring out that the same pair
of operators cannot be used to prove anything. (But of course the O(N^2)
behavior still catches up with you eventually.) I'm not convinced it buys
a whole lot when constraint_exclusion isn't turned on, but it's not a lot
of added code so we might as well cache all the time.
AND, OR, or equivalent clauses: if there are too many (more than 100) just
exit without proving anything. This ensures that we don't spend O(N^2) time
trying (and most likely failing) to prove anything about very long IN lists
and similar cases.
Also, install a couple of CHECK_FOR_INTERRUPTS calls to ensure that a long
proof attempt can be interrupted.
Per gripe from Sergey Konoplev.
Back-patch the whole patch to 8.2 and just the CHECK_FOR_INTERRUPTS addition
to 8.1. (The rest of the patch doesn't apply cleanly, and since 8.1 doesn't
show the complained-of behavior anyway, it doesn't seem necessary to work
hard on it.)
function as a special case.
This version still has the suspicious behavior of returning null for an
empty array (rather than zero), but this may need a wholesale revision of
empty array behavior, currently under discussion.
Jim Nasby, Robert Haas, Peter Eisentraut
in "postgres_verbose" intervalstyle, and the equally arbitrary decision to
show at least two fractional-seconds digits in most other datetime display
styles. This results in some minor changes in the expected regression test
outputs.
Also, coalesce a lot of repetitive code in datetime.c into subroutines,
for clarity and ease of maintenance. In particular this roughly halves
the number of #ifdef HAVE_INT64_TIMESTAMP segments.
Ron Mayer, with some additional kibitzing from Tom Lane
translated_vars list get updated when pulling up an appendrel member. It's
not clear that this really matters at present, since relatively little gets
done with the outputs of an appendrel child relation; but it probably will
come back to bite us sometime if we leave them with the wrong values.
we extended the appendrel mechanism to support UNION ALL optimization. The
reason nobody noticed was that we are not actually using attr_needed data for
appendrel children; hence it seems more reasonable to rip it out than fix it.
Back-patch to 8.2 because an Assert failure is possible in corner cases.
Per examination of an example from Jim Nasby.
In HEAD, also get rid of AppendRelInfo.col_mappings, which is quite inadequate
to represent UNION ALL situations; depend entirely on translated_vars instead.
up a SSL connection, but psql is compiled without support for it.
Not a really realistic use-case, but the patch also cuts down on
the number of places with #ifdef's...
"base/11517/3767_fsm", instead of symbolic names like "1663/11517/3767/1",
per Alvaro's suggestion. I didn't change the messages in the higher-level
index, heap and FSM routines, though, where the fork is implicit.
specifically, we can input either the "format with designators" or the
"alternative format", and we can output the former when IntervalStyle is set
to iso_8601.
Ron Mayer
the length of a UTF8 character with pg_mblen (wrong if DB encoding isn't
UTF8), and the latter was blithely assuming that a static buffer would somehow
revert to all zeroes for each use.
different locales. This is just syntactical sweetener over --lc-collate and
--lc-ctype. Per discussion.
While at it, properly document --lc-ctype and --lc-collate in SGML docs,
which apparently were forgotten (or purposefully ommited?) when they were
created.
("there might be triggers") rather than an exact count. This is necessary
catalog infrastructure for the upcoming patch to reduce the strength of
locking needed for trigger addition/removal. Split out and committed
separately for ease of reviewing/testing.
In passing, also get rid of the unused pg_class columns relukeys, relfkeys,
and relrefs, which haven't been maintained in many years and now have no
chance of ever being maintained (because of wishing to avoid locking).
Simon Riggs
from DateStyle, and create a new interval style that produces output matching
the SQL standard (at least for interval values that fall within the standard's
restrictions). IntervalStyle is also used to resolve the conflict between the
standard and traditional Postgres rules for interpreting negative interval
input.
Ron Mayer
(but not locked, as that would risk deadlocks). Also, make it work in a small
ring of buffers to avoid having bulk inserts trash the whole buffer arena.
Robert Haas, after an idea of Simon Riggs'.
data type. This patch takes the approach of allowing an optional hyphen after
each group of four hex digits.
Author: Robert Haas <robertmhaas@gmail.com>
from COMMITTED to SUBCOMMITTED during recovery. This wasn't previously
possible, but it is now due to the recent changes on clog commit protocol for
subtransactions.
Simon Riggs
to dump sequence values cope with sequences outside the search path and/or
having names that need quoting. No back-patch needed because these are new
problems in 8.4.
Kris Jurka (also a little bit of code beautification by tgl)
upon requests from backends, rather than on a fixed 500msec cycle. (There's
still throttling logic to ensure it writes no more often than once per
500msec, though.) This should result in a significant reduction in stats file
write traffic in typical scenarios where the stats are demanded only
infrequently.
This approach also means that the former difficulty with changing
stats_temp_directory on-the-fly has gone away, so remove the caution about
that as well as the thrashing we did to minimize the trouble window.
In passing, also fix pgstat_report_stat() so that we will send a stats
message if we have function call stats but not table stats to report;
this fixes a bug in the recent patch to support function-call stats.
Martin Pihlak
allowed different processes to have different addresses for the shmem segment
in quite a long time, but there were still a few places left that used the
old coding convention. Clean them up to reduce confusion and improve the
compiler's ability to detect pointer type mismatches.
Kris Jurka
and heap_deformtuple in favor of the newer functions heap_form_tuple et al
(which do the same things but use bool control flags instead of arbitrary
char values). Eliminate the former duplicate coding of these functions,
reducing the deprecated functions to mere wrappers around the newer ones.
We can't get rid of them entirely because add-on modules probably still
contain many instances of the old coding style.
Kris Jurka
it just return void instead of sometimes returning a TupleTableSlot. SQL
functions don't need that anymore, and noplace else does either. Eliminating
the return value also means one less hassle for the ExecutorRun hook functions
that will be supported beginning in 8.4.
on non-full-page-image WAL records, and quite arbitrarily, only if there's
less than 20% free space on the page after the insert/update (not on HOT
updates, though). The 20% cutoff should avoid most of the overhead, when
replaying a bulk insertion, for example, while ensuring that pages that
are full are marked as full in the FSM.
This is mostly to avoid the nasty worst case scenario, where you replay
from a PITR archive, and the FSM information in the base backup is really
out of date. If there was a lot of pages that the outdated FSM claims to
have free space, but don't actually have any, the first unlucky inserter
after the recovery would traverse through all those pages, just to find
out that they're full. We didn't have this problem with the old FSM
implementation, because we simply threw the FSM information away on a
non-clean shutdown.
RETURNING clause, not just a SELECT as formerly.
A side effect of this patch is that when a set-returning SQL function is used
in a FROM clause, performance is improved because the output is collected into
a tuplestore within the function, rather than using the less efficient
value-per-call mechanism.
functions into one ReadBufferExtended function, that takes the strategy
and mode as argument. There's three modes, RBM_NORMAL which is the default
used by plain ReadBuffer(), RBM_ZERO, which replaces ZeroOrReadBuffer, and
a new mode RBM_ZERO_ON_ERROR, which allows callers to read corrupt pages
without throwing an error. The FSM needs the new mode to recover from
corrupt pages, which could happend if we crash after extending an FSM file,
and the new page is "torn".
Add fork number to some error messages in bufmgr.c, that still lacked it.
in the Global\ namespace, because it caused permission errors on
a lot of platforms.
We need to come up with something better for 8.4, but for now
revert to the pre-8.3.4 behaviour.
This basically takes some build system code that was previously labeled
"Solaris" and ties it to the compiler rather than the operating system.
Author: Julius Stroffek <Julius.Stroffek@Sun.COM>
backwards scan could actually happen. In particular, pass a flag to
materialize-mode SRFs that tells them whether they need to require random
access. In passing, also suppress unneeded backward-scan overhead for a
Portal's holdStore tuplestore. Per my proposal about reducing I/O costs for
tuplestores.
via a tuplestore instead of value-per-call. Refactor a few things to reduce
ensuing code duplication with nodeFunctionscan.c. This represents the
reasonably noncontroversial part of my proposed patch to switch SQL functions
over to returning tuplestores. For the moment, SQL functions still do things
the old way. However, this change enables PL SRFs to be called in targetlists
(observe changes in plperl regression results).
didn't actually work, because nodeRecursiveunion.c creates the underlying
tuplestore with backward scan disabled; which is a decision that we shouldn't
reverse because of performance cost. We could imagine adding signaling from
WorkTableScan to RecursiveUnion about whether backward scan is needed ...
but in practice it'd be a waste of effort, because there simply isn't any
current or plausible future scenario where WorkTableScan would be called on
to scan backward. So just dike out the code that claims to support it.
written to temp files by tuplesort.c and tuplestore.c. This saves 2 bytes per
row for 32-bit machines, and 6 bytes per row for 64-bit machines, which seems
worth the slight additional uglification of the tuple read/write routines.
recursion when we are unable to convert a localized error message to the
client's encoding. We've been over this ground before, but as reported by
Ibrar Ahmed, it still didn't work in the case of conversion failures for
the conversion-failure message itself :-(. Fix by installing a "circuit
breaker" that disables attempts to localize this message once we get into
recursion trouble.
Patch all supported branches, because it is in fact broken in all of them;
though I had to add some missing translations to the older branches in
order to expose the failure in the particular test case I was using.
after each other (since we already add a newline on each, this makes them
multiline).
Previously a new error would just overwrite the old one, so for example any
error caused when trying to connect with SSL enabled would be overwritten
by the error message form the non-SSL connection when using sslmode=prefer.
treat Var and non-Var IN-list items differently. Only non-Var items are
candidates to go into an ANY(ARRAY) construct --- we put all Vars as separate
OR conditions on the grounds that that leaves more scope for optimization.
Per suggestion from Robert Haas.
into an OR of equality comparisons, rather than x = ANY(ARRAY[...]), when there
are Vars in the right-hand side. This avoids a performance regression compared
to pre-8.2 releases, in cases where the OR form can be optimized into scans
of multiple indexes. Limit the possible downside by preferring this form only
when the list isn't very long (I set the cutoff at 32 elements, which is a
bit arbitrary but in the right ballpark). Per discussion with Jim Nasby.
In passing, also make it try the OR form if it cannot select a common type
for the array elements; we've seen a complaint or two about how the OR form
worked for such cases and ARRAY doesn't.
recent proposal. In typical cases, we now need 12 bytes per insert or delete
event and 16 bytes per update event; previously we needed 40 bytes per
event on 32-bit hardware and 80 bytes per event on 64-bit hardware. Even
in the worst case usage pattern with a large number of distinct triggers being
fired in one query, usage is at most 32 bytes per event. It seems to be a
bit faster than the old code as well, due to reduction of palloc overhead.
This commit doesn't address the TODO item of allowing the event list to spill
to disk; rather it's trying to stave off the need for that. However, it
probably makes that task a bit easier by reducing the data structure's
dependency on pointers. It would now be practical to dump an event list to
disk by "chunks" instead of individual events.
of copy/paste:d emails. Much of the contents had already been migrated
into the main documentation, some was out of date and some just plain
wrong.
Keep the "protocol-flowchart" which can still be useful.
in their targetlists had better reset ps_TupFromTlist during ReScan calls.
There's no need to back-patch here since nodeAgg and nodeGroup didn't
even pretend to support SRFs in prior releases.
* make LDAP use this instead of the hacky previous method to specify
the DN to bind as
* make all auth options behave the same when they are not compiled
into the server
* rename "ident maps" to "user name maps", and support them for all
auth methods that provide an external username
This makes a backwards incompatible change in the format of pg_hba.conf
for the ident, PAM and LDAP authentication methods.
inputs is unique or nearly so), make eqjoinsel() clamp the ndistinct estimates
to be not more than the estimated number of rows coming from the input
relations. This allows the estimate to change in response to the selectivity
of restriction conditions on the inputs.
This is a pretty narrow patch and maybe we should be more aggressive about
similarly clamping ndistinct in other cases; but I'm worried about
double-counting the effects of the restriction conditions. However, it seems
to help for the case exhibited by Grzegorz Jaskiewicz (antijoin against a
small subset of a relation), so let's try this for awhile.
until vars are distributed to rels during query_planner() startup. We don't
really need it before that, and not building it early has some advantages.
First, we don't need to put it through the various preprocessing steps, which
saves some cycles and eliminates the need for a number of routines to support
PlaceHolderInfo nodes at all. Second, this means one less unused plan for any
sub-SELECT appearing in a placeholder's expression, since we don't build
placeholder_list until after sublink expansion is complete.
correctly set. As result, killtuple() marks as dead
wrong tuple on page. Bug was introduced by me while fixing
possible duplicates during GiST index scan.
that represent some expression that we desire to compute below the top level
of the plan, and then let that value "bubble up" as though it were a plain
Var (ie, a column value).
The immediate application is to allow sub-selects to be flattened even when
they are below an outer join and have non-nullable output expressions.
Formerly we couldn't flatten because such an expression wouldn't properly
go to NULL when evaluated above the outer join. Now, we wrap it in a
PlaceHolderVar and arrange for the actual evaluation to occur below the outer
join. When the resulting Var bubbles up through the join, it will be set to
NULL if necessary, yielding the correct results. This fixes a planner
limitation that's existed since 7.1.
In future we might want to use this mechanism to re-introduce some form of
Hellerstein's "expensive functions" optimization, ie place the evaluation of
an expensive function at the most suitable point in the plan tree.
This patch eliminates the marking of subtransactions as SUBCOMMITTED in pg_clog
during their commit; instead they remain in-progress until main transaction
commit. At main transaction commit, the commit protocol is atomic-by-page
instead of one transaction at a time. To avoid a race condition with some
subtransactions appearing committed before others in the case where they span
more than one pg_clog page, we conserve the logic that marks them subcommitted
before marking the parent committed.
Simon Riggs with minor help from me
scanning; GiST and GIN do not, and it seems like too much trouble to make
them do so. By teaching ExecSupportsBackwardScan() about this restriction,
we ensure that the planner will protect a scroll cursor from the problem
by adding a Materialize node.
In passing, fix another longstanding bug in the same area: backwards scan of
a plan with set-returning functions in the targetlist did not work either,
since the TupFromTlist expansion code pays no attention to direction (and
has no way to run a SRF backwards anyway). Again the fix is to make
ExecSupportsBackwardScan check this restriction.
Also adjust the index AM API specification to note that mark/restore support
is unnecessary if the AM can't produce ordered output.
set_rel_width(). The code had been catering for the possibility of different
varnos in the relation targetlist, but this is impossible for a base relation
(and if it were possible, putting all the widths in the same RelOptInfo would
be wrong anyway).
is NULL but SK_SEARCHNULL is not set. Add checking IS NULL of keys
to set during key initialization. If key is NULL and SK_SEARCHNULL is not
set then nothnig can be satisfied.
With assert-enabled compilation that causes coredump.
Bug was introduced in 8.3 by support of IS NULL index scan.
In the previous coding, the list of columns that needed to be hashed on
was allocated in the per-query context, but we reallocated every time
the Agg node was rescanned. Since this information doesn't change over
a rescan, just construct the list of columns once during ExecInitAgg().
according to the TupleDesc's natts, not the number of physical columns in the
tuple. The previous coding would do the wrong thing in cases where natts is
different from the tuple's column count: either incorrectly report error when
it should just treat the column as null, or actually crash due to indexing off
the end of the TupleDesc's attribute array. (The second case is probably not
possible in modern PG versions, due to more careful handling of inheritance
cases than we once had. But it's still a clear lack of robustness here.)
The incorrect error indication is ignored by all callers within the core PG
distribution, so this bug has no symptoms visible within the core code, but
it might well be an issue for add-on packages. So patch all the way back.
lengthof(SysAtt) not FirstLowInvalidHeapAttributeNumber, for consistency with
the other uses of the SysAtt array, and to make it clearer that it doesn't
walk off the end of that array.
Formerly, the lack of any opclasses that could accept such data was enough
of a defense, but now with a "record" opclass we need to check more carefully.
(You can still use that opclass for an index, but you have to store a named
composite type not an anonymous one.)
the timestamp types. Turns out this doesn't even reduce the available
range of dates, since the restriction to dates that work for Julian-date
arithmetic is much tighter than the int32 range anyway. Per a longstanding
TODO item.
depth-first search order. Upon close reading of SQL:2008, it seems that the
spec's SEARCH DEPTH FIRST and SEARCH BREADTH FIRST options do not actually
guarantee any particular result order: what they do is provide a constructed
column that the user can then sort on in the outer query. So this is actually
just as much functionality ...
pseudo-type record[] to represent arrays of possibly-anonymous composite
types. Since composite datums carry their own type identification, no
extra knowledge is needed at the array level.
The main reason for doing this right now is that it is necessary to support
the general case of detection of cycles in recursive queries: if you need to
compare more than one column to detect a cycle, you need to compare a ROW()
to an array built from ROW()s, at least if you want to do it as the spec
suggests. Add some documentation and regression tests concerning the cycle
detection issue.
RecursiveUnion to which it refers. It turns out that we can just postpone the
relevant initialization steps until the first exec call for the node, by which
time the ancestor node must surely be initialized. Per report from Greg Stark.
the isTrigger state explicitly, not rely on nonzero-ness of trigrelOid
to indicate trigger-hood, because trigrelOid will be left zero when compiling
for validation. The (useless) function hash entry built by the validator
was able to match an ordinary non-trigger call later in the same session,
thereby bypassing the check that is supposed to prevent such a call.
Per report from Alvaro.
It might be worth suppressing the useless hash entry altogether, but
that's a bigger change than I want to consider back-patching.
Back-patch to 8.0. 7.4 doesn't have the problem because it doesn't
have validation mode.
to process any pending unlinks for the source database.
Before, if you dropped a relation in the template database just before
CREATE DATABASE, and a checkpoint happened during copydir(), the checkpoint
might delete a file that we're just about to copy, causing lstat() in
copydir() to fail with ENOENT.
Backpatch to 8.3, where the pending unlinks were introduced.
Per report by Matthew Wakeling and analysis by Tom Lane.
of referencing a WITH item that's not yet in scope according to the SQL
spec's semantics. This seems to be an easy error to make, and the bare
"relation doesn't exist" message doesn't lead one's mind in the correct
direction to fix it.
implementation uses an in-memory hash table, so it will poop out for very
large recursive results ... but the performance characteristics of a
sort-based implementation would be pretty unpleasant too.
use the old relfilenode in the new tablespace. There might be another relation
in the new tablespace with the same relfilenode, so we must generate a fresh
relfilenode in the new tablespace.
The 8.3 patch to let deleted relation files linger as zero-length files until
the next checkpoint made this more obvious: moving a relation from one table
space another, and then back again, caused a collision with the lingering
file.
Back-patch to 8.1. The issue is present in 8.0 as well, but it doesn't seem
worth fixing there, because we didn't have protection from OID collisions
after OID wraparound before 8.1.
Report by Guillaume Lelarge.
supplies an expression that can't be coerced to the target column type.
The code previously attempted to point at the target column name, which
doesn't work at all in an INSERT with omitted column name list, and is
also not remarkably helpful when the problem is buried somewhere in a
long INSERT-multi-VALUES command. Make it point at the failed expression
instead.
get_name_for_var_field didn't have enough context to interpret a reference to
a CTE query's output. Fixing this requires separate hacks for the regular
deparse case (pg_get_ruledef) and for the EXPLAIN case, since the available
context information is quite different. It's pretty nearly parallel to the
existing code for SUBQUERY RTEs, though. Also, add code to make sure we
qualify a relation name that matches a CTE name; else the CTE will mistakenly
capture the reference when reloading the rule.
In passing, fix a pre-existing problem with get_name_for_var_field not working
on variables in targetlists of SubqueryScan plan nodes. Although latent all
along, this wasn't a problem until we made EXPLAIN VERBOSE try to print
targetlists. To do this, refactor the deparse_context_for_plan API so that
the special case for SubqueryScan is all on ruleutils.c's side.
the column alias names of the RTE referenced by the Var to the RowExpr.
This is needed to allow ruleutils.c to correctly deparse FieldSelect nodes
referencing such a construct. Per my recent bug report.
Adding a field to RowExpr forces initdb (because of stored rules changes)
so this solution is not back-patchable; which is unfortunate because 8.2
and 8.3 have this issue. But it only affects EXPLAIN for some pretty odd
corner cases, so we can probably live without a solution for the back
branches.
relation forks. While the file names are not visible to users, for those
that do peek into the data directory, it's nice to have more descriptive
names. Per Greg Stark's suggestion.
well as regular tables. Per discussion, this seems necessary to meet the
principle of least astonishment.
In passing, simplify the error messages in warnAutoRange(). Now that we
have parser error position info for these errors, it doesn't seem very
useful to word the error message differently depending on whether we are
inside a sub-select or not.
machine produces zero (rather than the more usual minimum-possible-integer)
for the only possible overflow case. This has been seen to occur for at least
some word widths on some hardware, and it's cheap enough to check for
everywhere. Per Peter's analysis of buildfarm reports.
This could be back-patched, but in the absence of any gripes from the field
I doubt it's worth the trouble.
recursive CTE that we're still in progress of analyzing. Add a similar guard
to the similar code in expandRecordVariable(), and tweak regression tests to
cover this case. Per report from Dickson S. Guedes.
There are some unimplemented aspects: recursive queries must use UNION ALL
(should allow UNION too), and we don't have SEARCH or CYCLE clauses.
These might or might not get done for 8.4, but even without them it's a
pretty useful feature.
There are also a couple of small loose ends and definitional quibbles,
which I'll send a memo about to pgsql-hackers shortly. But let's land
the patch now so we can get on with other development.
Yoshiyuki Asaba, with lots of help from Tatsuo Ishii and Tom Lane