Although the key-combining code claimed to work correctly if its input
contained both lossy and exact pointers for a single page in a single TID
stream, in fact this did not work, and could not work without pretty
fundamental redesign. Modify keyGetItem so that it will not return such a
stream, by handling lossy-pointer cases a bit more explicitly than we did
before.
Per followup investigation of a gripe from Artur Dabrowski.
An example of a query that failed given his data set is
select count(*) from search_tab where
(to_tsvector('german', keywords ) @@ to_tsquery('german', 'ee:* | dd:*')) and
(to_tsvector('german', keywords ) @@ to_tsquery('german', 'aa:*'));
Back-patch to 8.4 where the lossy pointer code was introduced.
struct representing a tree entry, rather than being a separately allocated
piece of storage. This API is at least as clean as the old one (if not
more so --- there were some bizarre choices in there) and it permits a
very substantial memory savings, on the order of 2X in ginbulk.c's usage.
Also, fix minor memory leaks in code called by ginEntryInsert, in
particular in ginInsertValue and entryFillRoot, as well as ginEntryInsert
itself. These leaks resulted in the GIN index build context continuing
to bloat even after we'd filled it to maintenance_work_mem and started
to dump data out to the index.
In combination these fixes restore the GIN index build code to honoring
the maintenance_work_mem limit about as well as it did in 8.4. Speed
seems on par with 8.4 too, maybe even a bit faster, for a non-pathological
case in which HEAD was formerly slower.
Back-patch to 9.0 so we don't have a performance regression from 8.4.
possible (ie, whenever the tsquery is a constant), even when no statistics
are available for the tsvector. For example, foo @@ 'a & b'::tsquery
can be expected to be more selective than foo @@ 'a'::tsquery, whether
or not we know anything about foo. We use DEFAULT_TS_MATCH_SEL as the assumed
selectivity of individual query terms when no stats are available, then
combine the terms according to the query's AND/OR structure as usual.
Per experimentation with Artur Dabrowski's example. (The fact that there
are no stats available in that example is a problem in itself, but
nonetheless tsmatchsel should be smarter about the case.)
Back-patch to 8.4 to keep all versions of tsmatchsel() in sync.
routines to make them behave better in the presence of "lossy" index pointers.
The previous coding was outright incorrect for some cases, as recently
reported by Artur Dabrowski: scanGetItem would fail to return index entries in
cases where one index key had multiple exact pointers on the same page as
another key had a lossy pointer. Also, keyGetItem was extremely inefficient
for cases where a single index key generates multiple "entry" streams, such as
an @@ operator with a multiple-clause tsquery. The presence of a lossy page
pointer in any one stream defeated its ability to use the opclass
consistentFn, resulting in probing many heap pages that didn't really need to
be visited. In Artur's example case, a query like
WHERE tsvector @@ to_tsquery('a & b')
was about 50X slower than the theoretically equivalent
WHERE tsvector @@ to_tsquery('a') AND tsvector @@ to_tsquery('b')
The way that I chose to fix this was to have GIN call the consistentFn
twice with both TRUE and FALSE values for the in-doubt entry stream,
returning a hit if either call produces TRUE, but not if they both return
FALSE. The code handles this for the case of a single in-doubt entry stream,
but punts (falling back to the stupid behavior) if there's more than one lossy
reference to the same page. The idea could be scaled up to deal with multiple
lossy references, but I think that would probably be wasted complexity. At
least to judge by Artur's example, such cases don't occur often enough to be
worth trying to optimize.
Back-patch to 8.4. 8.3 did not have lossy GIN index pointers, so not
subject to these problems.
look through join alias Vars to avoid breaking join queries, and
move the test to someplace where it will catch more possible ways
of calling a function. We still ought to throw away the whole thing
in favor of a data-type-based solution, but that's not feasible in
the back branches.
This needs to be back-patched further than 9.0, but I don't have time
to do so today. Committing now so that the fix gets into 9.0beta4.
Transaction aborts now record their LSN to avoid corner case
behaviour in SR/HS, hence change of name of variables and functions.
As pointed out by Fujii Masao. Cosmetic changes only.
related functions. Per today's discussion, we will henceforth assume
that datatype I/O functions are either stable or immutable, never volatile.
(This implies in particular that domain CHECK constraint expressions shouldn't
be volatile, since domain_in executes them.) In turn, functions that execute
the I/O functions of arbitrary datatypes should always be labeled stable.
This affects the labeling of array_to_string, which was unsafely marked
immutable, and record_in, record_out, record_recv, record_send,
domain_in, domain_recv, which were over-conservatively marked volatile.
The array I/O functions were already marked stable, which is correct
per this policy but would have been wrong if we maintained domain_in
as volatile.
Back-patch to 9.0, along with an earlier fix to correctly mark cash_in
and cash_out as stable not immutable (since they depend on lc_monetary).
No catversion bump --- the implications of this are not currently
severe enough to justify a forced initdb.
assuming that a local char[] array would be aligned on at least a word
boundary. There are architectures on which that is pretty much guaranteed to
NOT be the case ... and those arches also don't like non-aligned memory
accesses, meaning that log_newpage() would crash if it ever got invoked.
Even on Intel-ish machines there's a potential for a large performance penalty
from doing I/O to an inadequately aligned buffer. So palloc it instead.
Backpatch to 8.0 --- 7.4 doesn't have this code.
If a zeroed page is present in the heap, ALTER TABLE .. SET TABLESPACE will
set the LSN and TLI while copying it, which is wrong, and heap_xlog_newpage()
will do the same thing during replay, so the corruption propagates to any
standby. Note, however, that the bug can't be demonstrated unless archiving
is enabled, since in that case we skip WAL logging altogether, and the LSN/TLI
are not set.
Back-patch to 8.0; prior releases do not have tablespaces.
Analysis and patch by Jeff Davis. Adjustments for back-branches and minor
wordsmithing by me.
list in ExecLockRows() forgot to allow for the possibility that some of the
rowmarks are for child tables that aren't relevant to the current row.
Per report from Kenichiro Tanaka.
Avoid hard-coding lockmode used for many altering DDL commands, allowing easier
future changes of lock levels. Implementation of initial analysis on DDL
sub-commands, so that many lock levels are now at ShareUpdateExclusiveLock or
ShareRowExclusiveLock, allowing certain DDL not to block reads/writes.
First of number of planned changes in this area; additional docs required
when full project complete.
a pass-by-reference datatype with a nontrivial projection step.
We were using the same memory context for the projection operation as for
the temporary context used by the hashtable routines in execGrouping.c.
However, the hashtable routines feel free to reset their temp context at
any time, which'd lead to destroying input data that was still needed.
Report and diagnosis by Tao Ma.
Back-patch to 8.1, where the problem was introduced by the changes that
allowed us to work with "virtual" tuples instead of materializing intermediate
tuple values everywhere. The earlier code looks quite similar, but it doesn't
suffer the problem because the data gets copied into another context as a
result of having to materialize ExecProject's output tuple.
We used to be consistent about this, but my recent patch to add a
restart_after_crash GUC failed to follow the existing convention.
Report and patch from Fujii Masao.
We now use the phrase 'via local socket in' rather than 'on host' in both
\c and \conninfo output, when applicable.
Fujii Masao, with some kibitzing by me.
I've added a quote_all_identifiers GUC which affects the behavior
of the backend, and a --quote-all-identifiers argument to pg_dump
and pg_dumpall which sets the GUC and also affects the quoting done
internally by those applications.
Design by Tom Lane; review by Alex Hunsaker; in response to bug #5488
filed by Hartmut Goebel.
Remove bespoke code in DoCopy and RI_Initial_Check, which now instead
fabricate call ExecCheckRTPerms with a manufactured RangeTblEntry.
This is intended to make it feasible for an enhanced security provider
to actually make use of ExecutorCheckPerms_hook, but also has the
advantage that RI_Initial_Check can allow use of the fast-path when
column-level but not table-level permissions are present.
KaiGai Kohei. Reviewed (in an earlier version) by Stephen Frost, and by me.
Some further changes to the comments by me.
Per discussion with David Christensen, there can be multiple
instances of PG accessible via local sockets, and you need the port
to see which one you're actually connected to. David's original
patch worked this way, but I inadvertently ripped it out during
commit.
Normally, we automatically restart after a backend crash, but in some
cases when PostgreSQL is invoked by clusterware it may be desirable to
suppress this behavior, so we provide an option which does this.
Since no existing GUC group quite fits, create a new group called
"error handling options" for this and the previously undocumented GUC
exit_on_error, which is now documented.
Review by Fujii Masao.
path when CSV logging is configured but not yet operational. It's sufficient
to send the message to stderr, as we were already doing, and the "Not safe"
gripe has already confused at least two core members ...
Backpatch to 9.0, but not further --- doesn't seem appropriate to change
this behavior in stable branches.
any implicit casting previously applied to the targetlist item. This is
reasonable because the implicit cast, by definition, wasn't written by the
user; so we are preserving the expected behavior that ORDER BY items match
textually equivalent tlist items. The case never arose before because there
couldn't be any implicit casting of a top-level SELECT item before we process
ORDER BY etc. But now it can arise in the context of aggregates containing
ORDER BY clauses, since the "targetlist" is the already-casted list of
arguments for the aggregate. The net effect is that the datatype used for
ORDER BY/DISTINCT purposes is the aggregate's declared input type, not that
of the original input column; which is a bit debatable but not horrendous,
and to do otherwise would require major rework that doesn't seem justified.
Per bug #5564 from Daniel Grace. Back-patch to 9.0 where aggregate ORDER BY
was implemented.
This adds a libpq connection parameter requirepeer that specifies the user
name that the server process is expected to run under.
reviewed by KaiGai Kohei
log files created by the syslogger process.
In passing, make unix_file_permissions display its value in octal, same
as log_file_mode now does.
Martin Pihlak
from defining non-self-conflicting constraints.
Jeff Davis
Note: I (tgl) objected to removing this check in 9.0 on the grounds that it
was an important sanity check in new, poorly tested code. However, it should
be all right to remove it for 9.1, since we'll get field testing from the
9.0 branch.
to dump a PUBLIC user mapping correctly, as per bug #5560 from Shigeru Hanada.
Use the pg_user_mappings view rather than trying to access pg_user_mapping
directly, so that the code doesn't fail when run by a non-superuser. And
clean up some minor carelessness such as unsafe usage of fmtId().
Back-patch to 8.4 where this code was added.
parameter against server cert's CN field) to succeed in the case where
both host and hostaddr are specified. As with the existing precedents
for Kerberos, GSSAPI, SSPI, it is the calling application's responsibility
that host and hostaddr match up --- we just use the host name as given.
Per bug #5559 from Christopher Head.
In passing, make the error handling and messages for the no-host-name-given
failure more consistent among these four cases, and correct a lie in the
documentation: we don't attempt to reverse-lookup host from hostaddr
if host is missing.
Back-patch to 8.4 where SSL cert verification was introduced.
rather than just $N. This brings the display of nestloop-inner-indexscan
plans back to where it's been, and incidentally improves the display of
SubPlan parameters as well. In passing, simplify the EXPLAIN code by
having it deal primarily in the PlanState tree rather than separately
searching Plan and PlanState trees. This is noticeably cleaner for
subplans, and about a wash elsewhere.
One small difference from previous behavior is that EXPLAIN will no longer
qualify local variable references in inner-indexscan plan nodes, since it
no longer sees such nodes as possibly referencing multiple tables. Vars
referenced through PARAM_EXEC Params are still forcibly qualified, though,
so I don't think the display is any more confusing than before. Adjust a
couple of examples in the documentation to match this behavior.
loop from being dropped, I missed subtransaction cleanup. Pinned portals
must be dropped at subtransaction cleanup just as they are at main
transaction cleanup.
Per bug #5556 by Robert Walker. Backpatch to 8.0, 7.4 didn't have
subtransactions.
relation using the general PARAM_EXEC executor parameter mechanism, rather
than the ad-hoc kluge of passing the outer tuple down through ExecReScan.
The previous method was hard to understand and could never be extended to
handle parameters coming from multiple join levels. This patch doesn't
change the set of possible plans nor have any significant performance effect,
but it's necessary infrastructure for future generalization of the concept
of an inner indexscan plan.
ExecReScan's second parameter is now unused, so it's removed.
use the actual element type of the array it's disassembling, rather than
trusting the type OID passed in by its caller. This is needed because
sometimes the planner passes in a type OID that's only binary-compatible
with the target column's type, rather than being an exact match. Per an
example from Bernd Helmle.
Possibly we should refactor get_attstatsslot/free_attstatsslot to not expect
the caller to supply type ID data at all, but for now I'll just do the
minimum-change fix.
Back-patch to 7.4. Bernd's test case only crashes back to 8.0, but since
these subroutines are the same in 7.4, I suspect there may be variant
cases that would crash 7.4 as well.
resjunk outputs of subquery tlists, instead of throwing an error. Per bug
#5548 from Daniel Grace.
We might at some point find we ought to back-patch this further than 9.0,
but I think that such Vars can only occur as resjunk members of upper-level
tlists, in which case the problem can't arise because prior versions didn't
print resjunk tlist items in EXPLAIN VERBOSE.
This hook allows a loadable module to gain control when table permissions
are checked. It is expected to be used by an eventual SE-PostgreSQL
implementation, but there are other possible applications as well. A
sample contrib module can be found in the archives at:
http://archives.postgresql.org/pgsql-hackers/2010-05/msg01095.php
Robert Haas and Stephen Frost
(_PG_init should be called only once anyway, but as long as it's got an
internal guard against repeat calls, that should be in front of the
version check.)
This wasn't important when we used diff's -w (--ignore-all-space) option
to compare regression result files, but it is now. Per buildfarm member
canary, which evidently has been offline since we did that in November,
but came to life again today.
sub-select contains a join alias reference that expands into an expression
containing another sub-select. Per yesterday's report from Merlin Moncure
and subsequent off-list investigation.
Back-patch to 7.4. Older versions didn't attempt to flatten sub-selects in
ways that would trigger this problem.
To do that, replace L'\0' by (WCHAR) 0. Perhaps someday we should teach
pgindent about wide-character literals, but so long as this is the only
use-case in the entire Postgres sources, a workaround seems easier.
Per extensive discussion on pgsql-hackers. We are deliberately not
back-patching this even though the behavior of 8.3 and 8.4 is
unquestionably broken, for fear of breaking existing users of this
parameter. This incompatibility should be release-noted.
flag for src/port/ in front of any -L flags placed in LDFLAGS by configure.
This undoes an L-flag-ordering change that I had thought would be safe,
but seems to be making at least one buildfarm member fail --- the only
theory for orca's failure that I can think of is that it's got an old
copy of libpgport.a in /usr/lib. Also allow for LDFLAGS_SL to be set by
contrib makefiles before they invoke Makefile.global.
needs to appear before anything placed in SHLIB_LINK. This is because
SHLIB_LINK is typically a subset of LIBS, and LIBS has to appear after
LDFLAGS on platforms that are sensitive to the relative order of -L and -l
switches.
supposing that they should set SHLIB_LINK rather than LDFLAGS_SL. Since these
don't go through Makefile.shlib that was a no-op on most platforms. Also
regularize the few platform-specific Makefiles that did pay attention to
SHLIB_LINK: it seems that the real value of that is to pull in BE_DLLLIBS,
so do that instead. Per buildfarm failures on cygwin.
linking both executables and shared libraries, and we add on LDFLAGS_EX when
linking executables or LDFLAGS_SL when linking shared libraries. This
provides a significantly cleaner way of dealing with link-time switches than
the former behavior. Also, make sure that the various platform-specific
%.so: %.o rules incorporate LDFLAGS and LDFLAGS_SL; most of them missed that
before. (I did not add these variables for the platforms that invoke $(LD)
directly, however. It's not clear if we can do that safely, since for the
most part we assume these variables use CC command-line syntax.)
Per gripe from Aaron Swenson and subsequent investigation.
being used in a PL/pgSQL FOR loop is closed was inadequate, as Tom Lane
pointed out. The bug affects FOR statement variants too, because you can
close an implicitly created cursor too by guessing the "<unnamed portal X>"
name created for it.
To fix that, "pin" the portal to prevent it from being dropped while it's
being used in a PL/pgSQL FOR loop. Backpatch all the way to 7.4 which is
the oldest supported version.
idea from the start since the variable is only meant to track commit/abort
events. This patch reverts the logic around the variable to what it was in
8.4, except that the value is now kept in shared memory rather than a static
variable, so that it can be reported correctly by CreateRestartPoint (which is
executed in the bgwriter).
to have different values in different processes of the primary server.
Also put it into the "Streaming Replication" GUC category; it doesn't belong
in "Standby Servers" because you use it on the master not the standby.
In passing also correct guc.c's idea of wal_keep_segments' category.
max_standby_streaming_delay, and revise the implementation to avoid assuming
that timestamps found in WAL records can meaningfully be compared to clock
time on the standby server. Instead, the delay limits are compared to the
elapsed time since we last obtained a new WAL segment from archive or since
we were last "caught up" to WAL data arriving via streaming replication.
This avoids problems with clock skew between primary and standby, as well
as other corner cases that the original coding would misbehave in, such
as the primary server having significant idle time between transactions.
Per my complaint some time ago and considerable ensuing discussion.
Do some desultory editing on the hot standby documentation, too.
Backpatch to 8.3, which is as far back as we have opfamilies.
The opclass portion could probably be backpatched to 8.2, when
REASSIGN OWNED was added, but for now I have not done that.
Asko Tiidumaa, with minor adjustments by me.
The previous commit to make copydir() interruptible prevented
postgres.exe from linking on MinGW and Cygwin, because on those
platforms libpgport_srv.a can't freely reference symbols defined
by the backend. Since that code is already backend-specific anyway,
just move the whole file into the backend rather than adding further
kludges to deal with the symbols needed by CHECK_FOR_INTERRUPTS().
This probably needs some further cleanup, but this commit just moves
the file as-is, which should hopefully be enough to turn the
buildfarm green again.
This makes ALTER DATABASE .. SET TABLESPACE and CREATE DATABASE more
sensitive to interrupts. Backpatch to 8.4, where ALTER DATABASE .. SET
TABLESPACE was introduced. We could go back further, but in the absence
of complaints about the CREATE DATABASE case it doesn't seem worth it.
Guillaume Lelarge, with a small correction by me.
but we have nevertheless exposed them to users via pg_get_expr(). It would
be too much maintenance effort to rigorously check the input, so put a hack
in place instead to restrict pg_get_expr() so that the argument must come
from one of the system catalog columns known to contain valid expressions.
Per report from Rushabh Lathia. Backpatch to 7.4 which is the oldest
supported version at the moment.
as well as fseeko, and to not assume that fseeko(fp, 0, SEEK_CUR) proves
anything. Also improve some related comments. Per my observation that
the SEEK_CUR test didn't actually work on some platforms, and subsequent
discussion with Robert Haas.
Back-patch to 8.4. In earlier releases it's not that important whether
we get the hasSeek test right, but with parallel restore it matters.
contain data offsets (which it won't, if pg_dump thought its output wasn't
seekable). To do that, remove an unnecessarily aggressive error check, and
instead fail if we get to the end of the archive without finding the desired
data item. Also improve the error message to be more specific about the
cause of the problem. Per discussion of recent report from Igor Neyman.
Back-patch to 8.4 where parallel restore was introduced.
of YYSTYPE, and hence returning the wrong answer for cases where a plpgsql
"unreserved keyword" really does conflict with a variable name. Obviously
I didn't test this enough :-(. Per bug #5524 from Peter Gagarinov.
This adds four additional connection parameters to libpq: keepalives,
keepalives_idle, keepalives_count, and keepalives_interval.
keepalives default to on, per discussion, but can be turned off by
specifying keepalives=0. The remaining parameters, where supported,
can be used to adjust how often keepalives are sent and how many
can be lost before the connection is broken.
The immediate motivation for this patch is to make sure that
walreceiver will eventually notice if the master reboots without
closing the connection cleanly, but it should be helpful in other
cases as well.
Tollef Fog Heen, Fujii Masao, and me.
In HEAD, emit a warning when an operator named => is defined.
In both HEAD and the backbranches (except in 8.2, where contrib
modules do not have documentation), document that hstore's text =>
text operator may be removed in a future release, and encourage the
use of the hstore(text, text) function instead. This function only
exists in HEAD (previously, it was called tconvert), so backpatch
it back to 8.2, when hstore was added. Per discussion.
might close the cursor, rendering the Portal pointer to it invalid.
Closing the cursor in the middle of the loop is not a very sensible thing
to do, but we must handle it gracefully and throw an error instead of
crashing.
If such a Var appeared within a nested sub-select, we failed to translate it
correctly during pullup of the view, because the recursive call to
replace_rte_variables_mutator was looking for the wrong sublevels_up value.
Bug was introduced during the addition of the PlaceHolderVar mechanism.
Per bug #5514 from Marcos Castedo.
master. Otherwise a subsequent crash could cause the master to lose WAL that
has already been applied on the slave, resulting in the slave being out of
sync and soon corrupt. Per recent discussion and an example from Robert Haas.
Fujii Masao
description for vacuum_defer_cleanup_age to the correct category.
Sections in postgresql.conf are also sorted in the same order with docs.
Per gripe by Fujii Masao, suggestion by Heikki Linnakangas, and patch by me.
and retry. If the record is genuinely corrupt in the master database,
there's little hope of recovering, but it's better than simply retrying
to apply the corrupt WAL record in a tight loop without even trying to
retransmit it, which is what we used to do.
string for a streaming replication connection. It's ignored by the
server, but allows libpq to pick up the password from .pgpass where
"replication" is specified as the database name.
Patch by Fujii Masao per Tom's suggestion, with some wording changes by me.
pg_last_xlog_replay_location(). Per Robert Haas's suggestion, after
Itagaki Takahiro pointed out an issue in the docs. Also, some wording
changes in the docs by me.
While my previous attempt seems to always produce valid YAML, it
doesn't always produce YAML that means what it appears to mean,
because of tokens like "0xa" and "true", which without quotes will
be interpreted as integer or Boolean literals. So, instead, just
quote everything that's not known to be a number, as we do for
JSON.
Dean Rasheed, with some changes to the comments by me.
checkpoint_timeout to trigger restartpoints. We used to deliberately only
do time-based restartpoints, because if checkpoint_segments is small we
would spend time doing restartpoints more often than really necessary.
But now that restartpoints are done in bgwriter, they're not as
disruptive as they used to be. Secondly, because streaming replication
stores the streamed WAL files in pg_xlog, we want to clean it up more
often to avoid running out of disk space when checkpoint_timeout is large
and checkpoint_segments small.
Patch by Fujii Masao, with some minor changes by me.
the current one. Not doing this would leave the walwriter with a handle to a
deleted file if there was nothing for it to do for a long period of time,
preventing the file from being completely removed.
Reported by Tollef Fog Heen, and thanks to Heikki for some hand-holding with
the patch.
The previous code failed to quote in many cases where quoting was necessary -
YAML has loads of special characters, including -:[]{},"'|*& - so quote much
more aggressively, and only refrain from quoting things where it seems fairly
clear that it isn't necessary.
Per report from Dean Rasheed.
to be initialized with proper values. Affected parameters are
fillfactor, analyze_threshold, and analyze_scale_factor.
Especially uninitialized fillfactor caused inefficient page usage
because we built a StdRdOptions struct in which fillfactor is zero
if any reloption is set for the toast table.
In addition, we disallow toast.autovacuum_analyze_threshold and
toast.autovacuum_analyze_scale_factor because we didn't actually
support them; they are always ignored.
Report by Rumko on pgsql-bugs on 12 May 2010.
Analysis by Tom Lane and Alvaro Herrera. Patch by me.
Backpatch to 8.4.
and current server clock time to SR data messages. These are not currently
used on the slave side but seem likely to be useful in future, and it'd be
better not to change the SR protocol after release. Per discussion.
Also do some minor code review and cleanup on walsender.c, and improve the
protocol documentation.