Don't leak a file descriptor if the file is empty or we can't read its size.
Expect there to be a newline at the end of the last line, too. If there
isn't, ignore anything after the last newline. This makes it a tiny bit
more robust in case the file is appended to concurrently, so that we don't
return the last line if it hasn't been fully written yet. And this makes
the code a bit less obscure, anyway. Per Tom Lane's suggestion.
Backpatch to all supported branches.
Concurrent behaviour was flawed when using
a two-step process, so add an additional
phase of processing to ensure concurrency
for both SELECTs and INSERT/UPDATE/DELETEs.
Backpatch to 9.2
Andres Freund, tweaked by me
If a potential equivalence clause references a variable from the nullable
side of an outer join, the planner needs to take care that derived clauses
are not pushed to below the outer join; else they may use the wrong value
for the variable. (The problem arises only with non-strict clauses, since
if an upper clause can be proven strict then the outer join will get
simplified to a plain join.) The planner attempted to prevent this type
of error by checking that potential equivalence clauses aren't
outerjoin-delayed as a whole, but actually we have to check each side
separately, since the two sides of the clause will get moved around
separately if it's treated as an equivalence. Bugs of this type can be
demonstrated as far back as 7.4, even though releases before 8.3 had only
a very ad-hoc notion of equivalence clauses.
In addition, we neglected to account for the possibility that such clauses
might have nonempty nullable_relids even when not outerjoin-delayed; so the
equivalence-class machinery lacked logic to compute correct nullable_relids
values for clauses it constructs. This oversight was harmless before 9.2
because we were only using RestrictInfo.nullable_relids for OR clauses;
but as of 9.2 it could result in pushing constructed equivalence clauses
to incorrect places. (This accounts for bug #7604 from Bill MacArthur.)
Fix the first problem by adding a new test check_equivalence_delay() in
distribute_qual_to_rels, and fix the second one by adding code in
equivclass.c and called functions to set correct nullable_relids for
generated clauses. Although I believe the second part of this is not
currently necessary before 9.2, I chose to back-patch it anyway, partly to
keep the logic similar across branches and partly because it seems possible
we might find other reasons why we need valid values of nullable_relids in
the older branches.
Add regression tests illustrating these problems. In 9.0 and up, also
add test cases checking that we can push constants through outer joins,
since we've broken that optimization before and I nearly broke it again
with an overly simplistic patch for this problem.
If an SMgrRelation is not "owned" by a relcache entry, don't allow it to
live past transaction end. This design allows the same SMgrRelation to be
used for blind writes of multiple blocks during a transaction, but ensures
that we don't hold onto such an SMgrRelation indefinitely. Because an
SMgrRelation typically corresponds to open file descriptors at the fd.c
level, leaving it open when there's no corresponding relcache entry can
mean that we prevent the kernel from reclaiming deleted disk space.
(While CacheInvalidateSmgr messages usually fix that, there are cases
where they're not issued, such as DROP DATABASE. We might want to add
some more sinval messaging for that, but I'd be inclined to keep this
type of logic anyway, since allowing VFDs to accumulate indefinitely
for blind-written relations doesn't seem like a good idea.)
This code replaces a previous attempt towards the same goal that proved
to be unreliable. Back-patch to 9.1 where the previous patch was added.
This reverts commit fba105b109.
That approach had problems with the smgr-level state not tracking what
we really want to happen, and with the VFD-level state not tracking the
smgr-level state very well either. In consequence, it was still possible
to hold kernel file descriptors open for long-gone tables (as in recent
report from Tore Halset), and yet there were also cases of FDs being closed
undesirably soon. A replacement implementation will follow.
Provide a common implementation of embedded singly-linked and
doubly-linked lists. "Embedded" in the sense that the nodes'
next/previous pointers exist within some larger struct; this design
choice reduces memory allocation overhead.
Most of the implementation uses inlineable functions (where supported),
for performance.
Some existing uses of both types of lists have been converted to the new
code, for demonstration purposes. Other uses can (and probably will) be
converted in the future. Since dllist.c is unused after this conversion,
it has been removed.
Author: Andres Freund
Some tweaks by me
Reviewed by Tom Lane, Peter Geoghegan
... because the latter plays games with the privileges for language SQL.
It looks like running alter_generic in parallel with "misc" is OK though.
Also, adjust serial_schedule to maintain the same test ordering (up to
parallelism) as parallel_schedule.
If postmaster changed postmaster.pid while pg_ctl was reading it, pg_ctl
could overrun the buffer it allocated for the file. Fix by reading the
whole file to memory with one read() call.
initdb contains an identical copy of the readfile() function, but the files
that initdb reads are static, not modified concurrently. Nevertheless, add
a simple bounds-check there, if only to silence static analysis tools.
Per report from Dave Vitek. Backpatch to all supported branches.
In the previous coding, new backend processes would attempt to create their
self-pipe during the OwnLatch call in InitProcess. However, pipe creation
could fail if the kernel is short of resources; and the system does not
recover gracefully from a FATAL error right there, since we have armed the
dead-man switch for this process and not yet set up the on_shmem_exit
callback that would disarm it. The postmaster then forces an unnecessary
database-wide crash and restart, as reported by Sean Chittenden.
There are various ways we could rearrange the code to fix this, but the
simplest and sanest seems to be to split out creation of the self-pipe into
a new function InitializeLatchSupport, which must be called from a place
where failure is allowed. For most processes that gets called in
InitProcess or InitAuxiliaryProcess, but processes that don't call either
but still use latches need their own calls.
Back-patch to 9.1, which has only a part of the latch logic that 9.2 and
HEAD have, but nonetheless includes this bug.
In commit 11e131854f, I missed the case of
a CTE RTE that doesn't have a user-defined alias, but does have an
alias assigned by set_rtable_names(). Per report from Peter Eisentraut.
While at it, refactor slightly to reduce code duplication.
This change ensures that the planner will see implicit and explicit casts
as equivalent for all purposes, except in the minority of cases where
there's actually a semantic difference (as reflected by having a 3-argument
cast function). In particular, this fixes cases where the EquivalenceClass
machinery failed to consider two references to a varchar column as
equivalent if one was implicitly cast to text but the other was explicitly
cast to text, as seen in bug #7598 from Vaclav Juza. We have had similar
bugs before in other parts of the planner, so I think it's time to fix this
problem at the core instead of continuing to band-aid around it.
Remove set_coercionform_dontcare(), which represents the band-aid
previously in use for allowing matching of index and constraint expressions
with inconsistent cast labeling. (We can probably get rid of
COERCE_DONTCARE altogether, but I don't think removing that enum value in
back branches would be wise; it's possible there's third party code
referring to it.)
Back-patch to 9.2. We could go back further, and might want to once this
has been tested more; but for the moment I won't risk destabilizing plan
choices in long-since-stable branches.
When hashing a subplan like "WHERE (a, b) NOT IN (SELECT x, y FROM ...)",
findPartialMatch() attempted to match rows using the hashtable's internal
equality operators, which of course are for x and y's datatypes. What we
need to use are the potentially cross-type operators for a=x, b=y, etc.
Failure to do that leads to wrong answers or even crashes. The scope for
problems is limited to cases where we have different types with compatible
hash functions (else we'd not be using a hashed subplan), but for example
int4 vs int8 can cause the problem.
Per bug #7597 from Bo Jensen. This has been wrong since the hashed-subplan
code was written, so patch all the way back.
Rename replication_timeout to wal_sender_timeout, and add a new setting
called wal_receiver_timeout that does the same at the walreceiver side.
There was previously no timeout in walreceiver, so if the network went down,
for example, the walreceiver could take a long time to notice that the
connection was lost. Now with the two settings, both sides of a replication
connection will detect a broken connection similarly.
It is no longer necessary to manually set wal_receiver_status_interval to
a value smaller than the timeout. Both wal sender and receiver now
automatically send a "ping" message if more than 1/2 of the configured
timeout has elapsed, and it hasn't received any messages from the other end.
Amit Kapila, heavily edited by me.
Numerous flex and bison make rules have appeared in the source tree
over time, and they are all virtually identical, so we can replace
them by pattern rules with some variables for customization.
Users of pgxs will also be able to benefit from this.
Apparently, on some glibc versions this causes warnings when
optimization is not enabled.
Altogether, there appear to be too many incompatibilities surrounding
this.
We no longer use GetNewOidWithIndex on pg_largeobject; rather,
pg_largeobject_metadata's regular OID column is considered the repository
of OIDs for large objects. The special functionality is still needed for
TOAST tables however.
The idea here is to make sure the planner will evaluate these functions
last not first among the filter conditions in psql pattern search and
tab-completion queries. We've discussed this several times, and there
was consensus to do it back in August, but we didn't want to do it just
before a release. Now seems like a safer time.
No catversion bump, since this catalog change doesn't create a backend
incompatibility nor any regression test result changes.
Building a shlib on AIX requires use of the mkldexport.sh script, but we
failed to install that, preventing its use from non-source-tree contexts.
Also, Makefile.aix had the wrong idea about where to find the installed
copy of the postgres.imp symbol file used by AIX.
Per report from John Pierce. Patch all the way back, since this has been
broken since the beginning of PGXS.
Do read/write permissions checks at most once per large object descriptor,
not once per lo_read or lo_write call as before. The repeated tests were
quite useless in the read case since the snapshot-based tests were
guaranteed to produce the same answer every time. In the write case,
the extra tests could in principle detect revocation of write privileges
after a series of writes has started --- but there's a race condition there
anyway, since we'd check privileges before performing and certainly before
committing the write. So there's no real advantage to checking every
single time, and we might as well redefine it as "only check the first
time".
On the same reasoning, remove the LargeObjectExists checks in inv_write
and inv_truncate. We already checked existence when the descriptor was
opened, and checking again doesn't provide any real increment of safety
that would justify the cost.
The former name was too likely to conflict with symbols from external
headers; and, as seen in recent buildfarm failures in member spoonbill,
it has now happened at least in plpython.
I found that these functions tend to return -1 while leaving an empty error
message string in the PGconn, if they suffer some kind of I/O error on the
file. The reason is that lo_close, which thinks it's executed a perfectly
fine SQL command, clears the errorMessage. The minimum-change workaround
is to reorder operations here so that we don't fill the errorMessage until
after lo_close.
libpq defines these functions as accepting "size_t" lengths ... but the
underlying backend functions expect signed int32 length parameters, and so
will misinterpret any value exceeding INT_MAX. Fix the libpq side to throw
error rather than possibly doing something unexpected.
This is a bug of long standing, but I doubt it's worth back-patching. The
problem is really pretty academic anyway with lo_read/lo_write, since any
caller expecting sane behavior would have to have provided a multi-gigabyte
buffer. It's slightly more pressing with lo_truncate, but still we haven't
supported large objects over 2GB until now.
Fix broken-on-bigendian-machines byte-swapping functions, add missed update
of alternate regression expected file, improve error reporting, remove some
unnecessary code, sync testlo64.c with current testlo.c (it seems to have
been cloned from a very old copy of that), assorted cosmetic improvements.
We already had those, but they forced modules to spell out the function
bodies twice. Eliminate some duplicates we had already grown.
Extracted from a somewhat larger patch from Andres Freund.
This bug was introduced by my patch to use the regular die/quickdie signal
handlers in walsender processes. I tried to make walsender exit at next
CHECK_FOR_INTERRUPTS() by setting ProcDiePending, but that's not enough, you
need to set InterruptPending too. On second thoght, it was not a very good
way to make walsender exit anyway, so use proc_exit(0) instead.
Also, send a CommandComplete message before exiting; that's what we did
before, and you get a nicer error message in the standby that way.
Reported by Thom Brown.
Get rid of the fundamentally indefensible assumption that "long long int"
exists and is exactly 64 bits wide on every platform Postgres runs on.
Instead let the configure script select the type to use for "pg_int64".
This is a bit of a pain in the rear since we do not want to pollute client
namespace with all the random symbols that pg_config.h defines; instead
we have to create a separate generated header file, "pg_config_ext.h".
But now that the infrastructure is there, we might have the ability to
add some other stuff that's long been wanting in this area.
4TB large objects (standard 8KB BLCKSZ case). For this purpose new
libpq API lo_lseek64, lo_tell64 and lo_truncate64 are added. Also
corresponding new backend functions lo_lseek64, lo_tell64 and
lo_truncate64 are added. inv_api.c is changed to handle 64-bit
offsets.
Patch contributed by Nozomi Anzai (backend side) and Yugo Nagata
(frontend side, docs, regression tests and example program). Reviewed
by Kohei Kaigai. Committed by Tatsuo Ishii with minor editings.
Instead of continuing if the next character is not an array boundary get_data()
used to continue only on finding a boundary so it was not able to read any
element after the first.
The regular backend's main loop handles signal handling and error recovery
better than the current WAL sender command loop does. For example, if the
client hangs and a SIGTERM is received before starting streaming, the
walsender will now terminate immediately, rather than hang until the
connection times out.
This makes the naming inside plpgsql consistent and distinguishes the
file from the backend's gram.y file. It will also allow easier
refactoring of the bison make rules later on.
Our getnameinfo() replacement implementation in getaddrinfo.c failed
unless NI_NUMERICHOST and NI_NUMERICSERV were given as flags, because
it doesn't resolve host names, only numeric IPs. But per standard,
when those flags are not given, an implementation can still degrade to
not returning host names, so this restriction is unnecessary. When we
remove it, we can eliminate some code in postmaster.c that apparently
tried to work around that.
The initial transition value is stored as a text string and not fed to the
transition type's input function until runtime (so that values such as
"now" don't get frozen at creation time). Previously, CREATE AGGREGATE
didn't do anything with it but that, which meant that even erroneous values
would be accepted and not complained of until the aggregate is used. This
seems unhelpful, and it's confused at least one user, as in Rhys Stewart's
recent report. It seems worth taking a few more cycles to invoke the input
function and verify that the value is acceptable. We can't do this if the
transition type is polymorphic, but in normal aggregates we know the actual
transition type so we can call the right input function.
The previous coding of the YYLLOC_DEFAULT macro behaved strangely for empty
productions, assigning the previous nonterminal's location as the parse
location of the result. The usefulness of that was (at best) debatable
already, but the real problem is that in list-generating nonterminals like
OptFooList: /* EMPTY */ { ... } | OptFooList Foo { ... } ;
the initially-identified location would get copied up, so that even a
nonempty list would be given a bogus parse location. Document how to work
around that, and do so for OptSchemaEltList, so that the error condition
just added for CREATE SCHEMA IF NOT EXISTS produces a sane error cursor.
So far as I can tell, there are currently no other cases where the
situation arises, so we don't need other instances of this coding yet.
Per discussion, schema-element subcommands are not allowed together with
this option, since it's not very obvious what should happen to the element
objects.
Fabrízio de Royes Mello
Remove duplicate implementation of catalog munging and miscellaneous
privilege and consistency checks. Instead rely on already existing data
in objectaddress.c to do the work.
Author: KaiGai Kohei
Tweaked by me
Reviewed by Robert Haas
examine_simple_variable supposed that any RTE_SUBQUERY rel it gets pointed
at must have been planned already. However, this isn't a safe assumption
because we must do selectivity estimation while generating indexscan paths,
and that code might look at join clauses involving a rel that the loop in
set_base_rel_sizes() hasn't reached yet. The simplest fix is to play dumb
in such a situation, that is give up trying to extract any stats for the
Var. This could possibly be improved by making a separate pass over the
RTE list to plan each unflattened subquery before we start the main
planning work --- but that would be pretty invasive and it doesn't seem
worth it, for now at least. (We couldn't just break set_base_rel_sizes()
into two loops: the prescan would need to handle all subquery rels in the
query, not only those in the current join subproblem.)
This bug was introduced in commit 1cb108efb0,
although I think that subsequent changes may have exposed it more than it
was originally. Per bug #7580 from Maxim Boguk.
Apparently this was considered in the original code (see commit
cec3b0a9) but I failed to notice that such entries would always be
skipped by the database check at the start of the loop.
Per bugs #7578 by Nikolay, #6116 by tushar.qa@gmail.com.
You can now get the number of rows processed by a COPY statement in a
PL/pgSQL function with "GET DIAGNOSTICS x = ROW_COUNT".
Pavel Stehule, reviewed by Amit Kapila, with some editing by me.
On some platforms these functions return NULL, rather than the more common
practice of returning a pointer to a zero-sized block of memory. Hack our
various wrapper functions to hide the difference by substituting a size
request of 1. This is probably not so important for the callers, who
should never touch the block anyway if they asked for size 0 --- but it's
important for the wrapper functions themselves, which mistakenly treated
the NULL result as an out-of-memory failure. This broke at least pg_dump
for the case of no user-defined aggregates, as per report from
Matthew Carrington.
Back-patch to 9.2 to fix the pg_dump issue. Given the lack of previous
complaints, it seems likely that there is no live bug in previous releases,
even though some of these functions were in place before that.
Instead of having each object type implement the catalog munging
independently, centralize knowledge about how to do it and expand the
existing table in objectaddress.c with enough data about each object
type to support this operation.
Author: KaiGai Kohei
Tweaks by me
Reviewed by Robert Haas
We had a number of variants on the theme of "malloc or die", with the
majority named like "pg_malloc", but by no means all. Standardize on the
names pg_malloc, pg_malloc0, pg_realloc, pg_strdup. Get rid of pg_calloc
entirely in favor of using pg_malloc0.
This is an essentially cosmetic change, so no back-patch. (I did find
a couple of places where psql and pg_dump were using plain malloc or
strdup instead of the pg_ versions, but they don't look significant
enough to bother back-patching.)
timeval.t_sec is of type time_t, which is not always compatible with long.
I'm not sure if this was just harmless warning or a real bug, but this
fixes it, anyway.
This is just refactoring, to make the functions accessible outside xlog.c.
A followup patch will make use of that, to allow fetching timeline history
files over streaming replication.
The error messages they generate are not portable enough.
Also, since the only point of the alter_generic_1 expected file was to
cover platforms with no collation support, it's now useless, so remove
it.
On reflection (especially after noticing how many buildfarm critters have
__builtin_types_compatible_p but not _Static_assert), it seems like we
ought to try a bit harder to make these macros do something everywhere.
The initial cut at it would have been no help to code that is compiled only
on platforms without _Static_assert, for instance; and in any case not all
our contributors do their initial coding on the latest gcc version.
Some googling about static assertions turns up quite a bit of prior art
for making it work in compilers that lack _Static_assert. The method
that seems closest to our needs involves defining a struct with a bit-field
that has negative width if the assertion condition fails. There seems no
reliable way to get the error message string to be output, but throwing a
compile error with a confusing message is better than missing the problem
altogether.
In the same spirit, if we don't have __builtin_types_compatible_p we can at
least insist that the variable have the same width as the type. This won't
catch errors such as "wrong pointer type", but it's far better than
nothing.
In addition to changing the macro definitions, adjust a
compile-time-constant Assert in contrib/hstore to use StaticAssertStmt,
so we can get some buildfarm coverage on whether that macro behaves sanely
or not. There's surely more places that could be converted, but this is
the first one I came across.
Currently, the macros only work with fairly recent gcc versions, but there
is room to expand them to other compilers that have comparable features.
Heavily revised and autoconfiscated version of a patch by Andres Freund.
The tar output module did some very ugly and ultimately incorrect hacking
on COPY commands to try to get them to work in the context of restoring a
deconstructed tar archive. In particular, it would fail altogether for
table names containing any upper-case characters, since it smashed the
command string to lower-case before modifying it (and, just to add insult
to injury, did that in a way that would fail in multibyte encodings).
I don't see any particular value in being flexible about the case of the
command keywords, since the string will just have been created by
dumpTableData, so let's get rid of the whole case-folding thing.
Also, it doesn't seem to meet the POLA for the script to restore data only
in COPY mode, so add \i commands to make it have comparable behavior in
--inserts mode.
Noted while looking at the tar-output code in connection with Brian
Weaver's patch.
The original only expected file failed to consider machines without
non-default collation support. Per buildfarm.
Also, move the test to another parallel group; the one it was originally
put in is already full according to comments in the schedule file. Per
note from Tom Lane.
Both programs got the "magic" string wrong, causing standard-conforming tar
implementations to believe the output was just legacy tar format without
any POSIX extensions. This doesn't actually matter that much, especially
since pg_dump failed to fill the POSIX fields anyway, but still there is
little point in emitting tar format if we can't be compliant with the
standard. In addition, pg_dump failed to write the EOF marker correctly
(there should be 2 blocks of zeroes not just one), pg_basebackup put the
numeric group ID in the wrong place, and both programs had a pretty
brain-dead idea of how to compute the checksum. Fix all that and improve
the comments a bit.
pg_restore is modified to accept either the correct POSIX-compliant "magic"
string or the previous value. This part of the change will need to be
back-patched to avoid an unnecessary compatibility break when a previous
version tries to read tar-format output from 9.3 pg_dump.
Brian Weaver and Tom Lane
This fixes another error in commit 9e8da0f757.
I neglected to make the mark/restore functionality save and restore the
current set of array key values, which led to strange behavior if an
IndexScan with ScalarArrayOpExpr quals was used as the inner side of a
mergejoin. Per bug #7570 from Melese Tesfaye.
This worked fine for superusers, but not for ordinary users trying to
cancel their own processes. Tweak the order the checks are done in so
that we correctly return SIGNAL_BACKEND_ERROR (which current callers
know to ignore without erroring out) so that an ordinary user can loop
through a resultset without fearing that a process might exit in the
middle of said looping -- causing the remaining processes to go
unsignalled.
Incidentally, the last in-core caller of IsBackendPid() is now gone.
However, the function is exported and must remain in place, because
there are plenty of callers in external modules.
Author: Josh Kupershmidt
Reviewed by Noah Misch
This script is a bit slow, but still it only takes a fraction of the time
the bison run does, so the overhead doesn't seem intolerable. And we
definitely need some mechanical aid here, because people keep missing
the need to add new keywords to the appropriate keyword-list production.
While at it, I moved check_keywords.pl from src/tools into
src/backend/parser where it's actually used, and did some very minor
cleanup on the script.
There were assorted places where unreserved keywords were not treated the
same as T_WORD (that is, a random unrecognized identifier). Fix them.
It might not always be possible to allow this, but it is in all these
places, so I don't see any downside.
Per gripe from Jim Wilson. Arguably this is a bug fix, but given the lack
of other complaints and the ease of working around it (just quote the
word), I won't risk back-patching.
Once again, somebody who ought to know better forgot this. We really
need some automated cross-check on the keyword-list productions, I think.
Per report from Brian Weaver.
This allows easily splitting configuration into many files, deployed in a
directory.
Magnus Hagander, Greg Smith, Selena Deckelmann, reviewed by Noah Misch.
Produce a NOTICE when the label already exists, for consistency with other
CREATE IF NOT EXISTS commands. Also, fix the code so it produces something
more user-friendly than an index violation when the label already exists.
This not incidentally enables making a regression test that the previous
patch didn't make for fear of exposing an unpredictable OID in the results.
Also some wordsmithing on the documentation.
If the label is already in the enum the statement becomes a no-op.
This will reduce the pain that comes from our not allowing this
operation inside a transaction block.
Andrew Dunstan, reviewed by Tom Lane and Magnus Hagander.
The previous scheme had bugs in some corner cases involving tables that had
been renamed since a view was made. This could result in dumped views that
failed to reload or reloaded incorrectly, as seen in bug #7553 from Lloyd
Albin, as well as in some pgsql-hackers discussion back in January. Also,
its behavior for printing EXPLAIN plans was sometimes confusing because of
willingness to use the same alias for multiple RTEs (it was Ashutosh
Bapat's complaint about that aspect that started the January thread).
To fix, ensure that each RTE in the query has a unique unqualified alias,
by modifying the alias if necessary (we add "_" and digits as needed to
create a non-conflicting name). Then we can just print its variables with
that alias, avoiding the confusing and bug-prone scheme of sometimes
schema-qualifying variable names. In EXPLAIN, it proves to be expedient to
take the further step of only assigning such aliases to RTEs that are
actually referenced in the query, since the planner has a habit of
generating extra RTEs with the same alias in situations such as
inheritance-tree expansion.
Although this fixes a bug of very long standing, I'm hesitant to back-patch
such a noticeable behavioral change. My experiments while creating a
regression test convinced me that actually incorrect output (as opposed to
confusing output) occurs only in very narrow cases, which is backed up by
the lack of previous complaints from the field. So we may be better off
living with it in released branches; and in any case it'd be smart to let
this ripen awhile in HEAD before we consider back-patching it.
Similar changes were done to pg_hba.conf earlier already, this commit makes
pg_ident.conf to behave the same as pg_hba.conf.
This has two user-visible effects. First, if pg_ident.conf contains multiple
errors, the whole file is parsed at postmaster startup time and all the
errors are immediately reported. Before this patch, the file was parsed and
the errors were reported only when someone tries to connect using an
authentication method that uses the file, and the parsing stopped on first
error. Second, if you SIGHUP to reload the config files, and the new
pg_ident.conf file contains an error, the error is logged but the old file
stays in effect.
Also, regular expressions in pg_ident.conf are now compiled only once when
the file is loaded, rather than every time the a user is authenticated. That
should speed up authentication if you have a lot of regexps in the file.
Amit Kapila
These calls were removed in commit 4240e429d0
as part of a general refactoring and improvement of DDL locking. However,
there's a problem not solved by the rewrite, which is that GRANT/REVOKE
update pg_class.relacl without taking any particular lock on the target
table as such. If another backend fails to do AcceptInvalidationMessages,
it won't notice a recently-committed change in ACLs. Bug #7557 from Piotr
Czachur demonstrates that there's at least one code path in 9.2.0 in which
a command fails to do any AcceptInvalidationMessages calls at all, if the
current transaction already holds all the locks it will need.
Since we're hard up against the release deadline for 9.2.1, fix this by
putting back the AcceptInvalidationMessages calls in heap_openrv and
heap_openrv_extended, thereby restoring the historical behavior in this
area. We ought to look for a more elegant and perhaps more bulletproof
solution, but there's no time for that right now.
In commit 9e8da0f757, I improved btree
to handle ScalarArrayOpExpr quals natively, so that constructs like
"indexedcol IN (list)" could be supported by index-only scans. Using
such a qual results in multiple scans of the index, under-the-hood.
I went to some lengths to ensure that this still produces rows in index
order ... but I failed to recognize that if a higher-order index column
is lacking an equality constraint, rescans can produce out-of-order
data from that column. Tweak the planner to not expect sorted output
in that case. Per trouble report from Robert McGehee.
Currently, we are making mangled copies of plpython/{expected,sql} to
plpython/python3/{expected,sql}, and run the tests in
plpython/python3. This has the disadvantage that the regression.diffs
file, if any, ends up in plpython/python3, which is not the normal
location. If we instead make the mangled copies in
plpython/{expected,sql}/python3/, we can run the tests from the normal
directory, regression.diffs ends up the normal place, and the
pg_regress invocation also becomes a lot simpler. It's also more
obvious at run time what's going on, because the tests end up being
named "python3/something" in the test output.
Some experimentation with examples similar to bug #7539 has convinced me
that indxpath.c's original implementation of parameterized-path generation
was several bricks shy of a load. In general, if we are relying on a
particular outer rel or set of outer rels for a parameterized path, the
path should use every indexable join clause that's available from that rel
or rels. Any join clauses that get left out of the indexqual will end up
getting applied as plain filter quals (qpquals), and that's generally a
significant loser compared to having the index AM enforce them. (This is
particularly true with btree, which can skip the index scan entirely if
it can see that the given indexquals are mutually contradictory.) The
original heuristics failed to ensure this, though, and were overly
complicated anyway. Rewrite to make the code explicitly identify each
useful set of outer rels and then select all applicable join clauses for
each one. The one plan that changes in the regression tests is in fact
for the better according to the planner's cost estimates.
(Note: this is not a correctness issue but just a matter of plan quality.
I don't yet know what is going on in bug #7539, but I don't expect this
change to fix that.)
Recovery code documents clearly that a shutdown checkpoint is executed at
end of recovery - a shutdown checkpoint WAL record is written but the buffer
manager had been altered to treat end of recovery as a normal checkpoint.
This bug exacerbates the bufmgr relpersistence bug.
Bug spotted by Andres Freund, patch by me.
Looks like the correct size of DOS-ified tenk.data is 680800 not 680801.
(I got the latter from a version of unix2dos that appends a trailing ^Z,
which evidently is not git's practice.)
The idea here is to provide a more easily diagnosable failure diff when
the problem is that tenk.data has been DOS-ified, as I believe to be
happening currently on buildfarm member hamerkop. Per suggestion from
Magnus Hagander.
Also, sync output/largeobject_1.source with current regression test.
Failure to do that in commit 3a0e4d36eb
turns out to be the real reason that hamerkop has been complaining.
Given what we now know about the cause of this bug, it seems like it'd
be a real good idea to include it in the plperl regression tests, so as
to catch any platform-specific cases where the code gets misoptimized.
This at least saves some palloc overhead, and should furthermore reduce
the risk of anything going wrong, eg somebody resetting the context the
current_call_data record was in.
In commit 1bc16a9460 I added a minor
optimization to drop the component variables of a GROUP BY expression from
the target list computed at the aggregation level of a query, if those Vars
weren't referenced elsewhere in the tlist. However, I overlooked that the
window-function planning code would deconstruct such expressions and thus
need to have access to their component variables. Fix it to not do that.
While at it, I removed the distinction between volatile and nonvolatile
window partition/order expressions: the code now computes all of them
at the aggregation level. This saves a relatively expensive check for
volatility, and it's unclear that the resulting plan isn't better anyway.
Per bug #7535 from Louis-David Mitterrand. Back-patch to 9.2.
I made multiple errors in commit 97532f7c29,
stemming mostly from failure to think about the available frequency data
as being element frequencies not value frequencies (so that occurrences of
different elements are not mutually exclusive). This led to sillinesses
such as estimating that "word" would match more rows than "word:*".
The choice to clamp to a minimum estimate of DEFAULT_TS_MATCH_SEL also
seems pretty ill-considered in hindsight, as it would frequently result in
an estimate much larger than the available data suggests. We do need some
sort of clamp, since a pattern not matching any of the MCELEMs probably
still needs a selectivity estimate of more than zero. I chose instead to
clamp to at least what a non-MCELEM word would be estimated as, preserving
the property that "word:*" doesn't get an estimate less than plain "word",
whether or not the word appears in MCELEM.
Per investigation of a gripe from Bill Martin, though I suspect that his
example case actually isn't even reaching the erroneous code.
Back-patch to 9.1 where this code was introduced.
validate_plperl_function() supposed that it could free an old
plperl_proc_desc struct immediately upon detecting that it was stale.
However, if a plperl function is called recursively, this could result
in deleting the struct out from under an outer invocation, leading to
misbehavior or crashes. Add a simple reference-count mechanism to
ensure that such structs are freed only when the last reference goes
away.
Per investigation of bug #7516 from Marko Tiikkaja. I am not certain
that this error explains his report, because he says he didn't have
any recursive calls --- but it's hard to see how else it could have
crashed right there. In any case, this definitely fixes some problems
in the area.
Back-patch to all active branches.
Investigation shows that some intermittent build failures in ecpg are the
result of a gmake bug that was reported quite some time ago:
http://savannah.gnu.org/bugs/?30653
Preventing parallel builds of the ecpg subdirectories seems to dodge the
bug. Per yesterday's pgsql-hackers discussion, there are some other things
in the subdirectory makefiles that seem rather unsafe for parallel builds
too, but there's little point in fixing them as long as we have to work
around a make bug.
Back-patch to 9.1; parallel builds weren't very well supported before
that anyway.
Commit 2cfb1c6f77 fixed some issues caused
by Python 3.3 choosing to iterate through dict entries in a different order
than before. But here's another one: the test cases adjusted here made two
bad entries in a dict and expected the one complained of would always be
the same.
Possibly this should be back-patched further than 9.2, but there seems
little point unless the earlier fix is too.
Create an internal function pqDropConnection that does the physical socket
close and cleans up closely-associated state. This removes a bunch of ad
hoc, not always consistent closure code. The ulterior motive is to have a
single place to wait for a spawned child backend to exit, but this seems
like good cleanup even if that never happens.
I went back and forth on whether to include "conn->status = CONNECTION_BAD"
in pqDropConnection's actions, but for the moment decided not to. Only a
minority of the call sites actually want that, and in any case it's
arguable that conn->status is slightly higher-level state, and thus not
part of this function's purview.
This fix removes an unnecessary incompatibility with the old behavior of
the unix_socket_directory parameter. Since pathnames with embedded spaces
are fairly popular on some platforms, the incompatibility could be
significant in practice. We'll still strip unquoted leading/trailing
spaces, however.
No docs update since the documentation already implied that it worked
like this.
Per bug #7514 from Murray Cumming.
When the startup process restores a WAL file from the archive, it deletes
any old file with the same name and renames the new file in its place. On
Windows, however, when a file is deleted, it still lingers as long as a
process holds a file handle open on it. With cascading replication, a
walsender process can hold the old file open, so the rename() in the startup
process would fail. To fix that, rename the old file to a temporary name, to
make the original file name available for reuse, before deleting the old
file.
Give the correct name of the GUC parameter being complained of.
Also, emit a more suitable SQLSTATE (INVALID_PARAMETER_VALUE,
not the default INTERNAL_ERROR).
Gurjeet Singh, errcode adjustment by me
Perl, for some unaccountable reason, believes it's a good idea to reset
SIGFPE handling to SIG_IGN. Which wouldn't be a good idea even if it
worked; but on some platforms (Linux at least) it doesn't work at all,
instead resulting in forced process termination if the signal occurs.
Given the lack of other complaints, it seems safe to assume that Perl
never actually provokes SIGFPE and so there is no value in the setting
anyway. Hence, reset it to our normal handler after initializing Perl.
Report, analysis and patch by Andres Freund.
The planner previously assumed that parameter Vars having the same absolute
query level, varno, and varattno could safely be assigned the same runtime
PARAM_EXEC slot, even though they might be different Vars appearing in
different subqueries. This was (probably) safe before the introduction of
CTEs, but the lazy-evalution mechanism used for CTEs means that a CTE can
be executed during execution of some other subquery, causing the lifespan
of Params at the same syntactic nesting level as the CTE to overlap with
use of the same slots inside the CTE. In 9.1 we created additional hazards
by using the same parameter-assignment technology for nestloop inner scan
parameters, but it was broken before that, as illustrated by the added
regression test.
To fix, restructure the planner's management of PlannerParamItems so that
items having different semantic lifespans are kept rigorously separated.
This will probably result in complex queries using more runtime PARAM_EXEC
slots than before, but the slots are cheap enough that this hardly matters.
Also, stop generating PlannerParamItems containing Params for subquery
outputs: all we really need to do is reserve the PARAM_EXEC slot number,
and that now only takes incrementing a counter. The planning code is
simpler and probably faster than before, as well as being more correct.
Per report from Vik Reykja.
These changes will mostly also need to be made in the back branches, but
I'm going to hold off on that until after 9.2.0 wraps.
The cascading replication code assumed that the current RecoveryTargetTLI
never changes, but that's not true with recovery_target_timeline='latest'.
The obvious upshot of that is that RecoveryTargetTLI in shared memory needs
to be protected by a lock. A less obvious consequence is that when a
cascading standby is connected, and the standby switches to a new target
timeline after scanning the archive, it will continue to stream WAL to the
cascading standby, but from a wrong file, ie. the file of the previous
timeline. For example, if the standby is currently streaming from the middle
of file 000000010000000000000005, and the timeline changes, the standby
will continue to stream from that file. However, the WAL on the new
timeline is in file 000000020000000000000005, so the standby sends garbage
from 000000010000000000000005 to the cascading standby, instead of the
correct WAL from file 000000020000000000000005.
This also fixes a related bug where a partial WAL segment is restored from
the archive and streamed to a cascading standby. The code assumed that when
a WAL segment is copied from the archive, it can immediately be fully
streamed to a cascading standby. However, if the segment is only partially
filled, ie. has the right size, but only N first bytes contain valid WAL,
that's not safe. That can happen if a partial WAL segment is manually copied
to the archive, or if a partial WAL segment is archived because a server is
started up on a new timeline within that segment. The cascading standby will
get confused if the WAL it received is not valid, and will get stuck until
it's restarted. This patch fixes that problem by not allowing WAL restored
from the archive to be streamed to a cascading standby until it's been
replayed, and thus validated.
Serializable Snapshot Isolation used for serializable transactions
depends on acquiring SIRead locks on all heap relation tuples which
are used to generate the query result, so that a later delete or
update of any of the tuples can flag a read-write conflict between
transactions. This is normally handled in heapam.c, with tuple level
locking. Since an index-only scan avoids heap access in many cases,
building the result from the index tuple, the necessary predicate
locks were not being acquired for all tuples in an index-only scan.
To prevent problems with tuple IDs which are vacuumed and re-used
while the transaction still matters, the xmin of the tuple is part of
the tag for the tuple lock. Since xmin is not available to the
index-only scan for result rows generated from the index tuples, it
is not possible to acquire a tuple-level predicate lock in such
cases, in spite of having the tid. If we went to the heap to get the
xmin value, it would no longer be an index-only scan. Rather than
prohibit index-only scans under serializable transaction isolation,
we acquire an SIRead lock on the page containing the tuple, when it
was not necessary to visit the heap for other reasons.
Backpatch to 9.2.
Kevin Grittner and Tom Lane
Each setup block is run as a single PQexec submission, and some
statements such as VACUUM cannot be combined with others in such a
block.
Backpatch to 9.2.
Kevin Grittner and Tom Lane
The same message is used in both pg_restore and pg_dump, and it's
confusing to output "restoring data for table xyz" when the user
is just doing a pg_dump.
This gets rid of a dangerous-looking use of the not-volatile XLogCtl
pointer in a couple of spinlock-protected sections, where the normal
coding rule is that you should only access shared memory through a
pointer-to-volatile. I think the risk is only hypothetical not actual,
since for there to be a bug the compiler would have to move the spinlock
acquire or release across the memcpy() call, which one sincerely hopes
it will not. Still, it looks cleaner this way.
Per comment from Daniel Farina and subsequent discussion.
Formerly it would only show them for relkinds 'r' and 'f' (plain tables
and foreign tables). However, as of 9.2, views can also have reloptions,
namely security_barrier. The relkind restriction seems pointless and
not at all future-proof, so just print reloptions whenever there are any.
In passing, make some cosmetic improvements to the code that pulls the
"tableinfo" fields out of the PGresult.
Noted and patched by Dean Rasheed, with adjustment for all relkinds by me.
We can detect whether the planner top level is going to care at all about
cheap startup cost (it will only do so if query_planner's tuple_fraction
argument is greater than zero). If it isn't, we might as well discard
paths immediately whose only advantage over others is cheap startup cost.
This turns out to get rid of quite a lot of paths in complex queries ---
I saw planner runtime reduction of more than a third on one large query.
Since add_path isn't currently passed the PlannerInfo "root", the easiest
way to tell it whether to do this was to add a bool flag to RelOptInfo.
That's a bit redundant, since all relations in a given query level will
have the same setting. But in the future it's possible that we'd refine
the control decision to work on a per-relation basis, so this seems like
a good arrangement anyway.
Per my suggestion of a few months ago.
If a PlaceHolderVar contains a pulled-up LATERAL reference, its minimum
possible evaluation level might be higher in the join tree than its
original syntactic location. That in turn affects the ph_needed level for
any contained PlaceHolderVars (that is, those PHVs had better propagate up
the join tree at least to the evaluation level of the outer PHV). We got
this mostly right, but mark_placeholder_maybe_needed() failed to account
for the effect, and in consequence could leave the inner PHVs with
ph_may_need less than what their ultimate ph_needed value will be. That's
bad because it could lead to failure to select a join order that will allow
evaluation of the inner PHV at a valid location. Fix that, and add an
Assert that checks that we don't ever set ph_needed to more than
ph_may_need.
Only warn when connecting to a newer server, since connecting to older
servers works pretty well nowadays. Also update the documentation a
little about current psql/server compatibility expectations.
This was removed in commit cd00406774,
we're not quite sure why, but there have been reports of crashes due
to AS Perl being built with it when we are not, and it certainly
seems like the right thing to do. There is still some uncertainty
as to why it sometimes fails and sometimes doesn't.
Original patch from Owais Khani, substantially reworked and
extended by Andrew Dunstan.
The LATERAL implementation is now basically complete, and I still don't
see a cost-effective way to make an exact qual scope cross-check in the
presence of LATERAL. However, I did add a PlannerInfo.hasLateralRTEs flag
along the way, so it's easy to make the check only when not hasLateralRTEs.
That seems to still be useful, and it beats having no check at all.
We previously supposed that any given platform would supply both or neither
of these functions, so that one configure test would be sufficient. It now
appears that at least on AIX this is not the case ... which is likely an
AIX bug, but nonetheless we need to cope with it. So use separate tests.
Per bug #6758; thanks to Andrew Hastie for doing the followup testing
needed to confirm what was happening.
Backpatch to 9.1, where we began using these functions.