Print the command tag if we get PGRES_COMMAND_OK, and throw an error for
other cases. Per gripe from Michael Paquier.
In passing, add an fflush(), just to be real sure the output appears
before we sleep.
A view defined as "select <something> where false" had the curious property
that the system wouldn't check whether users had the privileges necessary
to select from it. More generally, permissions checks could be skipped
for tables referenced in sub-selects or views that were proven empty by
constraint exclusion (although some quick testing suggests this seldom
happens in cases of practical interest). This happened because the planner
failed to include rangetable entries for such tables in the finished plan.
This was noticed in connection with erroneous handling of materialized
views, but actually the issue is quite unrelated to matviews. Therefore,
revert commit 200ba1667b in favor of a more
direct test for the real problem.
Back-patch to 9.2 where the bug was introduced (by commit
7741dd6590).
This patch gets rid of the concept of, and infrastructure for,
non-canonical PathKeys; we now only ever create canonical pathkey lists.
The need for non-canonical pathkeys came from the desire to have
grouping_planner initialize query_pathkeys and related pathkey lists before
calling query_planner. However, since query_planner didn't actually *do*
anything with those lists before they'd been made canonical, we can get rid
of the whole mess by just not creating the lists at all until the point
where we formerly canonicalized them.
There are several ways in which we could implement that without making
query_planner itself deal with grouping/sorting features (which are
supposed to be the province of grouping_planner). I chose to add a
callback function to query_planner's API; other alternatives would have
required adding more fields to PlannerInfo, which while not bad in itself
would create an ABI break for planner-related plugins in the 9.2 release
series. This still breaks ABI for anything that calls query_planner
directly, but it seems somewhat unlikely that there are any such plugins.
I had originally conceived of this change as merely a step on the way to
fixing bug #8049 from Teun Hoogendoorn; but it turns out that this fixes
that bug all by itself, as per the added regression test. The reason is
that now get_eclass_for_sort_expr is adding the ORDER BY expression at the
end of EquivalenceClass creation not the start, and so anything that is in
a multi-member EquivalenceClass has already been created with correct
em_nullable_relids. I am suspicious that there are related scenarios in
which we still need to teach get_eclass_for_sort_expr to compute correct
nullable_relids, but am not eager to risk destabilizing either 9.2 or 9.3
to fix bugs that are only hypothetical. So for the moment, do this and
stop here.
Back-patch to 9.2 but not to earlier branches, since they don't exhibit
this bug for lack of join-clause-movement logic that depends on
em_nullable_relids being correct. (We might have to revisit that choice
if any related bugs turn up.) In 9.2, don't change the signature of
make_pathkeys_for_sortclauses nor remove canonicalize_pathkeys, so as
not to risk more plugin breakage than we have to.
Patch b19e4250b4 attempted to
preserve existing behavior regarding statistics generation in the
case that a truncation attempt was canceled due to lock conflicts.
It failed to do this accurately in two regards: (1) autovacuum had
previously generated statistics if the truncate attempt failed to
initially get the lock rather than having started the attempt, and
(2) the VACUUM ANALYZE command had always generated statistics.
Both of these changes were unintended, and are reverted by this
patch. On review, there seems to be consensus that the previous
failure to generate statistics when the truncate was terminated
was more an unfortunate consequence of how that effort was
previously terminated than a feature we want to keep; so this
patch generates statistics even when an autovacuum truncation
attempt terminates early. Another unintended change which is kept
on the basis that it is an improvement is that when a VACUUM
command is truncating, it will the new heuristic for avoiding
blocking other processes, rather than keeping an
AccessExclusiveLock on the table for however long the truncation
takes.
Per multiple reports, with some renaming per patch by Jeff Janes.
Backpatch to 9.0, where problem was created.
Previously, libpq and the backend had opposite ideas about whether
it was necessary for the client to send a CopyDone message after
receiving an ErrorResponse, making it impossible to cleanly exit
COPY BOTH mode. Fix libpq so that works correctly, adopting the
backend's notion that an ErrorResponse kills the copy in both
directions.
Adjust receivelog.c to avoid a degradation in the quality of the
resulting error messages. libpqwalreceiver.c is already doing
the right thing, so no adjustment needed there.
Add an explicit statement to the documentation explaining how
this part of the protocol is supposed to work, in the hopes of
avoiding future confusion in this area.
Since the consequences of all this confusion are very limited,
especially in the back-branches where no client ever attempts
to exit COPY BOTH mode without closing the connection entirely,
no back-patch.
Isolate checksum calculation to its own module, so that bufpage
knows little if anything about the details of the calculation.
This implementation is a modified FNV-1a hash checksum, details
of which are given in the new checksum.c header comments.
Basic implementation only, so we fix the output value.
Later related commits will add version numbers to pg_control,
compiler optimization flags and memory barriers.
Ants Aasma, reviewed by Jeff Davis and Simon Riggs
Choose a saner ordering of parameters (adding a new input param after
the output params seemed a bit random), update the function's header
comment to match reality (cmon folks, is this really that hard?),
get rid of useless and sloppily-defined distinction between
PROCESS_UTILITY_SUBCOMMAND and PROCESS_UTILITY_GENERATED.
We mustn't run any of the event-trigger support code when handling
utility statements like START TRANSACTION or ABORT, because that code
may need to refresh event-trigger cache data, which requires being
inside a valid transaction. (This mistake explains the consistent
build failures exhibited by the CLOBBER_CACHE_ALWAYS buildfarm members,
as well as some irreproducible failures on other members.)
The least messy fix seems to be to break standard_ProcessUtility into two
functions, one that handles all the statements not supported by event
triggers, and one that contains the event-trigger support code and handles
the statements that are supported by event triggers.
This change also fixes several inconsistencies, such as four cases where
support had been installed for "ddl_event_start" but not "ddl_event_end"
triggers, plus the fact that InvokeDDLCommandEventTriggersIfSupported()
paid no mind to isCompleteQuery.
Dimitri Fontaine and Tom Lane
Move checking for unscannable matviews into ExecOpenScanRelation, which is
a better place for it first because the open relation is already available
(saving a relcache lookup cycle), and second because this eliminates the
problem of telling the difference between rangetable entries that will or
will not be scanned by the query. In particular we can get rid of the
not-terribly-well-thought-out-or-implemented isResultRel field that the
initial matviews patch added to RangeTblEntry.
Also get rid of entirely unnecessary scannability check in the rewriter,
and a bogus decision about whether RefreshMatViewStmt requires a parse-time
snapshot.
catversion bump due to removal of a RangeTblEntry field, which changes
stored rules.
ORDER BY expressions were being treated the same as regular aggregate
arguments for purposes of collation determination, but really they should
not affect the aggregate's collation at all; only collations of the
aggregate's regular arguments should affect it.
In many cases this mistake would lead to incorrectly throwing a "collation
conflict" error; but in some cases the corrected code will silently assign
a different collation to the aggregate than before, for example
agg(foo ORDER BY bar COLLATE "x")
which will now use foo's collation rather than "x" for the aggregate.
Given this risk and the lack of field complaints about the issue, it
doesn't seem prudent to back-patch.
In passing, rearrange code in assign_collations_walker so that we don't
need multiple copies of the standard logic for computing collation of a
node with children. (Previously, CaseExpr duplicated the standard logic,
and we would have needed a third copy for Aggref without this change.)
Andrew Gierth and David Fetter
There's probably no real bug here at present, so not backpatching.
But it seems good to make these bits consistent with the rest of
libpq, so as to avoid future surprises.
Patch by me. Review by Tom Lane.
There was a high probability of two or more concurrent C.I.C. commands
deadlocking just before completion, because each would wait for the others
to release their reference snapshots. Fix by releasing the snapshot
before waiting for other snapshots to go away.
Per report from Paul Hinze. Back-patch to all active branches.
On non-Windows systems, sys/time.h was pulled in by portability/instr_time.h,
which pulled in time.h. We certainly should include time.h directly, since
we're using time(2), but the indirect include masked the problem on most
platforms.
Andres Freund
This was due to incomplete implementation of rowcount reporting
for RMV, which was due to initial waffling on whether it should
be provided. It seems unlikely to be a useful or universally
available number as more sophisticated techniques for maintaining
matviews are added, so remove the partial support rather than
completing it.
Per report of Jeevan Chalke, but with a different fix
When creating or manipulating a cached plan for a transaction control
command (particularly ROLLBACK), we must not perform any catalog accesses,
since we might be in an aborted transaction. However, plancache.c busily
saved or examined the search_path for every cached plan. If we were
unlucky enough to do this at a moment where the path's expansion into
schema OIDs wasn't already cached, we'd do some catalog accesses; and with
some more bad luck such as an ill-timed signal arrival, that could lead to
crashes or Assert failures, as exhibited in bug #8095 from Nachiket Vaidya.
Fortunately, there's no real need to consider the search path for such
commands, so we can just skip the relevant steps when the subject statement
is a TransactionStmt. This is somewhat related to bug #5269, though the
failure happens during initial cached-plan creation rather than
revalidation.
This bug has been there since the plan cache was invented, so back-patch
to all supported branches.
In most cases, these were just references to the SQL standard in
general. In a few cases, a contrast was made between SQL92 and later
standards -- those have been kept unchanged.
If an FDW fails to take special measures with a CurrentOfExpr, we will
end up trying to execute it as an ordinary qual, which was being treated
as a purely internal failure condition. Provide a more user-oriented
error message for such cases.
This saves some memory from each index relcache entry. At least on a 64-bit
machine, it saves just enough to shrink a typical relcache entry's memory
usage from 2k to 1k. That's nice if you have a lot of backends and a lot of
indexes.
This changes the behavior of the start and stop actions to exit
successfully if the server was already started or stopped.
This changes the default behavior of the start action: Before, if the
server was already running, it would print a message and succeed. Now,
that situation will result in an error. When running in idempotent
mode, no message is printed and pg_ctl exits successfully.
It was considered to just make the idempotent behavior the default and
only option, but pg_upgrade needs the old behavior.
The build of .pc (pkg-config) files depends on all makefiles in use, and
in dependency tracking mode, the previous coding ended up including
/dev/null as a makefile. Apparently, on some platforms the modification
time of /dev/null changes sporadically, and so the .pc files would end
up being rebuilt every so often. Fix that by changing the makefile code
to do without using /dev/null.
Revert the matview-related changes in explain.c's API, as per recent
complaint from Robert Haas. The reason for these appears to have been
principally some ill-considered choices around having intorel_startup do
what ought to be parse-time checking, plus a poor arrangement for passing
it the view parsetree it needs to store into pg_rewrite when creating a
materialized view. Do the latter by having parse analysis stick a copy
into the IntoClause, instead of doing it at runtime. (On the whole,
I seriously question the choice to represent CREATE MATERIALIZED VIEW as a
variant of SELECT INTO/CREATE TABLE AS, because that means injecting even
more complexity into what was already a horrid legacy kluge. However,
I didn't go so far as to rethink that choice ... yet.)
I also moved several error checks into matview parse analysis, and
made the check for external Params in a matview more accurate.
In passing, clean things up a bit more around interpretOidsOption(),
and fix things so that we can use that to force no-oids for views,
sequences, etc, thereby eliminating the need to cons up "oids = false"
options when creating them.
catversion bump due to change in IntoClause. (I wonder though if we
really need readfuncs/outfuncs support for IntoClause anymore.)
Latch activity was not being detected by non-database-connected workers; the
SIGUSR1 signal handler which is normally in charge of that was set to SIG_IGN.
Create a simple handler to call latch_sigusr1_handler instead.
Robert Haas (bug report and suggested fix)
Add a SignalUnconnectedWorkers() call so that non-database-connected background
workers are also notified when postmaster is SIGHUPped. Previously, only
database-connected workers were.
Michael Paquier (bug report and fix)
The intent was that being populated would, long term, be just one
of the conditions which could affect whether a matview was
scannable; being populated should be necessary but not always
sufficient to scan the relation. Since only CREATE and REFRESH
currently determine the scannability, names and comments
accidentally conflated these concepts, leading to confusion.
Also add missing locking for the SQL function which allows a
test for scannability, and fix a modularity violatiion.
Per complaints from Tom Lane, although its not clear that these
will satisfy his concerns. Hopefully this will at least better
frame the discussion.
The materialized views patch adjusted ExplainOneQuery to take an
additional DestReceiver argument, but failed to add a matching
argument to the definition of ExplainOneQuery_hook. This is a
problem for users of the hook that want to call ExplainOnePlan.
Fix by adding the missing argument.
This works by extracting trigrams from the given regular expression,
in generally the same spirit as the previously-existing support for
LIKE searches, though of course the details are far more complicated.
Currently, only GIN indexes are supported. We might be able to make
it work with GiST indexes later.
The implementation includes adding API functions to backend/regex/
to provide a view of the search NFA created from a regular expression.
These functions are meant to be generic enough to be supportable in
a standalone version of the regex library, should that ever happen.
Alexander Korotkov, reviewed by Heikki Linnakangas and Tom Lane
Heikki reported comment was wrong, so fixed
code to match the comment: we only need to
take additional locking precautions when we
have a shared lock on the buffer.
We copy the buffer before inserting an XLOG_HINT to avoid WAL CRC errors
caused by concurrent hint writes to buffer while share locked. To make this work
we refactor RestoreBackupBlock() to allow an XLOG_HINT to avoid the normal
path for backup blocks, which assumes the underlying buffer is exclusive locked.
Resulting code completely changes layout of XLOG_HINT WAL records, but
this isn't even beta code, so this is a low impact change.
In passing, avoid taking WALInsertLock for full page writes on checksummed
hints, remove related cruft from XLogInsert() and improve xlog_desc record for
XLOG_HINT.
Andres Freund
Bug report by Fujii Masao, testing by Jeff Janes and Jaime Casanova,
review by Jeff Davis and Simon Riggs. Applied with changes from review
and some comment editing.
In CLUSTER, VACUUM FULL and ALTER TABLE SET TABLESPACE
I erroneously set checksum before log_newpage, which
sets the LSN and invalidates the checksum. So set
checksum immediately *after* log_newpage.
Bug report Fujii Masao, Fix and patch by Jeff Davis
Counting newlines shows that quite a few recent patches have neglected
to update the output-lines count given to PageOutput(). Fortunately
it's not terribly critical that this be exact, since we long since
exceeded the height of most people's terminal windows. Still, maybe
we ought to think of a way to not have to maintain this manually anymore.
The old formula didn't take into account that each WAL sender process needs
a spinlock. We had also already exceeded the fixed number of spinlocks
reserved for misc purposes (10). Bump that to 30.
Backpatch to 9.0, where WAL senders were introduced. If I counted correctly,
9.0 had exactly 10 predefined spinlocks, and 9.1 exceeded that, but bump the
limit in 9.0 too because 10 is uncomfortably close to the edge.
The point of turning off track_activities is to avoid this reporting
overhead, but a thinko in commit 4f42b546fd
caused pgstat_report_activity() to perform half of its updates anyway.
Fix that, and also make sure that we clear all the now-disabled fields
when transitioning to the non-reporting state.
Notice and complain about PQcancel() failures. Also, don't dump core if
an error PGresult doesn't contain severity and message subfields, as it
might not if it was generated by libpq itself. (We have a longstanding
TODO item to improve that, but in the meantime isolationtester had better
cope.)
I tripped across the latter item while investigating a trouble report on
buildfarm member spoonbill. As for the former, there's no evidence that
PQcancel failure is actually involved in spoonbill's problem, but it still
seems like a bad idea to ignore an error return code.
An oversight in commit e710b65c1c allowed
database names beginning with "-" to be treated as though they were secure
command-line switches; and this switch processing occurs before client
authentication, so that even an unprivileged remote attacker could exploit
the bug, needing only connectivity to the postmaster's port. Assorted
exploits for this are possible, some requiring a valid database login,
some not. The worst known problem is that the "-r" switch can be invoked
to redirect the process's stderr output, so that subsequent error messages
will be appended to any file the server can write. This can for example be
used to corrupt the server's configuration files, so that it will fail when
next restarted. Complete destruction of database tables is also possible.
Fix by keeping the database name extracted from a startup packet fully
separate from command-line switches, as had already been done with the
user name field.
The Postgres project thanks Mitsumasa Kondo for discovering this bug,
Kyotaro Horiguchi for drafting the fix, and Noah Misch for recognizing
the full extent of the danger.
Security: CVE-2013-1899
The pg_start_backup() and pg_stop_backup() functions checked the privileges
of the initially-authenticated user rather than the current user, which is
wrong. For example, a user-defined index function could successfully call
these functions when executed by ANALYZE within autovacuum. This could
allow an attacker with valid but low-privilege database access to interfere
with creation of routine backups. Reported and fixed by Noah Misch.
Security: CVE-2013-1901
This reverts commit 3780fc679c.
HP-UX didn't like it. There would probably be a way to fix that, but
since the net effect of all of this is zero because ecpg ends up using
libpq anyway, it's not worth bothering further.
In commit 0f61d4dd1b, I added code to copy up
column width estimates for each column of a subquery. That code supposed
that the subquery couldn't have any output columns that didn't correspond
to known columns of the current query level --- which is true when a query
is parsed from scratch, but the assumption fails when planning a view that
depends on another view that's been redefined (adding output columns) since
the upper view was made. This results in an assertion failure or even a
crash, as per bug #8025 from lindebg. Remove the Assert and instead skip
the column if its resno is out of the expected range.
This will hopefully be easier to use than pg_config for users who are
already used to the pkg-config interface. It also works better for
multi-arch installations.
reviewed by Tom Lane
It doesn't actually use libpq. But we need to keep libpq in the
CPPFLAGS for building, because compatlib uses ecpglib.h which uses
libpq-fe.h, but we don't need to refer to libpq for linking.
reviewed by Tom Lane
The modern incarnation of md.c is by no means specific to magnetic disk
technology, but every so often we hear from someone who's misled by the
label. Try to clarify that it will work for anything that supports
standard filesystem operations. Per suggestion from Andrew Dunstan.
In some parallel make situations, the install-headers target could be
called before the installation directories are created by installdirs,
causing the installation to fail. Fix that by making install-headers
depend on installdirs.
The JSON parser is converted into a recursive descent parser, and
exposed for use by other modules such as extensions. The API provides
hooks for all the significant parser event such as the beginning and end
of objects and arrays, and providing functions to handle these hooks
allows for fairly simple construction of a wide variety of JSON
processing functions. A set of new basic processing functions and
operators is also added, which use this API, including operations to
extract array elements, object fields, get the length of arrays and the
set of keys of a field, deconstruct an object into a set of key/value
pairs, and create records from JSON objects and arrays of objects.
Catalog version bumped.
Andrew Dunstan, with some documentation assistance from Merlin Moncure.
9.2 uses a kluge representation of "indislive"; we have to account for
that when examining pg_index. Simplest solution is to check indisready
for 9.0 and 9.1 as well; that's harmless though unnecessary, so it's
not worth making a version distinction for.
Fixes oversight in commit 683abc73df,
as noted by Andres Freund.
On older-model gcc, the original coding of UTILITY_BEGIN_QUERY() can
draw this error because of multiple assignments to _needCleanup.
Rather than mark that variable volatile, we can suppress the warning
by arranging to have just one unconditional assignment before PG_TRY.
This event takes place just before ddl_command_end, and is fired if and
only if at least one object has been dropped by the command. (For
instance, DROP TABLE IF EXISTS of a table that does not in fact exist
will not lead to such a trigger firing). Commands that drop multiple
objects (such as DROP SCHEMA or DROP OWNED BY) will cause a single event
to fire. Some firings might be surprising, such as
ALTER TABLE DROP COLUMN.
The trigger is fired after the drop has taken place, because that has
been deemed the safest design, to avoid exposing possibly-inconsistent
internal state (system catalogs as well as current transaction) to the
user function code. This means that careful tracking of object
identification is required during the object removal phase.
Like other currently existing events, there is support for tag
filtering.
To support the new event, add a new pg_event_trigger_dropped_objects()
set-returning function, which returns a set of rows comprising the
objects affected by the command. This is to be used within the user
function code, and is mostly modelled after the recently introduced
pg_identify_object() function.
Catalog version bumped due to the new function.
Dimitri Fontaine and Álvaro Herrera
Review by Robert Haas, Tom Lane
Previously, if the postmaster initialized OpenSSL's PRNG (which it will do
when ssl=on in postgresql.conf), the same pseudo-random state would be
inherited by each forked child process. The problem is masked to a
considerable extent if the incoming connection uses SSL encryption, but
when it does not, identical pseudo-random state is made available to
functions like contrib/pgcrypto. The process's PID does get mixed into any
requested random output, but on most systems that still only results in 32K
or so distinct random sequences available across all Postgres sessions.
This might allow an attacker who has database access to guess the results
of "secure" operations happening in another session.
To fix, forcibly reset the PRNG after fork(). Each child process that has
need for random numbers from OpenSSL's generator will thereby be forced to
go through OpenSSL's normal initialization sequence, which should provide
much greater variability of the sequences. There are other ways we might
do this that would be slightly cheaper, but this approach seems the most
future-proof against SSL-related code changes.
This has been assigned CVE-2013-1900, but since the issue and the patch
have already been publicized on pgsql-hackers, there's no point in trying
to hide this commit.
Back-patch to all supported branches.
Marko Kreen
In a heap update, if the old and new tuple were on different pages, and the
new page no longer existed (because it was subsequently truncated away by
vacuum), heap_xlog_update forgot to release the pin on the old buffer. This
bug was introduced by the "Fix multiple problems in WAL replay" patch,
commit 3bbf668de9 (on master branch).
With full_page_writes=off, this triggered an "incorrect local pin count"
error later in replay, if the old page was vacuumed.
This fixes bug #7969, reported by Yunong Xiao. Backpatch to 9.0, like the
commit that introduced this bug.
Move functions used only by pg_dump and pg_restore from dumputils.c to a new
file, pg_backup_utils.c. dumputils.c is linked into psql and some programs
in bin/scripts, so it seems good to keep it slim. The parallel functionality
is moved to parallel.c, as is exit_horribly, because the interesting code in
exit_horribly is parallel-related.
This refactoring gets rid of the on_exit_msg_func function pointer. It was
problematic, because a modern gcc version with -Wmissing-format-attribute
complained if it wasn't marked with PF_PRINTF_ATTRIBUTE, but the ancient gcc
version that Tom Lane's old HP-UX box has didn't accept that attribute on a
function pointer, and gave an error. We still use a similar function pointer
trick for getLocalPQBuffer() function, to use a thread-local version of that
in parallel mode on Windows, but that dodges the problem because it doesn't
take printf-like arguments.
Dumping invalid indexes can cause problems at restore time, for example
if the reason the index creation failed was because it tried to enforce
a uniqueness condition not satisfied by the table's data. Also, if the
index creation is in fact still in progress, it seems reasonable to
consider it to be an uncommitted DDL change, which pg_dump wouldn't be
expected to dump anyway.
Back-patch to all active versions, and teach them to ignore invalid
indexes in servers back to 8.2, where the concept was introduced.
Michael Paquier
For getting the server's version in numeric form, use PQserverVersion().
It does the exact same parsing as dumputils.c's parse_version(), and has
been around in libpq for a long time. For the client's version, just use
the PG_VERSION_NUM constant.
If you have clusters of different versions pointing to the same tablespace
location, we would incorrectly include all the data belonging to the other
versions, too.
Fixes bug #7986, reported by Sergey Burladyan.
A new 'starttli' field was added to the response of BASE_BACKUP command.
Make pg_basebackup tolerate the case that it's missing, so that it still
works with older servers.
Add an explicit check for the server version, so that you get a nicer error
message if you try to use it with a pre-9.1 server.
The streaming protocol message format changed in 9.3, so -X stream still won't
work with pre-9.3 servers. I added a version check to ReceiveXLogStream()
earlier, but write that slightly differently, so that in 9.4, it will still
work with a 9.3 server. (In 9.4, the error message needs to be adjusted to
"9.3 or above", though). Also, if the version check fails, don't retry.
New infrastructure is added which creates a set number of workers
(threads on Windows, forked processes on Unix). Jobs are then
handed out to these workers by the master process as needed.
pg_restore is adjusted to use this new infrastructure in place of the
old setup which created a new worker for each step on the fly. Parallel
dumps acquire a snapshot clone in order to stay consistent, if
available.
The parallel option is selected by the -j / --jobs command line
parameter of pg_dump.
Joachim Wieland, lightly editorialized by Andrew Dunstan.
Most (all?) of Russia has moved to what's effectively year-round daylight
savings time, so that the "standard" zone names now mean an hour later
than they used to. Update that, notably changing MSK as per recent
complaint from Sergey Konoplev, but also CHOT, GET, IRKT, KGT, KRAT,
MAGT, NOVT, OMST, VLAT, YAKT, YEKT. The corresponding DST abbreviations
are presumably now obsolete, but I left them in place with their old
definitions, just to reduce any possible breakage from this change.
Also add VOLT (Europe/Volgograd), which for some reason we never had
before, as well as MIST (Antarctica/Macquarie), and fix obsolete
definitions of MAWT, TKT, and WST.
Add an option to zic.c to dump out all non-obsolete timezone abbreviations
defined in the Olson database. Comparing this list to its previous state
will clue us in when something happens that we may need to account for in
the tznames/ time zone abbreviation lists. The README file's previous
exhortation to "just grep for differences" was completely useless advice,
in my now-considerable experience; but maybe this will be a bit more
useful. As a starting point I built the same list from the tzdata files
as they existed in 2006, which is committed here as known_abbrevs.txt.
Comparison indeed turned up quite a few changes we had neglected to account
for, which I will commit separately.
This appears to cause some intermittent file system problems
on Windows 8. Instead, set up the old data directory in its
intended final location to start with.
Checksums are set immediately prior to flush out of shared buffers
and checked when pages are read in again. Hint bit setting will
require full page write when block is dirtied, which causes various
infrastructure changes. Extensive comments, docs and README.
WARNING message thrown if checksum fails on non-all zeroes page;
ERROR thrown but can be disabled with ignore_checksum_failure = on.
Feature enabled by an initdb option, since transition from option off
to option on is long and complex and has not yet been implemented.
Default is not to use checksums.
Checksum used is WAL CRC-32 truncated to 16-bits.
Simon Riggs, Jeff Davis, Greg Smith
Wide input and assistance from many community members. Thank you.
Prior to 9.3 the commit_delay affected only the current user,
whereas now only the group leader waits while holding the
WALWriteLock. Deliberate or accidental settings to a poor
value could seriously degrade performance for all users.
Privileges may be delegated by SECURITY DEFINER functions
for anyone that needs per-user settings in real situations.
Request for change from Peter Geoghegan
I wasn't going to ship this without having at least some example of how
to do that. This version isn't terribly bright; in particular it won't
consider any combinations of multiple join clauses. Given the cost of
executing a remote EXPLAIN, I'm not sure we want to be very aggressive
about doing that, anyway.
In support of this, refactor generate_implied_equalities_for_indexcol
so that it can be used to extract equivalence clauses that aren't
necessarily tied to an index.
The statistics-based cost estimation patch for range types broke that, by
incorrectly assuming that the left operand of all range oeprators is a
range. That lead to a "type x is not a range type" error. Because it took so
long for anyone to notice, add a regression test for that case.
We still don't do proper statistics-based cost estimation for that, so you
just get a default constant estimate. We should look into implementing that,
but this patch at least fixes the regression.
Spotted by Tom Lane, when testing query from Josh Berkus.
Introduce pg_identify_object(oid,oid,int4), which is similar in spirit
to pg_describe_object but instead produces a row of machine-readable
information to uniquely identify the given object, without resorting to
OIDs or other internal representation. This is intended to be used in
the event trigger implementation, to report objects being operated on;
but it has usefulness of its own.
Catalog version bumped because of the new function.
The buildfarm members using -DCLOBBER_CACHE_ALWAYS still don't like this
test. Some experimentation shows that on my machine, isolationtester's
query to check for "waiting" state takes 2 to 2.5 seconds to bind+execute
under -DCLOBBER_CACHE_ALWAYS. Set the timeouts to 5 seconds to leave some
headroom for possibly-slower buildfarm critters.
Really we ought to fix the "waiting" query, which is not only horridly
slow but outright wrong in detail; and then maybe we can back off these
timeouts. But right now I'm just trying to get the buildfarm green again.
Per report from Hadi Moshayedi of matview regression test failure
with optimization of aggregates. A few ORDER BY clauses improve
code coverage for matviews while solving that problem.
Remove use of PageSetTLI() from all page manipulation functions
and adjust README to indicate change in the way we make changes
to pages. Repurpose those bytes into the pd_checksum field and
explain how that works in comments about page header.
Refactoring ahead of actual feature patch which would make use
of the checksum field, arriving later.
Jeff Davis, with comments and doc changes by Simon Riggs
Direction suggested by Robert Haas; many others providing
review comments.
Buildfarm member friarbird doesn't like this test as-committed, evidently
because it's so slow that the test framework doesn't reliably notice that
the backend is waiting before the timeout goes off. (This is not totally
surprising, since friarbird builds with -DCLOBBER_CACHE_ALWAYS.) Increase
the timeout delay from 1 second to 2 in hopes of resolving that problem.
Rather than doing a fairly-expensive setitimer() call to prevent interrupts
from happening, let's just invent a simple boolean flag that the signal
handler is required to check. This is not only faster but considerably
more robust than before, since the previous code effectively assumed that
only ITIMER_REAL events would ever fire the SIGALRM handler, which is
obviously something that can be broken easily by third-party code.
Zoltán Böszörményi and Tom Lane
We need this in non-ENABLE_THREAD_SAFETY builds, and also to satisfy
the exports.txt entry; while it might be a good idea to remove the
latter, I'm hesitant to do so except in the context of an intentional
ABI break. At least we don't have a separately maintained source file
for it anymore.
I had thought we weren't using this version of pqsignal() at all on
Windows, but that's wrong --- initdb is using it (and coping with the
POSIX-ish semantics of bare signal() :-(). So allow the file to be
built in WIN32+FRONTEND case, and add it to the MSVC build logic.
We had two copies of this function in the backend and libpq, which was
already pretty bogus, but it turns out that we need it in some other
programs that don't use libpq (such as pg_test_fsync). So put it where
it probably should have been all along. The signal-mask-initialization
support in src/backend/libpq/pqsignal.c stays where it is, though, since
we only need that in the backend.
This GUC allows limiting the time spent waiting to acquire any one
heavyweight lock.
In support of this, improve the recently-added timeout infrastructure
to permit efficiently enabling or disabling multiple timeouts at once.
That reduces the performance hit from turning on lock_timeout, though
it's still not zero.
Zoltán Böszörményi, reviewed by Tom Lane,
Stephen Frost, and Hari Babu
Formerly we just Assert'ed that each refcount was zero, which was quick
and easy but failed to provide a good overview of what was wrong.
Change the code so that we'll call PrintBufferLeakWarning() for each
buffer with a nonzero refcount, and then Assert at the end of the loop.
This costs nothing in runtime and might ease diagnosis of some bugs.
Greg Smith, reviewed by Satoshi Nagayasu, further tweaked by me
Due to a misreading of the function's comment block, there was an
unneeded change to a call in rewriteDefine.c. There is, in fact
no reason to pass false for a MV; it should be true just like a
view.
Fixes issue pointed out by Tom Lane
This would have caught a bug in the initial patch, and seems like
a good thing to test going forward.
Per bug report by Erik Rijkers and fix by Tom Lane
Even though this patch had no user-visible difference, better keep the code
in psqlscan.l sync with the backend lexer. And of course it's nice to shrink
the psql binary, too. Ecpg's version of the lexer doesn't have the error
rule, it doesn't try to avoid backing up, so it doesn't need to be modified.
As reminded by Tom Lane
The planner sometimes inserts Result nodes to perform column projections
(ie, arbitrary scalar calculations) above plan nodes that lack projection
logic of their own. However, we did that even if the lower plan node was
in fact producing the required column set already; which is a pretty common
case given the popularity of "SELECT * FROM ...". Measurements show that
the useless plan node adds non-negligible overhead, especially when there
are many columns in the result. So add a check to avoid inserting a Result
node unless there's something useful for it to do.
There are a couple of remaining places where unnecessary Result nodes
could get inserted, but they are (a) much less performance-critical,
and (b) coded in such a way that it's hard to avoid inserting a Result,
because the desired tlist is changed on-the-fly in subsequent logic.
We'll leave those alone for now.
Kyotaro Horiguchi; reviewed and further hacked on by Amit Kapila and
Tom Lane.
The error rule used to avoid backtracking with the U&'...' UESCAPE 'x'
syntax bloated the flex tables, so refactor that. This patch makes the error
rule shorter, by introducing a new exclusive flex state that's entered after
parsing U&'...'. This shrinks the postgres binary by about 220kB.
The estimates are based on the existing lower bound histogram, and a new
histogram of range lengths.
Bump catversion, because the range length histogram now needs to be present
in statistic slot kind 6, or you get an error on @> and <@ queries. (A
re-ANALYZE would be enough to fix that, though)
Alexander Korotkov, with some refactoring by me.
There's still some discussion about exactly how postgres_fdw ought to
handle this case, but there seems no debate that we want to allow defaults
to be used for inserts into foreign tables. So remove the core-code
restrictions that prevented it.
While at it, get rid of the special grammar productions for CREATE FOREIGN
TABLE, and instead add explicit FEATURE_NOT_SUPPORTED error checks for the
disallowed cases. This makes the grammar a shade smaller, and more
importantly results in much more intelligible error messages for
unsupported cases. It's also one less thing to fix if we ever start
supporting constraints on foreign tables.
"break" instead of "continue" suppressed view expansion for views appearing
later in the range table. Per report from Erikjan Rijkers.
While at it, improve the associated comment a bit.
This adds the following:
json_agg(anyrecord) -> json
to_json(any) -> json
hstore_to_json(hstore) -> json (also used as a cast)
hstore_to_json_loose(hstore) -> json
The last provides heuristic treatment of numbers and booleans.
Also, in json generation, if any non-builtin type has a cast to json,
that function is used instead of the type's output function.
Andrew Dunstan, reviewed by Steve Singer.
Catalog version bumped.
This patch adds the core-system infrastructure needed to support updates
on foreign tables, and extends contrib/postgres_fdw to allow updates
against remote Postgres servers. There's still a great deal of room for
improvement in optimization of remote updates, but at least there's basic
functionality there now.
KaiGai Kohei, reviewed by Alexander Korotkov and Laurenz Albe, and rather
heavily revised by Tom Lane.
Instead of just reporting which user failed to log in, log both the
line number in the active pg_hba.conf file (which may not match reality
in case the file has been edited and not reloaded) and the contents of
the matching line (which will always be correct), to make it easier
to debug incorrect pg_hba.conf files.
The message to the client remains unchanged and does not include this
information, to prevent leaking security sensitive information.
Reviewed by Tom Lane and Dean Rasheed
The libpgcommon patch made that unnecessary, palloc and friends are now
available in frontend programs too, mapped to plain old malloc.
As pointed out by Alvaro Herrera.
The previous coding of this function could get into situations where it
would never terminate, because successive passes would re-add EMPTY arcs
that had been removed by the previous pass. Rewrite the function
completely using a new algorithm that is guaranteed to terminate, and
also seems to be usually faster than the old one. Per Tcl bugs 3604074
and 3606683.
Tom Lane and Don Porter
If we were about to enter archive recovery after crash recovery, we scanned
the archive for the latest tli history file, and set the recovery target
timeline to that. However, when we actually tried to read the history file,
we would not fetch the file from the archive, because we were not in archive
recovery yet.
To fix, make readTimeLineHistory and existsTimeLineHistory to always fetch
the file from archive if archive recovery is requested, even if we're not in
archive recovery yet.
Backpatch to 9.2. Mitsumasa KONDO
This saves several catalog lookups per reference. It's not all that
exciting right now, because we'd managed to minimize the number of places
that need to fetch the data; but the upcoming writable-foreign-tables patch
needs this info in a lot more places.
This page with no tuples is used to distinguish an MV containing a
zero-row resultset of its backing query from an MV which has not
been populated by its backing query. Unless WAL-logged, recovery
and hot standby don't work correctly with what should be an empty
but scannable materialized view.
Fixes bugs reported by Fujii Masao in testing MVs on hot standby.
This confused Cygwin's make because of the colon in the path. The
DLL isn't likely to change under us so preserving the dependency
doesn't gain us much, and it's useful to be able to do a native
Windows build with the Cygwin mingw toolset.
Noah Misch.
formatting.c used locale-dependent case folding rules in some code paths
where the result isn't supposed to be locale-dependent, for example
to_char(timestamp, 'DAY'). Since the source data is always just ASCII
in these cases, that usually didn't matter ... but it does matter in
Turkish locales, which have unusual treatment of "i" and "I". To confuse
matters even more, the misbehavior was only visible in UTF8 encoding,
because in single-byte encodings we used pg_toupper/pg_tolower which
don't have locale-specific behavior for ASCII characters. Fix by providing
intentionally ASCII-only case-folding functions and using these where
appropriate. Per bug #7913 from Adnan Dursun. Back-patch to all active
branches, since it's been like this for a long time.
I fixed this code back in commit 841b4a2d5, but didn't think carefully
enough about the behavior near zero, which meant it improperly rejected
1999-12-31 24:00:00. Per report from Magnus Hagander.
This was already the case for domains over arrays, but not for domains
over certain built-in types such as boolean. The special formatting
rules for those types should apply to domains over them as well.
Per discussion.
While this is a bug fix, it's also a behavioral change that seems likely
to trip up some applications. So no back-patch.
Pavel Stehule
A materialized view has a rule just like a view and a heap and
other physical properties like a table. The rule is only used to
populate the table, references in queries refer to the
materialized data.
This is a minimal implementation, but should still be useful in
many cases. Currently data is only populated "on demand" by the
CREATE MATERIALIZED VIEW and REFRESH MATERIALIZED VIEW statements.
It is expected that future releases will add incremental updates
with various timings, and that a more refined concept of defining
what is "fresh" data will be developed. At some point it may even
be possible to have queries use a materialized in place of
references to underlying tables, but that requires the other
above-mentioned features to be working first.
Much of the documentation work by Robert Haas.
Review by Noah Misch, Thom Brown, Robert Haas, Marko Tiikkaja
Security review by KaiGai Kohei, with a decision on how best to
implement sepgsql still pending.
Also make sure other fields of the view's pg_class entry are appropriate
for a view; it shouldn't have relfrozenxid set for instance.
This ancient omission isn't believed to have any serious consequences in
versions 8.4-9.2, so no backpatch. But let's fix it before it does bite
us in some serious way. It's just luck that the case doesn't cause
problems for autovacuum. (It did cause problems in 8.3, but that's out
of support.)
Andres Freund
fmgr_sql had been designed on the assumption that the FmgrInfo it's called
with has only query lifespan. This is demonstrably unsafe in connection
with range types, as shown in bug #7881 from Andrew Gierth. Fix things
so that we re-generate the function's cache data if the (sub)transaction
it was made in is no longer active.
Back-patch to 9.2. This might be needed further back, but it's not clear
whether the case can realistically arise without range types, so for now
I'll desist from back-patching further.
Careless use of TopMemoryContext for I/O function data meant that repeated
use of spi_prepare and spi_freeplan would leak memory at the session level,
as per report from Christian Schröder. In addition, spi_prepare
leaked a lot of transient data within the current plperl function's SPI
Proc context, which would be a problem for repeated use of spi_prepare
within a single plperl function call; and it wasn't terribly careful
about releasing permanent allocations in event of an error, either.
In passing, clean up some copy-and-pasteos in query-lookup error messages.
Alex Hunsaker and Tom Lane
In copy-out mode, the frontend should not send any messages until the
backend has finished streaming, by sending a CopyDone message. I'm not sure
if it would be legal for the client to send a new query before receiving the
CopyDone message from the backend, but trying to support that would require
bigger changes to the backend code structure.
Fixes an assertion failure reported by Fujii Masao.
This includes backend "COPY TO/FROM PROGRAM '...'" syntax, and corresponding
psql \copy syntax. Like with reading/writing files, the backend version is
superuser-only, and in the psql version, the program is run in the client.
In the passing, the psql \copy STDIN/STDOUT syntax is subtly changed: if you
the stdin/stdout is quoted, it's now interpreted as a filename. For example,
"\copy foo from 'stdin'" now reads from a file called 'stdin', not from
standard input. Before this, there was no way to specify a filename called
stdin, stdout, pstdin or pstdout.
This creates a new function in pgport, wait_result_to_str(), which can
be used to convert the exit status of a process, as returned by wait(3),
to a human-readable string.
Etsuro Fujita, reviewed by Amit Kapila.
parseqatom() failed to check for an error return (NULL result) from its
recursive call to parsebranch(), and in consequence could crash with a
null-pointer dereference after an error return. This bug has been there
since day one, but wasn't noticed before, probably because most error cases
in parsebranch() didn't actually lead to returning NULL. Add the missing
error check, and also tweak parsebranch() to exit in a less indirect
fashion after a call to parseqatom() fails.
Report by Tomasz Karlik, fix by me.
The backend grammar treats STDIN and STDOUT completely interchangeable, so
that the above accepted. Arguably that was a mistake the backend grammar,
but it's not ecpg's business to second guess that.
There's no harm in excessive quoting per se, but it makes the strings nicer
to read. The values can get quite unwieldy, when they're first quoted within
within single-quotes when included in the connection string, and then all
the single-quotes are escaped when the connection string is passed as a
shell argument.
Like with pg_basebackup and pg_receivexlog, it's a bit strange to call the
option -d/--dbname, when in fact you cannot pass a database name in it.
Original patch by Amit Kapila, heavily modified by me.
You could already pass a database name just by passing it as the last
option, without -d. This is an alias for that, like the -d/--dbname option
in psql and many other client applications. For consistency.
The previous commit didn't work on MSVC editions earlier than
Visual Studio 2011, apparently. This works by copying files into the
contrib directory, and making provision to clean them up, which should
work on all editions.
Without this, there's no way to pass arbitrary libpq connection parameters
to these applications. It's a bit strange that the option is called
-d/--dbname, when in fact you can *not* pass a database name in it, but it's
consistent with other client applications where a connection string is also
passed using -d.
Original patch by Amit Kapila, heavily modified by me.
This program relies on rm_desc backend routines and the xlogreader
infrastructure to emit human-readable rendering of WAL records.
Author: Andres Freund, with many reworks by Álvaro
Reviewed (in a much earlier version) by Peter Eisentraut
We must still initialize minRecoveryPoint if we start straight with archive
recovery, e.g when recovering from a normal base backup taken with
pg_start/stop_backup. Otherwise we never consider the system consistent.
If you create a base backup using an atomic filesystem snapshot, and try to
perform PITR starting from that base backup, or if you just kill a master
server and create recovery.conf to put it into standby mode, we don't know
how far we need to recover before reaching consistency. Normally in crash
recovery, we replay all the WAL present in pg_xlog, and assume that we're
consistent after that. And normally in archive recovery, minRecoveryPoint,
backupEndRequired, or backupEndPoint is set in the control file, indicating
how far we need to replay to reach consistency. But if the server was
previously up and running normally, and you kill -9 it or take an atomic
filesystem snapshot, none of those fields are set in the control file.
The solution is to perform crash recovery first, replaying all the WAL in
pg_xlog. After that's done, we assume that the system is consistent like in
normal crash recovery, and switch to archive recovery mode after that.
Per report from Kyotaro HORIGUCHI. In his scenario, recovery.conf was
created after "pg_ctl stop -m i". I'm not sure we need to support that exact
scenario, but we should support backing up using a filesystem snapshot,
which looks identical.
This issue goes back to at least 9.0, where hot standby was introduced and
we started to track when consistency is reached. In 9.1 and 9.2, we would
open up for hot standby too early, and queries could briefly see an
inconsistent state. But 9.2 made it more visible, as we started to PANIC if
we see a reference to a non-existing page during recovery, if we've already
reached consistency. This is a fairly big patch, so back-patch to 9.2 only,
where the issue is more visible. We can consider back-patching further after
this has received some more testing in 9.2 and master.
This enables non-backend code, such as pg_xlogdump, to use it easily.
The previous location, in src/backend/catalog/catalog.c, made that
essentially impossible because that file depends on many backend-only
facilities; so this needs to live separately.
There's still a lot of room for improvement, but it basically works,
and we need this to be present before we can do anything much with the
writable-foreign-tables patch. So let's commit it and get on with testing.
Shigeru Hanada, reviewed by KaiGai Kohei and Tom Lane
If a database name contained a '=' character, pg_dumpall failed. The problem
was in the way pg_dumpall passes the database name to pg_dump on the
command line. If it contained a '=' character, pg_dump would interpret it
as a libpq connection string instead of a plain database name.
To fix, pass the database name to pg_dump as a connection string,
"dbname=foo", with the database name escaped if necessary.
Back-patch to all supported branches.
It needs to be defined in the backend even when assertions are not
enabled. It's cleaner to put it back, than create a separate #ifdef
section in c.h.
Per trouble report from Jeff Janes
We now write one file per database and one global file, instead of
having the whole thing in a single huge file. This reduces the I/O that
must be done when partial data is required -- which is all the time,
because each process only needs information on its own database anyway.
Also, the autovacuum launcher does not need data about tables and
functions in each database; having the global stats for all DBs is
enough.
Catalog version bumped because we have a new subdir under PGDATA.
Author: Tomas Vondra. Some rework by Álvaro
Testing by Jeff Janes
Other discussion by Heikki Linnakangas, Tom Lane.
This generalizes the existing ALTER ROLE ... SET and ALTER DATABASE
... SET functionality to allow creating settings that apply to all users
in all databases.
reviewed by Pavel Stehule
Revert my earlier fix for the bug that unarchived WAL files get deleted on
crash recovery, commit c9cc7e05c6. We create
a .done file for files streamed or restored from archive, so the WAL file
recycling logic used during normal operation works just as well during
archive recovery.
Per Fujii Masao's suggestion.
This is a forward-patch of commit 6f4b8a4f4f,
applied to 9.2 back in August. The plan was to do something else in master,
but it looks like it's not going to happen, so let's just apply the 9.2
solution to master as well.
Fujii Masao
The previous order of steps didn't literally work, because git clean
-fdx would delete the downloaded typedefs.list. Also, pgindent needs to
be called with a path when one is in at the top of the build tree.
Currently it's only possible for loadable modules to get control during
post-commit cleanup of a transaction. That doesn't work too well if they
want to do something that could throw an error; for example, an FDW might
need to issue a remote commit, which could well fail. To improve matters,
extend the existing APIs for XactCallback and SubXactCallback functions
to provide new pre-commit events for this purpose.
The release notes will need to mention that existing callback functions
should be checked to make sure they don't do something unwanted when one
of the new event types occurs. In the examples within our source tree,
contrib/sepgsql was fine but plpgsql had been a bit too cute.
Revert commit ab0f7b6089 (in HEAD only)
in favor of the proper solution, which is to declare enum_recv() correctly
in the system catalogs. It should be declared to take type "internal"
not "cstring".
Also improve the type_sanity regression test, which should have caught
this typo, so that it actually would. Most of the relevant checks on
the signature of type I/O functions should not have been restricted to
basetypes/pseudotypes, as they should apply to any type's I/O functions.
Since a backend adds itself to the global listener array during
Exec_ListenPreCommit, it's inappropriate for it to remove itself during
Exec_UnlistenCommit or Exec_UnlistenAllCommit --- that leads to failure
when committing a transaction that did UNLISTEN then LISTEN, since we end
up not registered though we should be. (This leads to missing later
notifications, or to Assert failures in assert-enabled builds.) Instead
deal with deregistering at the bottom of AtCommit_Notify, when we know the
final state of the listenChannels list.
Also, simplify the representation of registration status by replacing the
transient backendHasExecutedInitialListen flag with an amRegisteredListener
flag.
Per report from Greg Sabino Mullane. Back-patch to 9.0, where the problem
was introduced during the LISTEN/NOTIFY rewrite.
There's a high chance that a page becomes all-visible when the second phase
of vacuum removes all the dead tuples on it, so it makes sense to check for
that. Otherwise the visibility map won't get updated until the next vacuum.
Pavan Deolasee, reviewed by Jeff Janes.
libpgcommon is a new static library to allow sharing code among the
various frontend programs and backend; this lets us eliminate duplicate
implementations of common routines. We avoid libpgport, because that's
intended as a place for porting issues; per discussion, it seems better
to keep them separate.
The first use case, and the only implemented by this patch, is pg_malloc
and friends, which many frontend programs were already using.
At the same time, we can use this to provide palloc emulation functions
for the frontend; this way, some palloc-using files in the backend can
also be used by the frontend cleanly. To do this, we change palloc() in
the backend to be a function instead of a macro on top of
MemoryContextAlloc(). This was previously believed to cause loss of
performance, but this implementation has been tweaked by Tom and Andres
so that on modern compilers it provides a slight improvement over the
previous one.
This lets us clean up some places that were already with
localized hacks.
Most of the pg_malloc/palloc changes in this patch were authored by
Andres Freund. Zoltán Böszörményi also independently provided a form of
that. libpgcommon infrastructure was authored by Álvaro.
The reason this wasn't supported before was that GiST indexes need an
increasing sequence to detect concurrent page-splits. In a regular WAL-
logged GiST index, the LSN of the page-split record is used for that
purpose, and in a temporary index, we can get away with a backend-local
counter. Neither of those methods works for an unlogged relation.
To provide such an increasing sequence of numbers, create a "fake LSN"
counter that is saved and restored across shutdowns. On recovery, unlogged
relations are blown away, so the counter doesn't need to survive that
either.
Jeevan Chalke, based on discussions with Robert Haas, Tom Lane and me.
The intention was to request a regular online checkpoint immediately after
end of recovery, when performing "fast promotion". However, because the
checkpoint was requested before other backends were allowed to write WAL,
the checkpointer process performed a restartpoint rather than a checkpoint.
Delay the RequestCheckPoint call until after recovery has truly ended, so
that you get a real checkpoint.
This isn't used for anything but a sanity check at the moment, but it could
be highly valuable for debugging purposes. It could also be used to recreate
timeline history by traversing WAL, which seems useful.
After further reflection I was unconvinced that the existing coding is
guaranteed to return valid union datums in every code path for multi-column
indexes. Fix that by forcing a gistunionsubkey() call at the end of the
recursion. Having done that, we can remove some clearly-redundant calls
elsewhere. This should be a little faster for multi-column indexes (since
the previous coding would uselessly do such a call for each column while
unwinding the recursion), as well as much harder to break.
Also, simplify the handling of cases where one side or the other of a
primary split contains only don't-care tuples. The previous coding used a
very ugly hack in removeDontCares() that essentially forced one random
tuple to be treated as non-don't-care, providing a random initial choice of
seed datum for the secondary split. It seems unlikely that that method
will give better-than-random splits. Instead, treat such a split as
degenerate and just let the next column determine the split, the same way
that we handle fully degenerate cases where the two sides produce identical
union datums.
This LOG message was put in over five years ago with the evident
expectation that we'd make all GiST opclasses support secondary split
directly. However, no such thing ever happened, and indeed the number of
opclasses supporting it decreased to zero in 9.2. The reason is that
improving on the default implementation isn't that easy --- the
opclass-specific code that did exist, before 9.2, doesn't appear to have
been any improvement over the default.
Hence, remove the message altogether. There's certainly no point in
nagging users about this in released branches, but I doubt that we'll
ever implement complete opclass-specific support anyway.
Not only is this implementation of secondary-split not better than the
default implementation in gistsplit.c, it's actually worse. The gistsplit.c
code at least looks to see if switching the left and right sides would make
a better merge with the previously-split tuples, while this doesn't.
In any case it's rather useless to support secondary split only in an edge
case. There used to be more complete support for it here (in chooseLR()),
but that was removed in commit 7f3bd86843.
It appears to me though that the chooseLR() code was really isomorphic to
the default implementation, since it was still based on choosing the cheaper
way of adding two sub-split vectors that had been chosen without regard to
the primary split initially. I think an implementation of secondary split
that could beat the default implementation would have to be pretty fully
integrated into the split algorithm, not plastered on at the end.
Back-patch to 9.2, but not further; previous branches have the chooseLR()
code which I don't feel a great need to mess with. This is mainly so we
just have two behaviors and not three among the various branches (IOW, this
patch is cleanup for commit 7f3bd86843e5aad84585a57d3f6b80db3c609916's
incomplete removal of secondary-split support).
Improve comments, rename some variables and functions, slightly simplify
a couple of APIs, in an attempt to make this code readable by people other
than its original author.
Even though this is essentially just cosmetic, back-patch to all active
branches, because otherwise it's going to make back-patching future fixes
in this file very painful.
When there are zero result rows, in expanded mode, "(No rows)" is
printed. So far, there was no way to turn this off. Now, when
tuples-only mode is turned on, nothing is printed in this case.
Given the assumption that a box's high coordinates are not less than its
low coordinates, the tests in box_ov() are overly complicated and can be
reduced to about half as much work. Since many other functions in
geo_ops.c rely on that assumption, there doesn't seem to be a good reason
not to use it here.
Per discussion of Alexander Korotkov's GiST fix, which was already using
the simplified logic (in a non-fuzzy form, but the equivalence holds just
as well for fuzzy).
While there's considerable doubt that we want fuzzy behavior in the
geometric operators at all (let alone as currently implemented), nobody is
stepping forward to redesign that stuff. In the meantime it behooves us
to make sure that index searches agree with the behavior of the underlying
operators. This patch fixes two problems in this area.
First, gist_box_same was using fuzzy equality, but it really needs to use
exact equality to prevent not-quite-identical upper index keys from being
treated as identical, which for example would prevent an existing upper
key from being extended by an amount less than epsilon. This would result
in inconsistent indexes. (The next release notes will need to recommend
that users reindex GiST indexes on boxes, polygons, circles, and points,
since all four opclasses use gist_box_same.)
Second, gist_point_consistent used exact comparisons for upper-page
comparisons in ~= searches, when it needs to use fuzzy comparisons to
ensure it finds all matches; and it used fuzzy comparisons for point <@ box
searches, when it needs to use exact comparisons because that's what the
<@ operator (rather inconsistently) does.
The added regression test cases illustrate all three misbehaviors.
Back-patch to all active branches. (8.4 did not have GiST point_ops,
but it still seems prudent to apply the gist_box_same patch to it.)
Alexander Korotkov, reviewed by Noah Misch
Without this, building in src/bin/scripts directly will fail if
libpgport wasn't built first. Other bin components are handled the same
way.
Phil Sorber
Commit af7914c662, which added the TIMING
option to EXPLAIN, had an oversight: if the TIMING option is disabled
then control in InstrStartNode() goes through an elog(DEBUG2) call, which
typically does nothing but takes a noticeable amount of time to do it.
Tweak the logic to avoid that.
In HEAD, also change the elog(DEBUG2)'s in instrument.c to elog(ERROR).
It's not very clear why they weren't like that to begin with, but this
episode shows that not complaining more vociferously about misuse is
likely to do little except allow bugs to remain hidden.
While at it, adjust some code that was making possibly-dangerous
assumptions about flag bits being in the rightmost byte of the
instrument_options word.
Problem reported by Pavel Stehule (via Tomas Vondra).
When considering a non-last column in a multi-column GiST index,
gistsplit.c tries to improve on the split chosen by the opclass-specific
pickSplit function by considering penalties for the next column. However,
there were two bugs in this code: it failed to recompute the union keys for
the leftmost index columns, even though these might well change after
reassigning tuples; and it included the old union keys in the recomputation
for the columns it did recompute, so that those keys couldn't get smaller
even if they should. The first problem could result in an invalid index
in which searches wouldn't find index entries that are in fact present;
the second would make the index less efficient to search.
Both of these errors were caused by misuse of gistMakeUnionItVec, whose
API was designed in a way that just begged such errors to be made. There
is no situation in which it's safe or useful to compute the union keys for
a subset of the index columns, and there is no caller that wants any
previous union keys to be included in the computation; so the undocumented
choice to treat the union keys as in/out rather than pure output parameters
is a waste of code as well as being dangerous.
Hence, rather than just making a minimal patch, I've changed the API of
gistMakeUnionItVec to remove the "startkey" parameter (it now always
processes all index columns) and treat the attr/isnull arrays as purely
output parameters.
In passing, also get rid of a couple of unnecessary and dangerous uses
of static variables in gistutil.c. It's remarkable that the one in
gistMakeUnionKey hasn't given us portability troubles before now, because
in addition to posing a re-entrancy hazard, it was unsafely assuming that
a static char[] array would have at least Datum alignment.
Per investigation of a trouble report from Tomas Vondra. (There are also
some bugs in contrib/btree_gist to be fixed, but that seems like material
for a separate patch.) Back-patch to all supported branches.
Normally, we suppress sending a tabstats message to the collector unless
there were some actual table stats to send. However, during backend exit
we should force out the message if there are any transaction commit/abort
counts to send, else the session's last few commit/abort counts will never
get reported at all. We had logic for this, but the short-circuit test
at the top of pgstat_report_stat() ignored the "force" flag, with the
consequence that session-ending transactions that touched no database-local
tables would not get counted. Seems to be an oversight in my commit
641912b4d1, which added the "force" flag.
That was back in 8.3, so back-patch to all supported versions.
The new rmgrlist.h header, containing all necessary data
about built-in resource managers, allows other pieces of code to
access them.
In particular, this allows a future pg_xlogdump program to extract
rm_desc function pointers, without having to keep a duplicate list of
them.
This function was misdeclared to take cstring when it should take internal.
This at least allows crashing the server, and in principle an attacker
might be able to use the function to examine the contents of server memory.
The correct fix is to adjust the system catalog contents (and fix the
regression tests that should have caught this but failed to). However,
asking users to correct the catalog contents in existing installations
is a pain, so as a band-aid fix for the back branches, install a check
in enum_recv() to make it throw error if called with a cstring argument.
We will later revert this in HEAD in favor of correcting the catalogs.
Our thanks to Sumit Soni (via Secunia SVCRP) for reporting this issue.
Security: CVE-2013-0255
This patch changes pg_get_viewdef() and allied functions so that
PRETTY_INDENT processing is always enabled. Per discussion, only the
PRETTY_PAREN processing (that is, stripping of "unnecessary" parentheses)
poses any real forward-compatibility risk, so we may as well make dump
output look as nice as we safely can.
Also, set the default wrap length to zero (i.e, wrap after each SELECT
or FROM list item), since there's no very principled argument for the
former default of 80-column wrapping, and most people seem to agree this
way looks better.
Marko Tiikkaja, reviewed by Jeevan Chalke, further hacking by Tom Lane
In the previous coding, psql's state variable saying that output should
go to a file was only reset after successful completion of a query
returning tuples. Thus for example,
regression=# select 1/0
regression-# \g somefile
ERROR: division by zero
regression=# select 1/2;
regression=#
... huh, I wonder where that output went. Even more oddly, the state
was not reset even if it's the file that's causing the failure:
regression=# select 1/2 \g /foo
/foo: Permission denied
regression=# select 1/2;
/foo: Permission denied
regression=# select 1/2;
/foo: Permission denied
This seems to me not to satisfy the principle of least surprise.
\g is certainly not documented in a way that suggests its effects are
at all persistent.
To fix, adjust the code so that the flag is reset at exit from SendQuery
no matter what happened.
Noted while reviewing the \gset patch, which had comparable issues.
Arguably this is a bug fix, but I'll refrain from back-patching for now.