In CLUSTER, VACUUM FULL and ALTER TABLE SET TABLESPACE
I erroneously set checksum before log_newpage, which
sets the LSN and invalidates the checksum. So set
checksum immediately *after* log_newpage.
Bug report Fujii Masao, Fix and patch by Jeff Davis
Counting newlines shows that quite a few recent patches have neglected
to update the output-lines count given to PageOutput(). Fortunately
it's not terribly critical that this be exact, since we long since
exceeded the height of most people's terminal windows. Still, maybe
we ought to think of a way to not have to maintain this manually anymore.
The old formula didn't take into account that each WAL sender process needs
a spinlock. We had also already exceeded the fixed number of spinlocks
reserved for misc purposes (10). Bump that to 30.
Backpatch to 9.0, where WAL senders were introduced. If I counted correctly,
9.0 had exactly 10 predefined spinlocks, and 9.1 exceeded that, but bump the
limit in 9.0 too because 10 is uncomfortably close to the edge.
The point of turning off track_activities is to avoid this reporting
overhead, but a thinko in commit 4f42b546fd
caused pgstat_report_activity() to perform half of its updates anyway.
Fix that, and also make sure that we clear all the now-disabled fields
when transitioning to the non-reporting state.
Notice and complain about PQcancel() failures. Also, don't dump core if
an error PGresult doesn't contain severity and message subfields, as it
might not if it was generated by libpq itself. (We have a longstanding
TODO item to improve that, but in the meantime isolationtester had better
cope.)
I tripped across the latter item while investigating a trouble report on
buildfarm member spoonbill. As for the former, there's no evidence that
PQcancel failure is actually involved in spoonbill's problem, but it still
seems like a bad idea to ignore an error return code.
An oversight in commit e710b65c1c allowed
database names beginning with "-" to be treated as though they were secure
command-line switches; and this switch processing occurs before client
authentication, so that even an unprivileged remote attacker could exploit
the bug, needing only connectivity to the postmaster's port. Assorted
exploits for this are possible, some requiring a valid database login,
some not. The worst known problem is that the "-r" switch can be invoked
to redirect the process's stderr output, so that subsequent error messages
will be appended to any file the server can write. This can for example be
used to corrupt the server's configuration files, so that it will fail when
next restarted. Complete destruction of database tables is also possible.
Fix by keeping the database name extracted from a startup packet fully
separate from command-line switches, as had already been done with the
user name field.
The Postgres project thanks Mitsumasa Kondo for discovering this bug,
Kyotaro Horiguchi for drafting the fix, and Noah Misch for recognizing
the full extent of the danger.
Security: CVE-2013-1899
The pg_start_backup() and pg_stop_backup() functions checked the privileges
of the initially-authenticated user rather than the current user, which is
wrong. For example, a user-defined index function could successfully call
these functions when executed by ANALYZE within autovacuum. This could
allow an attacker with valid but low-privilege database access to interfere
with creation of routine backups. Reported and fixed by Noah Misch.
Security: CVE-2013-1901
This reverts commit 3780fc679c.
HP-UX didn't like it. There would probably be a way to fix that, but
since the net effect of all of this is zero because ecpg ends up using
libpq anyway, it's not worth bothering further.
In commit 0f61d4dd1b, I added code to copy up
column width estimates for each column of a subquery. That code supposed
that the subquery couldn't have any output columns that didn't correspond
to known columns of the current query level --- which is true when a query
is parsed from scratch, but the assumption fails when planning a view that
depends on another view that's been redefined (adding output columns) since
the upper view was made. This results in an assertion failure or even a
crash, as per bug #8025 from lindebg. Remove the Assert and instead skip
the column if its resno is out of the expected range.
This will hopefully be easier to use than pg_config for users who are
already used to the pkg-config interface. It also works better for
multi-arch installations.
reviewed by Tom Lane
It doesn't actually use libpq. But we need to keep libpq in the
CPPFLAGS for building, because compatlib uses ecpglib.h which uses
libpq-fe.h, but we don't need to refer to libpq for linking.
reviewed by Tom Lane
The modern incarnation of md.c is by no means specific to magnetic disk
technology, but every so often we hear from someone who's misled by the
label. Try to clarify that it will work for anything that supports
standard filesystem operations. Per suggestion from Andrew Dunstan.
In some parallel make situations, the install-headers target could be
called before the installation directories are created by installdirs,
causing the installation to fail. Fix that by making install-headers
depend on installdirs.
The JSON parser is converted into a recursive descent parser, and
exposed for use by other modules such as extensions. The API provides
hooks for all the significant parser event such as the beginning and end
of objects and arrays, and providing functions to handle these hooks
allows for fairly simple construction of a wide variety of JSON
processing functions. A set of new basic processing functions and
operators is also added, which use this API, including operations to
extract array elements, object fields, get the length of arrays and the
set of keys of a field, deconstruct an object into a set of key/value
pairs, and create records from JSON objects and arrays of objects.
Catalog version bumped.
Andrew Dunstan, with some documentation assistance from Merlin Moncure.
9.2 uses a kluge representation of "indislive"; we have to account for
that when examining pg_index. Simplest solution is to check indisready
for 9.0 and 9.1 as well; that's harmless though unnecessary, so it's
not worth making a version distinction for.
Fixes oversight in commit 683abc73df,
as noted by Andres Freund.
On older-model gcc, the original coding of UTILITY_BEGIN_QUERY() can
draw this error because of multiple assignments to _needCleanup.
Rather than mark that variable volatile, we can suppress the warning
by arranging to have just one unconditional assignment before PG_TRY.
This event takes place just before ddl_command_end, and is fired if and
only if at least one object has been dropped by the command. (For
instance, DROP TABLE IF EXISTS of a table that does not in fact exist
will not lead to such a trigger firing). Commands that drop multiple
objects (such as DROP SCHEMA or DROP OWNED BY) will cause a single event
to fire. Some firings might be surprising, such as
ALTER TABLE DROP COLUMN.
The trigger is fired after the drop has taken place, because that has
been deemed the safest design, to avoid exposing possibly-inconsistent
internal state (system catalogs as well as current transaction) to the
user function code. This means that careful tracking of object
identification is required during the object removal phase.
Like other currently existing events, there is support for tag
filtering.
To support the new event, add a new pg_event_trigger_dropped_objects()
set-returning function, which returns a set of rows comprising the
objects affected by the command. This is to be used within the user
function code, and is mostly modelled after the recently introduced
pg_identify_object() function.
Catalog version bumped due to the new function.
Dimitri Fontaine and Álvaro Herrera
Review by Robert Haas, Tom Lane
Previously, if the postmaster initialized OpenSSL's PRNG (which it will do
when ssl=on in postgresql.conf), the same pseudo-random state would be
inherited by each forked child process. The problem is masked to a
considerable extent if the incoming connection uses SSL encryption, but
when it does not, identical pseudo-random state is made available to
functions like contrib/pgcrypto. The process's PID does get mixed into any
requested random output, but on most systems that still only results in 32K
or so distinct random sequences available across all Postgres sessions.
This might allow an attacker who has database access to guess the results
of "secure" operations happening in another session.
To fix, forcibly reset the PRNG after fork(). Each child process that has
need for random numbers from OpenSSL's generator will thereby be forced to
go through OpenSSL's normal initialization sequence, which should provide
much greater variability of the sequences. There are other ways we might
do this that would be slightly cheaper, but this approach seems the most
future-proof against SSL-related code changes.
This has been assigned CVE-2013-1900, but since the issue and the patch
have already been publicized on pgsql-hackers, there's no point in trying
to hide this commit.
Back-patch to all supported branches.
Marko Kreen
In a heap update, if the old and new tuple were on different pages, and the
new page no longer existed (because it was subsequently truncated away by
vacuum), heap_xlog_update forgot to release the pin on the old buffer. This
bug was introduced by the "Fix multiple problems in WAL replay" patch,
commit 3bbf668de9 (on master branch).
With full_page_writes=off, this triggered an "incorrect local pin count"
error later in replay, if the old page was vacuumed.
This fixes bug #7969, reported by Yunong Xiao. Backpatch to 9.0, like the
commit that introduced this bug.
Move functions used only by pg_dump and pg_restore from dumputils.c to a new
file, pg_backup_utils.c. dumputils.c is linked into psql and some programs
in bin/scripts, so it seems good to keep it slim. The parallel functionality
is moved to parallel.c, as is exit_horribly, because the interesting code in
exit_horribly is parallel-related.
This refactoring gets rid of the on_exit_msg_func function pointer. It was
problematic, because a modern gcc version with -Wmissing-format-attribute
complained if it wasn't marked with PF_PRINTF_ATTRIBUTE, but the ancient gcc
version that Tom Lane's old HP-UX box has didn't accept that attribute on a
function pointer, and gave an error. We still use a similar function pointer
trick for getLocalPQBuffer() function, to use a thread-local version of that
in parallel mode on Windows, but that dodges the problem because it doesn't
take printf-like arguments.
Dumping invalid indexes can cause problems at restore time, for example
if the reason the index creation failed was because it tried to enforce
a uniqueness condition not satisfied by the table's data. Also, if the
index creation is in fact still in progress, it seems reasonable to
consider it to be an uncommitted DDL change, which pg_dump wouldn't be
expected to dump anyway.
Back-patch to all active versions, and teach them to ignore invalid
indexes in servers back to 8.2, where the concept was introduced.
Michael Paquier
For getting the server's version in numeric form, use PQserverVersion().
It does the exact same parsing as dumputils.c's parse_version(), and has
been around in libpq for a long time. For the client's version, just use
the PG_VERSION_NUM constant.
If you have clusters of different versions pointing to the same tablespace
location, we would incorrectly include all the data belonging to the other
versions, too.
Fixes bug #7986, reported by Sergey Burladyan.
A new 'starttli' field was added to the response of BASE_BACKUP command.
Make pg_basebackup tolerate the case that it's missing, so that it still
works with older servers.
Add an explicit check for the server version, so that you get a nicer error
message if you try to use it with a pre-9.1 server.
The streaming protocol message format changed in 9.3, so -X stream still won't
work with pre-9.3 servers. I added a version check to ReceiveXLogStream()
earlier, but write that slightly differently, so that in 9.4, it will still
work with a 9.3 server. (In 9.4, the error message needs to be adjusted to
"9.3 or above", though). Also, if the version check fails, don't retry.
New infrastructure is added which creates a set number of workers
(threads on Windows, forked processes on Unix). Jobs are then
handed out to these workers by the master process as needed.
pg_restore is adjusted to use this new infrastructure in place of the
old setup which created a new worker for each step on the fly. Parallel
dumps acquire a snapshot clone in order to stay consistent, if
available.
The parallel option is selected by the -j / --jobs command line
parameter of pg_dump.
Joachim Wieland, lightly editorialized by Andrew Dunstan.
Most (all?) of Russia has moved to what's effectively year-round daylight
savings time, so that the "standard" zone names now mean an hour later
than they used to. Update that, notably changing MSK as per recent
complaint from Sergey Konoplev, but also CHOT, GET, IRKT, KGT, KRAT,
MAGT, NOVT, OMST, VLAT, YAKT, YEKT. The corresponding DST abbreviations
are presumably now obsolete, but I left them in place with their old
definitions, just to reduce any possible breakage from this change.
Also add VOLT (Europe/Volgograd), which for some reason we never had
before, as well as MIST (Antarctica/Macquarie), and fix obsolete
definitions of MAWT, TKT, and WST.
Add an option to zic.c to dump out all non-obsolete timezone abbreviations
defined in the Olson database. Comparing this list to its previous state
will clue us in when something happens that we may need to account for in
the tznames/ time zone abbreviation lists. The README file's previous
exhortation to "just grep for differences" was completely useless advice,
in my now-considerable experience; but maybe this will be a bit more
useful. As a starting point I built the same list from the tzdata files
as they existed in 2006, which is committed here as known_abbrevs.txt.
Comparison indeed turned up quite a few changes we had neglected to account
for, which I will commit separately.
This appears to cause some intermittent file system problems
on Windows 8. Instead, set up the old data directory in its
intended final location to start with.
Checksums are set immediately prior to flush out of shared buffers
and checked when pages are read in again. Hint bit setting will
require full page write when block is dirtied, which causes various
infrastructure changes. Extensive comments, docs and README.
WARNING message thrown if checksum fails on non-all zeroes page;
ERROR thrown but can be disabled with ignore_checksum_failure = on.
Feature enabled by an initdb option, since transition from option off
to option on is long and complex and has not yet been implemented.
Default is not to use checksums.
Checksum used is WAL CRC-32 truncated to 16-bits.
Simon Riggs, Jeff Davis, Greg Smith
Wide input and assistance from many community members. Thank you.
Prior to 9.3 the commit_delay affected only the current user,
whereas now only the group leader waits while holding the
WALWriteLock. Deliberate or accidental settings to a poor
value could seriously degrade performance for all users.
Privileges may be delegated by SECURITY DEFINER functions
for anyone that needs per-user settings in real situations.
Request for change from Peter Geoghegan
I wasn't going to ship this without having at least some example of how
to do that. This version isn't terribly bright; in particular it won't
consider any combinations of multiple join clauses. Given the cost of
executing a remote EXPLAIN, I'm not sure we want to be very aggressive
about doing that, anyway.
In support of this, refactor generate_implied_equalities_for_indexcol
so that it can be used to extract equivalence clauses that aren't
necessarily tied to an index.
The statistics-based cost estimation patch for range types broke that, by
incorrectly assuming that the left operand of all range oeprators is a
range. That lead to a "type x is not a range type" error. Because it took so
long for anyone to notice, add a regression test for that case.
We still don't do proper statistics-based cost estimation for that, so you
just get a default constant estimate. We should look into implementing that,
but this patch at least fixes the regression.
Spotted by Tom Lane, when testing query from Josh Berkus.
Introduce pg_identify_object(oid,oid,int4), which is similar in spirit
to pg_describe_object but instead produces a row of machine-readable
information to uniquely identify the given object, without resorting to
OIDs or other internal representation. This is intended to be used in
the event trigger implementation, to report objects being operated on;
but it has usefulness of its own.
Catalog version bumped because of the new function.
The buildfarm members using -DCLOBBER_CACHE_ALWAYS still don't like this
test. Some experimentation shows that on my machine, isolationtester's
query to check for "waiting" state takes 2 to 2.5 seconds to bind+execute
under -DCLOBBER_CACHE_ALWAYS. Set the timeouts to 5 seconds to leave some
headroom for possibly-slower buildfarm critters.
Really we ought to fix the "waiting" query, which is not only horridly
slow but outright wrong in detail; and then maybe we can back off these
timeouts. But right now I'm just trying to get the buildfarm green again.
Per report from Hadi Moshayedi of matview regression test failure
with optimization of aggregates. A few ORDER BY clauses improve
code coverage for matviews while solving that problem.
Remove use of PageSetTLI() from all page manipulation functions
and adjust README to indicate change in the way we make changes
to pages. Repurpose those bytes into the pd_checksum field and
explain how that works in comments about page header.
Refactoring ahead of actual feature patch which would make use
of the checksum field, arriving later.
Jeff Davis, with comments and doc changes by Simon Riggs
Direction suggested by Robert Haas; many others providing
review comments.
Buildfarm member friarbird doesn't like this test as-committed, evidently
because it's so slow that the test framework doesn't reliably notice that
the backend is waiting before the timeout goes off. (This is not totally
surprising, since friarbird builds with -DCLOBBER_CACHE_ALWAYS.) Increase
the timeout delay from 1 second to 2 in hopes of resolving that problem.
Rather than doing a fairly-expensive setitimer() call to prevent interrupts
from happening, let's just invent a simple boolean flag that the signal
handler is required to check. This is not only faster but considerably
more robust than before, since the previous code effectively assumed that
only ITIMER_REAL events would ever fire the SIGALRM handler, which is
obviously something that can be broken easily by third-party code.
Zoltán Böszörményi and Tom Lane
We need this in non-ENABLE_THREAD_SAFETY builds, and also to satisfy
the exports.txt entry; while it might be a good idea to remove the
latter, I'm hesitant to do so except in the context of an intentional
ABI break. At least we don't have a separately maintained source file
for it anymore.
I had thought we weren't using this version of pqsignal() at all on
Windows, but that's wrong --- initdb is using it (and coping with the
POSIX-ish semantics of bare signal() :-(). So allow the file to be
built in WIN32+FRONTEND case, and add it to the MSVC build logic.
We had two copies of this function in the backend and libpq, which was
already pretty bogus, but it turns out that we need it in some other
programs that don't use libpq (such as pg_test_fsync). So put it where
it probably should have been all along. The signal-mask-initialization
support in src/backend/libpq/pqsignal.c stays where it is, though, since
we only need that in the backend.
This GUC allows limiting the time spent waiting to acquire any one
heavyweight lock.
In support of this, improve the recently-added timeout infrastructure
to permit efficiently enabling or disabling multiple timeouts at once.
That reduces the performance hit from turning on lock_timeout, though
it's still not zero.
Zoltán Böszörményi, reviewed by Tom Lane,
Stephen Frost, and Hari Babu
Formerly we just Assert'ed that each refcount was zero, which was quick
and easy but failed to provide a good overview of what was wrong.
Change the code so that we'll call PrintBufferLeakWarning() for each
buffer with a nonzero refcount, and then Assert at the end of the loop.
This costs nothing in runtime and might ease diagnosis of some bugs.
Greg Smith, reviewed by Satoshi Nagayasu, further tweaked by me
Due to a misreading of the function's comment block, there was an
unneeded change to a call in rewriteDefine.c. There is, in fact
no reason to pass false for a MV; it should be true just like a
view.
Fixes issue pointed out by Tom Lane
This would have caught a bug in the initial patch, and seems like
a good thing to test going forward.
Per bug report by Erik Rijkers and fix by Tom Lane
Even though this patch had no user-visible difference, better keep the code
in psqlscan.l sync with the backend lexer. And of course it's nice to shrink
the psql binary, too. Ecpg's version of the lexer doesn't have the error
rule, it doesn't try to avoid backing up, so it doesn't need to be modified.
As reminded by Tom Lane
The planner sometimes inserts Result nodes to perform column projections
(ie, arbitrary scalar calculations) above plan nodes that lack projection
logic of their own. However, we did that even if the lower plan node was
in fact producing the required column set already; which is a pretty common
case given the popularity of "SELECT * FROM ...". Measurements show that
the useless plan node adds non-negligible overhead, especially when there
are many columns in the result. So add a check to avoid inserting a Result
node unless there's something useful for it to do.
There are a couple of remaining places where unnecessary Result nodes
could get inserted, but they are (a) much less performance-critical,
and (b) coded in such a way that it's hard to avoid inserting a Result,
because the desired tlist is changed on-the-fly in subsequent logic.
We'll leave those alone for now.
Kyotaro Horiguchi; reviewed and further hacked on by Amit Kapila and
Tom Lane.
The error rule used to avoid backtracking with the U&'...' UESCAPE 'x'
syntax bloated the flex tables, so refactor that. This patch makes the error
rule shorter, by introducing a new exclusive flex state that's entered after
parsing U&'...'. This shrinks the postgres binary by about 220kB.
The estimates are based on the existing lower bound histogram, and a new
histogram of range lengths.
Bump catversion, because the range length histogram now needs to be present
in statistic slot kind 6, or you get an error on @> and <@ queries. (A
re-ANALYZE would be enough to fix that, though)
Alexander Korotkov, with some refactoring by me.
There's still some discussion about exactly how postgres_fdw ought to
handle this case, but there seems no debate that we want to allow defaults
to be used for inserts into foreign tables. So remove the core-code
restrictions that prevented it.
While at it, get rid of the special grammar productions for CREATE FOREIGN
TABLE, and instead add explicit FEATURE_NOT_SUPPORTED error checks for the
disallowed cases. This makes the grammar a shade smaller, and more
importantly results in much more intelligible error messages for
unsupported cases. It's also one less thing to fix if we ever start
supporting constraints on foreign tables.
"break" instead of "continue" suppressed view expansion for views appearing
later in the range table. Per report from Erikjan Rijkers.
While at it, improve the associated comment a bit.
This adds the following:
json_agg(anyrecord) -> json
to_json(any) -> json
hstore_to_json(hstore) -> json (also used as a cast)
hstore_to_json_loose(hstore) -> json
The last provides heuristic treatment of numbers and booleans.
Also, in json generation, if any non-builtin type has a cast to json,
that function is used instead of the type's output function.
Andrew Dunstan, reviewed by Steve Singer.
Catalog version bumped.
This patch adds the core-system infrastructure needed to support updates
on foreign tables, and extends contrib/postgres_fdw to allow updates
against remote Postgres servers. There's still a great deal of room for
improvement in optimization of remote updates, but at least there's basic
functionality there now.
KaiGai Kohei, reviewed by Alexander Korotkov and Laurenz Albe, and rather
heavily revised by Tom Lane.
Instead of just reporting which user failed to log in, log both the
line number in the active pg_hba.conf file (which may not match reality
in case the file has been edited and not reloaded) and the contents of
the matching line (which will always be correct), to make it easier
to debug incorrect pg_hba.conf files.
The message to the client remains unchanged and does not include this
information, to prevent leaking security sensitive information.
Reviewed by Tom Lane and Dean Rasheed
The libpgcommon patch made that unnecessary, palloc and friends are now
available in frontend programs too, mapped to plain old malloc.
As pointed out by Alvaro Herrera.
The previous coding of this function could get into situations where it
would never terminate, because successive passes would re-add EMPTY arcs
that had been removed by the previous pass. Rewrite the function
completely using a new algorithm that is guaranteed to terminate, and
also seems to be usually faster than the old one. Per Tcl bugs 3604074
and 3606683.
Tom Lane and Don Porter
If we were about to enter archive recovery after crash recovery, we scanned
the archive for the latest tli history file, and set the recovery target
timeline to that. However, when we actually tried to read the history file,
we would not fetch the file from the archive, because we were not in archive
recovery yet.
To fix, make readTimeLineHistory and existsTimeLineHistory to always fetch
the file from archive if archive recovery is requested, even if we're not in
archive recovery yet.
Backpatch to 9.2. Mitsumasa KONDO
This saves several catalog lookups per reference. It's not all that
exciting right now, because we'd managed to minimize the number of places
that need to fetch the data; but the upcoming writable-foreign-tables patch
needs this info in a lot more places.
This page with no tuples is used to distinguish an MV containing a
zero-row resultset of its backing query from an MV which has not
been populated by its backing query. Unless WAL-logged, recovery
and hot standby don't work correctly with what should be an empty
but scannable materialized view.
Fixes bugs reported by Fujii Masao in testing MVs on hot standby.