For some reason lost in the mists of prehistory, RETURN was only coded to
allow a simple reference to a composite variable when the function's return
type is composite. Allow an expression instead, while preserving the
efficiency of the original code path in the case where the expression is
indeed just a composite variable's name. Likewise for RETURN NEXT.
As is true in various other places, the supplied expression must yield
exactly the number and data types of the required columns. There was some
discussion of relaxing that for pl/pgsql, but no consensus yet, so this
patch doesn't address that.
Asif Rehman, reviewed by Pavel Stehule
Background workers are postmaster subprocesses that run arbitrary
user-specified code. They can request shared memory access as well as
backend database connections; or they can just use plain libpq frontend
database connections.
Modules listed in shared_preload_libraries can register background
workers in their _PG_init() function; this is early enough that it's not
necessary to provide an extra GUC option, because the necessary extra
resources can be allocated early on. Modules can install more than one
bgworker, if necessary.
Care is taken that these extra processes do not interfere with other
postmaster tasks: only one such process is started on each ServerLoop
iteration. This means a large number of them could be waiting to be
started up and postmaster is still able to quickly service external
connection requests. Also, shutdown sequence should not be impacted by
a worker process that's reasonably well behaved (i.e. promptly responds
to termination signals.)
The current implementation lets worker processes specify their start
time, i.e. at what point in the server startup process they are to be
started: right after postmaster start (in which case they mustn't ask
for shared memory access), when consistent state has been reached
(useful during recovery in a HOT standby server), or when recovery has
terminated (i.e. when normal backends are allowed).
In case of a bgworker crash, actions to take depend on registration
data: if shared memory was requested, then all other connections are
taken down (as well as other bgworkers), just like it were a regular
backend crashing. The bgworker itself is restarted, too, within a
configurable timeframe (which can be configured to be never).
More features to add to this framework can be imagined without much
effort, and have been discussed, but this seems good enough as a useful
unit already.
An elementary sample module is supplied.
Author: Álvaro Herrera
This patch is loosely based on prior patches submitted by KaiGai Kohei,
and unsubmitted code by Simon Riggs.
Reviewed by: KaiGai Kohei, Markus Wanner, Andres Freund,
Heikki Linnakangas, Simon Riggs, Amit Kapila
storage.
Have pg_upgrade use it, and enable server options fsync=off and
full_page_writes=off.
Document that users turning fsync from off to on should run initdb
--sync-only.
[ Previous commit was incorrectly applied as a git merge. ]
We've generally recommended use of INSTEAD triggers over rules since that
feature was added; but this old text in the CREATE VIEW reference page
didn't get the memo. Noted by Thomas Kellerer.
When a relfilenode is created in this subtransaction or
a committed child transaction and it cannot otherwise
be seen by our own process, mark tuples committed ahead
of transaction commit for all COPY commands in same
transaction. If FREEZE specified on COPY
and pre-conditions met then rows will also be frozen.
Both options designed to avoid revisiting rows after commit,
increasing performance of subsequent commands after
data load and upgrade. pg_restore changes later.
Simon Riggs, review comments from Heikki Linnakangas, Noah Misch and design
input from Tom Lane, Robert Haas and Kevin Grittner
This allows a caller to get back the exact conninfo array that was
used to create a connection, including parameters read from the
environment.
In doing this, restructure how options are copied from the conninfo
to the actual connection.
Zoltan Boszormenyi and Magnus Hagander
Commit 8cb53654db, which introduced DROP
INDEX CONCURRENTLY, managed to break CREATE INDEX CONCURRENTLY via a poor
choice of catalog state representation. The pg_index state for an index
that's reached the final pre-drop stage was the same as the state for an
index just created by CREATE INDEX CONCURRENTLY. This meant that the
(necessary) change to make RelationGetIndexList ignore about-to-die indexes
also made it ignore freshly-created indexes; which is catastrophic because
the latter do need to be considered in HOT-safety decisions. Failure to
do so leads to incorrect index entries and subsequently wrong results from
queries depending on the concurrently-created index.
To fix, add an additional boolean column "indislive" to pg_index, so that
the freshly-created and about-to-die states can be distinguished. (This
change obviously is only possible in HEAD. This patch will need to be
back-patched, but in 9.2 we'll use a kluge consisting of overloading the
formerly-impossible state of indisvalid = true and indisready = false.)
In addition, change CREATE/DROP INDEX CONCURRENTLY so that the pg_index
flag changes they make without exclusive lock on the index are made via
heap_inplace_update() rather than a normal transactional update. The
latter is not very safe because moving the pg_index tuple could result in
concurrent SnapshotNow scans finding it twice or not at all, thus possibly
resulting in index corruption. This is a pre-existing bug in CREATE INDEX
CONCURRENTLY, which was copied into the DROP code.
In addition, fix various places in the code that ought to check to make
sure that the indexes they are manipulating are valid and/or ready as
appropriate. These represent bugs that have existed since 8.2, since
a failed CREATE INDEX CONCURRENTLY could leave a corrupt or invalid
index behind, and we ought not try to do anything that might fail with
such an index.
Also fix RelationReloadIndexInfo to ensure it copies all the pg_index
columns that are allowed to change after initial creation. Previously we
could have been left with stale values of some fields in an index relcache
entry. It's not clear whether this actually had any user-visible
consequences, but it's at least a bug waiting to happen.
In addition, do some code and docs review for DROP INDEX CONCURRENTLY;
some cosmetic code cleanup but mostly addition and revision of comments.
This will need to be back-patched, but in a noticeably different form,
so I'm committing it to HEAD before working on the back-patch.
Problem reported by Amit Kapila, diagnosis by Pavan Deolassee,
fix by Tom Lane and Andres Freund.
This reverts commit d573e239f0, "Take fewer
snapshots". While that seemed like a good idea at the time, it caused
execution to use a snapshot that had been acquired before locking any of
the tables mentioned in the query. This created user-visible anomalies
that were not present in any prior release of Postgres, as reported by
Tomas Vondra. While this whole area could do with a redesign (since there
are related cases that have anomalies anyway), it doesn't seem likely that
any future patch would be reasonably back-patchable; and we don't want 9.2
to exhibit a behavior that's subtly unlike either past or future releases.
Hence, revert to prior code while we rethink the problem.
This way it works more like the DSSSL build, and dependencies are
tracked better by make.
Also copy the CSS stylesheet to the html directory. This was forgotten
when the output directory was changed.
Some versions of the XSLT stylesheets don't handle the missing slash
correctly (they concatenate directory and file name without the slash).
This might never have worked correctly.
Without this, the connection will be killed after timeout if
wal_sender_timeout is set in the server.
Original patch by Amit Kapila, modified by me to fit recent changes in the
code.
We used to send structs wrapped in CopyData messages, which works as long as
the client and server agree on things like endianess, timestamp format and
alignment. That's good enough for running a standby server, which has to run
on the same platform anyway, but it's useful for tools like pg_receivexlog
to work across platforms.
This breaks protocol compatibility of streaming replication, but we never
promised that to be compatible across versions, anyway.
Represent a sequence's current value as a separate TableDataInfo dumpable
object, so that it can be dumped within the data section of the archive
rather than in pre-data. This fixes an undesirable inconsistency between
the meanings of "--data-only" and "--section=data", and also fixes dumping
of sequences that are marked as extension configuration tables, as per a
report from Marko Kreen back in July. The main cost is that we do one more
SQL query per sequence, but that's probably not very meaningful in most
databases.
Back-patch to 9.1, since it has the extension configuration issue even
though not the --section switch.
... and have sepgsql use it to determine whether to check permissions
during certain operations. Indexes that are being created as a result
of REINDEX, for instance, do not need to have their permissions checked;
they were already checked when the index was created.
Author: KaiGai Kohei, slightly revised by me
In commit 4317e0246c, I accidentally broke
this behavior while rearranging code to ensure that --create wouldn't
affect whether a DATABASE entry gets put into archive-format output.
Thus, 9.2 would issue a DROP DATABASE command in --clean mode, which is
either useless or dangerous depending on the usage scenario.
It should not do that, and no longer does.
A bright spot is that this refactoring makes it easy to allow the
combination of --clean and --create to work sensibly, ie, emit DROP
DATABASE then CREATE DATABASE before reconnecting. Ordinarily we'd
consider that a feature addition and not back-patch it, but it seems
silly to not include the extra couple of lines required in the 9.2
version of the code.
Per report from Guillaume Lelarge, though this is slightly more extensive
than his proposed patch.
Rename replication_timeout to wal_sender_timeout, and add a new setting
called wal_receiver_timeout that does the same at the walreceiver side.
There was previously no timeout in walreceiver, so if the network went down,
for example, the walreceiver could take a long time to notice that the
connection was lost. Now with the two settings, both sides of a replication
connection will detect a broken connection similarly.
It is no longer necessary to manually set wal_receiver_status_interval to
a value smaller than the timeout. Both wal sender and receiver now
automatically send a "ping" message if more than 1/2 of the configured
timeout has elapsed, and it hasn't received any messages from the other end.
Amit Kapila, heavily edited by me.
dblink now has its own validator function dblink_fdw_validator(), which is
better than the core function postgresql_fdw_validator() because it gets
the list of legal options from libpq instead of having a hard-wired list.
Make the dblink extension module provide a standard foreign data wrapper
dblink_fdw that encapsulates use of this validator, and recommend use of
that wrapper instead of making up wrappers on the fly.
Unfortunately, because ad-hoc wrappers *were* recommended practice
previously, it's not clear when we can get rid of postgresql_fdw_validator
without causing upgrade problems. But this is a step in the right
direction.
Shigeru Hanada, reviewed by KaiGai Kohei
libpq defines these functions as accepting "size_t" lengths ... but the
underlying backend functions expect signed int32 length parameters, and so
will misinterpret any value exceeding INT_MAX. Fix the libpq side to throw
error rather than possibly doing something unexpected.
This is a bug of long standing, but I doubt it's worth back-patching. The
problem is really pretty academic anyway with lo_read/lo_write, since any
caller expecting sane behavior would have to have provided a multi-gigabyte
buffer. It's slightly more pressing with lo_truncate, but still we haven't
supported large objects over 2GB until now.
Copy-editing for previous patch, plus fixing some longstanding markup
issues and oversights (like not mentioning that failures will set the
PQerrorMessage string).
4TB large objects (standard 8KB BLCKSZ case). For this purpose new
libpq API lo_lseek64, lo_tell64 and lo_truncate64 are added. Also
corresponding new backend functions lo_lseek64, lo_tell64 and
lo_truncate64 are added. inv_api.c is changed to handle 64-bit
offsets.
Patch contributed by Nozomi Anzai (backend side) and Yugo Nagata
(frontend side, docs, regression tests and example program). Reviewed
by Kohei Kaigai. Committed by Tatsuo Ishii with minor editings.
Use the terms "simple bind" and "search+bind" consistently do
distinguish the two modes (better than first mode and second mode in
any case). They were already used in some places, now it's just more
prominent.
Split up the list of options into one for common options and one for
each mode, for clarity.
Add configuration examples for either mode.
These reference pages still claimed that you have to be superuser to create
a database or schema owned by a different role. That was true before 8.1,
but it was changed in commits aa1110624c and
f91370cd2f to allow assignment of ownership
to any role you are a member of. However, at the time we were thinking of
that primarily as a change to the ALTER OWNER rules, so the need to touch
these two CREATE ref pages got missed.
Per discussion, schema-element subcommands are not allowed together with
this option, since it's not very obvious what should happen to the element
objects.
Fabrízio de Royes Mello
This allows logging only some fraction of transactions, greatly reducing
the amount of log generated.
Tomas Vondra, reviewed by Robert Haas and Jeff Janes.
You can now get the number of rows processed by a COPY statement in a
PL/pgSQL function with "GET DIAGNOSTICS x = ROW_COUNT".
Pavel Stehule, reviewed by Amit Kapila, with some editing by me.
Both programs got the "magic" string wrong, causing standard-conforming tar
implementations to believe the output was just legacy tar format without
any POSIX extensions. This doesn't actually matter that much, especially
since pg_dump failed to fill the POSIX fields anyway, but still there is
little point in emitting tar format if we can't be compliant with the
standard. In addition, pg_dump failed to write the EOF marker correctly
(there should be 2 blocks of zeroes not just one), pg_basebackup put the
numeric group ID in the wrong place, and both programs had a pretty
brain-dead idea of how to compute the checksum. Fix all that and improve
the comments a bit.
pg_restore is modified to accept either the correct POSIX-compliant "magic"
string or the previous value. This part of the change will need to be
back-patched to avoid an unnecessary compatibility break when a previous
version tries to read tar-format output from 9.3 pg_dump.
Brian Weaver and Tom Lane
The syntax "su -c 'command' username" is not accepted by all versions of
su, for example not OpenBSD's. More portable is "su username -c
'command'". So change runtime.sgml to recommend that syntax. Also,
add a -D switch to the OpenBSD example script, for consistency with other
examples. Per Denis Lapshin and Gábor Hidvégi.
This allows easily splitting configuration into many files, deployed in a
directory.
Magnus Hagander, Greg Smith, Selena Deckelmann, reviewed by Noah Misch.
Produce a NOTICE when the label already exists, for consistency with other
CREATE IF NOT EXISTS commands. Also, fix the code so it produces something
more user-friendly than an index violation when the label already exists.
This not incidentally enables making a regression test that the previous
patch didn't make for fear of exposing an unpredictable OID in the results.
Also some wordsmithing on the documentation.
If the label is already in the enum the statement becomes a no-op.
This will reduce the pain that comes from our not allowing this
operation inside a transaction block.
Andrew Dunstan, reviewed by Tom Lane and Magnus Hagander.
Somewhere along the line, somebody decided to remove all trace of this
notation from the documentation text. It was still in the command syntax
synopses, or at least some of them, but with no indication what it meant.
This will not do, as evidenced by the confusion apparent in bug #7543;
even if the notation is now unnecessary, people will find it in legacy
SQL code and need to know what it does.
The documentation mentioned setting autovacuum_freeze_max_age to
"its maximum allowed value of a little less than two billion".
This led to a post asking about the exact maximum allowed value,
which is precisely two billion, not "a little less".
Based on question by Radovan Jablonovsky. Backpatch to 8.3.
When starting either an old or new postmaster, force it to place its Unix
socket in the current directory. This makes it even harder for accidental
connections to occur during pg_upgrade, and also works around some
scenarios where the default socket location isn't usable. (For example,
if the default location is something other than "/tmp", it might not exist
during "make check".)
When checking an already-running old postmaster, find out its actual socket
directory location from postmaster.pid, if possible. This dodges problems
with an old postmaster having a configured location different from the
default built into pg_upgrade's libpq. We can't find that out if the old
postmaster is pre-9.1, so also document how to cope with such scenarios
manually.
In support of this, centralize handling of the connection-related command
line options passed to pg_upgrade's subsidiary programs, such as pg_dump.
This should make future changes easier.
Bruce Momjian and Tom Lane
Extend xfunc.sgml's discussion of set-returning functions to show an
example of using LATERAL, and recommend that over putting SRFs in the
targetlist.
In passing, reword func.sgml's section on set-returning functions so
that it doesn't claim that the functions listed therein are all the
built-in set-returning functions. That hasn't been true for a long
time, and trying to make it so doesn't seem like it would be an
improvement. (Perhaps we should rename that section?)
Both per suggestions from Merlin Moncure.
Only warn when connecting to a newer server, since connecting to older
servers works pretty well nowadays. Also update the documentation a
little about current psql/server compatibility expectations.
The existing documentation in Linux Memory Overcommit seemed to
assume that PostgreSQL itself could never be the problem, or at
least it didn't tell you what to do about it.
Per discussion with Craig Ringer and Kevin Grittner.
This is so that these sections will have stable HTML tags that one can
link to, rather than things like "AEN1902". Perhaps we should mount a
campaign to do this everywhere, but I've found myself pointing at
syntax.sgml subsections often enough to be sure it's useful here.
As of 9.2, constraint exclusion should work okay with prepared statements:
the planner will try custom plans with actual values of the parameters,
and observe that they are a lot cheaper than the generic plan, and thus
never fall back to using the generic plan. Noted by Tatsuhito Kasahara.
The docs claimed that this mode only waits for the standby to receive WAL
data, but actually it waits for the data to be written out to the standby's
OS; which is a pretty significant difference because it removes the risk of
crash of the walreceiver process.
libxslt offers the ability to read and write both files and URLs through
stylesheet commands, thus allowing unprivileged database users to both read
and write data with the privileges of the database server. Disable that
through proper use of libxslt's security options.
Also, remove xslt_process()'s ability to fetch documents and stylesheets
from external files/URLs. While this was a documented "feature", it was
long regarded as a terrible idea. The fix for CVE-2012-3489 broke that
capability, and rather than expend effort on trying to fix it, we're just
going to summarily remove it.
While the ability to write as well as read makes this security hole
considerably worse than CVE-2012-3489, the problem is mitigated by the fact
that xslt_process() is not available unless contrib/xml2 is installed,
and the longstanding warnings about security risks from that should have
discouraged prudent DBAs from installing it in security-exposed databases.
Reported and fixed by Peter Eisentraut.
Security: CVE-2012-3488
Replace unix_socket_directory with unix_socket_directories, which is a list
of socket directories, and adjust postmaster's code to allow zero or more
Unix-domain sockets to be created.
This is mostly a straightforward change, but since the Unix sockets ought
to be created after the TCP/IP sockets for safety reasons (better chance
of detecting a port number conflict), AddToDataDirLockFile needs to be
fixed to support out-of-order updates of data directory lockfile lines.
That's a change that had been foreseen to be necessary someday anyway.
Honza Horak, reviewed and revised by Tom Lane
Should be limited to the maximum number of connections excluding
autovacuum workers, not including.
Add similar check for max_wal_senders, which should never be higher than
max_connections.
Previously, the -1 option was silently ignored.
Also, emit an error if -1 is used in a context where it won't be
respected, to avoid user confusion.
Original patch by Fabien COELHO, but this version is quite different
from the original submission.
This patch implements the standard syntax of LATERAL attached to a
sub-SELECT in FROM, and also allows LATERAL attached to a function in FROM,
since set-returning function calls are expected to be one of the principal
use-cases.
The main change here is a rewrite of the mechanism for keeping track of
which relations are visible for column references while the FROM clause is
being scanned. The parser "namespace" lists are no longer lists of bare
RTEs, but are lists of ParseNamespaceItem structs, which carry an RTE
pointer as well as some visibility-controlling flags. Aside from
supporting LATERAL correctly, this lets us get rid of the ancient hacks
that required rechecking subqueries and JOIN/ON and function-in-FROM
expressions for invalid references after they were initially parsed.
Invalid column references are now always correctly detected on sight.
In passing, remove assorted parser error checks that are now dead code by
virtue of our having gotten rid of add_missing_from, as well as some
comments that are obsolete for the same reason. (It was mainly
add_missing_from that caused so much fudging here in the first place.)
The planner support for this feature is very minimal, and will be improved
in future patches. It works well enough for testing purposes, though.
catversion bump forced due to new field in RangeTblEntry.
century specifications just like positive/AD centuries. Previously the
behavior was either wrong or inconsistent with positive/AD handling.
Centuries without years now always assume the first year of the century,
which is now documented.
After taking awhile to digest the row-processor feature that was added to
libpq in commit 92785dac2e, we've concluded
it is over-complicated and too hard to use. Leave the core infrastructure
changes in place (that is, there's still a row processor function inside
libpq), but remove the exposed API pieces, and instead provide a "single
row" mode switch that causes PQgetResult to return one row at a time in
separate PGresult objects.
This approach incurs more overhead than proper use of a row processor
callback would, since construction of a PGresult per row adds extra cycles.
However, it is far easier to use and harder to break. The single-row mode
still affords applications the primary benefit that the row processor API
was meant to provide, namely not having to accumulate large result sets in
memory before processing them. Preliminary testing suggests that we can
probably buy back most of the extra cycles by micro-optimizing construction
of the extra results, but that task will be left for another day.
Marko Kreen
The most user-visible part of this is to change the long options
--statusint and --noloop to --status-interval and --no-loop,
respectively, per discussion.
Also, consistently enclose file names in double quotes, per our
conventions; and consistently use the term "transaction log file" to
talk about WAL segments. (Someday we may need to go over this
terminology and make it consistent across the whole source code.)
Finally, reflow the code to better fit in 80 columns, and have pgindent
fix it up some more.