We were doing some amazingly complicated things in order to avoid running
the very expensive identify_system_timezone() procedure during GUC
initialization. But there is an obvious fix for that, which is to do it
once during initdb and have initdb install the system-specific default into
postgresql.conf, as it already does for most other GUC variables that need
system-environment-dependent defaults. This means that the timezone (and
log_timezone) settings no longer have any magic behavior in the server.
Per discussion.
dots. I previously worked around this in initdb, mapping the known
problematic locale names to aliases that work, but Hiroshi Inoue pointed
out that that's not enough because even if you use one of the aliases, like
"Chinese_HKG", setlocale(LC_CTYPE, NULL) returns back the long form, ie.
"Chinese_Hong Kong S.A.R.". When we try to restore an old locale value by
passing that value back to setlocale(), it fails. Note that you are affected
by this bug also if you use one of those short-form names manually, so just
reverting the hack in initdb won't fix it.
To work around that, move the locale name mapping from initdb to a wrapper
around setlocale(), so that the mapping is invoked on every setlocale() call.
Also, add a few checks for failed setlocale() calls in the backend. These
calls shouldn't fail, and if they do there isn't much we can do about it,
but at least you'll get a warning.
Backpatch to 9.1, where the initdb hack was introduced. The Windows bug
affects older versions too if you set locale manually to one of the aliases,
but given the lack of complaints from the field, I'm hesitent to backpatch.
On closer inspection, whining in restore_toc_entries_parallel is really
much too late for any user-facing error case. The right place to do it
is at the start of RestoreArchive(), before we've done anything interesting
(suh as trying to DROP all the targets ...)
Back-patch to 8.4, where parallel restore was introduced.
If we are unable to do a parallel restore because the input file is stdin
or is otherwise unseekable, we should complain and fail immediately, not
after having done some of the restore. Complaining once per thread isn't
so cool either, and the messages should be worded to make it clear this is
an unsupported case not some weird race-condition bug. Per complaint from
Lonni Friedman.
Back-patch to 8.4, where parallel restore was introduced.
These changes allow backtick command evaluation and psql variable
interpolation to happen on substrings of a single meta-command argument.
Formerly, no such evaluations happened at all if the backtick or colon
wasn't the first character of the argument, and we considered an argument
completed as soon as we'd processed one backtick, variable reference, or
quoted substring. A string like 'FOO'BAR was thus taken as two arguments
not one, not exactly what one would expect. In the new coding, an argument
is considered terminated only by unquoted whitespace or backslash.
Also, clean up a bunch of omissions, infelicities and outright errors in
the psql documentation of variables and metacommand argument syntax.
Per previous experimentation, backtracking slows down lexing performance
significantly (by about a third). It's usually pretty easy to avoid, just
need to have rules that accept an incomplete construct and do whatever the
lexer would have done otherwise.
The backtracking was introduced by the patch that added quoted variable
substitution. Back-patch to 9.0 where that was added.
Rather than dumping out the raw array as PostgreSQL represents it
internally, we now print it out in a format similar to the one in
which the user input it, which seems a lot more user friendly.
Shigeru Hanada
When streaming including WAL, the size estimate will always be incorrect,
since we don't know how much WAL is included. To make sure the output doesn't
look completely unreasonable, this patch increases the total size whenever we
go past the estimate, to make sure we never go above 100%.
Instead of displaying comments on an arbitrary subset of the object
types which support them, make \dd display comments on exactly those
object types which don't have their own backlash commands. We now
regard the display of comments as properly the job of the relevant
backslash command (though many of them do so only in verbose mode)
rather than something that \dd should be responsible for. However,
a handful of object types have no backlash command, so make \dd
give information about those.
Josh Kupershmidt
The relevant backslash commands already exist, so we're just adding an
additional column. With this commit, all objects that have psql backslash
commands and accept comments should now display those comments at least
in verbose mode.
Josh Kupershmidt, with doc additions by me.
\dc and \dD now accept a "+" option, which will cause the comments to
be displayed. Along the way, correct a few oversights in the previous
commit in this area, 3b17efdfdd - namely,
(1) when \dL+ is used, make description still be the last column, for
consistency with what we've done elsewhere; and (2) document the
difference between \dC and \dC+.
Josh Kupershmidt, with a couple of doc changes by me.
The output of \dL (list languages) is fairly narrow, so we just always
display the comment. \dC (list casts) can get fairly wide, so we only
display comments if the new \dC+ option is specified.
Josh Kupershmidt
pg_backup_db.c contained a mini SQL lexer with which it tried to identify
boundaries between SQL commands, but that code was not designed to cope
with standard_conforming_strings, and would get the wrong answer if a
backslash immediately precedes a closing single quote in such a string,
as per report from Julian Mehnle. The bug only affects direct-to-database
restores from archive files made with standard_conforming_strings = on.
Rather than complicating the code some more to try to fix that, let's just
rip it all out. The only reason it was needed was to cope with COPY data
embedded into ordinary archive entries, which was a layout that was used
only for about the first three weeks of the archive format's existence,
and never in any production release of pg_dump. Instead, just rely on the
archive file layout to tell us whether we're printing COPY data or not.
This bug represents a data corruption hazard in all releases in which
standard_conforming_strings can be turned on, ie 8.2 and later, so
back-patch to all supported branches.
Also change "switch" to "arg" because "switch" is a bit of a sloppy
term. So the environment variable is called
PSQL_EDITOR_LINENUMBER_ARG. Set "+" as hardcoded default value on
Unix (since "vi" is the hardcoded default editor), so many users won't
have to configure this at all. Move the documentation around a bit to
centralize the editor configuration under environment variables,
rather than repeating bits of it under every backslash command that
invokes an editor.
This requires a new shared catalog, pg_shseclabel.
Along the way, fix the security_label regression tests so that they
don't monkey with the labels of any pre-existing objects. This is
unlikely to matter in practice, since only the label for the "dummy"
provider was being manipulated. But this way still seems cleaner.
KaiGai Kohei, with fairly extensive hacking by me.
\ir is short for "include relative"; when used from a script, the
supplied pathname will be interpreted relative to the input file,
rather than to the current working directory.
Gurjeet Singh, reviewed by Josh Kupershmidt, with substantial further
cleanup by me.
Most queries end with a backslash, but not a newline, so try to
standardize on that, for the convenience of people using psql -E to
extract queries.
Josh Kupershmidt, reviewed by Merlin Moncure.
handleCopyIn incremented pset.lineno for each line of COPY data read from
a file. This is correct when reading from the current script file (i.e.,
we are doing COPY FROM STDIN followed by in-line data), but it's wrong if
the data is coming from some other file. Per bug #6083 from Steve Haslam.
Back-patch to all supported versions.
Certain subdirectories do not get built if corresponding options are not
selected at configure time. However, "make distprep" should visit such
directories anyway, so that constructing derived files to be included in
the tarball happens without requiring all configure options to be given
in the tarball build script. Likewise, it's better if cleanup actions
unconditionally visit all directories (for example, this ensures proper
cleanup if someone has done a manual make in such a subdirectory).
To handle this, set up a convention that subdirectories that are
conditionally included in SUBDIRS should be added to ALWAYS_SUBDIRS
instead when they are excluded.
Back-patch to 9.1, so that plpython's spiexceptions.h will get provided
in 9.1 tarballs. There don't appear to be any instances where distprep
actions got missed in previous releases, and anyway this fix requires
gmake 3.80 so we don't want to apply it before 9.1.
DEF_PGPORT already comes in from pg_config.h, so we don't need to pass
it in again with a -D option. Apparently a leftover from the shell
script conversion.
The --flag argument can be used to tell xgettext the arguments of
which functions should be flagged with c-format in the PO files,
instead of guessing based on the presence of format specifiers, which
fails if no format specifiers are present but the translation
accidentally introduces one.
Appropriate flag settings have been added for each message catalog.
based on a patch by Christoph Berg for bug #6066
It currently doesn't make a difference, but it's inconsistent with
most other usage, and it might interfere with a future patch, so I'll
change it all in a separate commit.
Also, replace tabs with spaces for alignment.
For some reason, when we (I) added table lock acquisition to pg_dump,
we didn't think about making it happen as soon as possible after the
start of the transaction. What with subsequent additions, there was
actually quite a lot going on before we got around to that; which sort
of defeats the purpose. Rearrange the order of calls in dumpSchema()
to close the risk window as much as we easily can. Back-patch to all
supported branches.
The short-form -z switch didn't work, for lack of telling getopt_long
about it; and even if specified long-form, it failed to do anything,
because the various tests elsewhere in the file would take
Z_DEFAULT_COMPRESSION (which is -1) as meaning "don't compress".
Per bug #6060 from Shigehiro Honda, though I editorialized on his patch
a bit.
Since start/stop/restart/reload/status is a kind of standard command
set, it seems odd to insert the special-purpose "promote" in between
the closely related "restart" and "reload". So put it after "status"
in code and documentation.
Put the documentation of the -U option in some sensible place.
Rewrite the synopsis sentence in help and documentation to make it
less of a growing mouthful.
Add a postmaster_is_alive() test to the wait loop, so that we stop waiting
if the postmaster dies without removing its pidfile. Unfortunately this
only helps after the postmaster has created its pidfile, since until then
we don't know which PID to check. But if it never does create the pidfile,
we can give up in a relatively short time, so this is a useful addition
in practice. Per suggestion from Fujii Masao, though this doesn't look
very much like his patch.
In addition, improve pg_ctl's ability to cope with pre-existing pidfiles.
Such a file might or might not represent a live postmaster that is going to
block our postmaster from starting, but the previous code pre-judged the
situation and gave up waiting immediately. Now, we will wait for up to 5
seconds to see if our postmaster overwrites such a file. This issue
interacts with Fujii's patch because we would make the wrong conclusion
if we did the postmaster_is_alive() test with a pre-existing PID.
All of this could be improved if we rewrote start_postmaster() so that it
could report the child postmaster's PID, so that we'd know a-priori the
correct PID to test with postmaster_is_alive(). That looks like a bit too
much change for so late in the 9.1 development cycle, unfortunately.
This is consistent with the behavior of other global objects such as
languages and extensions.
Omitting foreign servers also omits the respective user mappings.
With "-w -t 0", we should report "still starting up", not "ok". If we
fall out of the loop without ever being able to call PQping (because we
were never able to construct a connection string), report "no response",
not "ok". This gets rid of corner cases in which we'd claim the server
had started even though it had not.
Also, if the postmaster.pid file is not there at any point after we've
waited 5 seconds, assume the postmaster has failed and report that, rather
than almost-certainly-fruitlessly continuing to wait. The pidfile should
appear almost instantly even when there is extensive startup work to do,
so 5 seconds is already a very conservative figure. This part is per a
gripe from MauMau --- there might be better ways to do it, but nothing
simple enough to get done for 9.1.
We initially had pg_dump emit CREATE EXTENSION commands unconditionally.
However, pg_dump has long been in the habit of not dumping procedural
language definitions when a --schema or --table switch is given. It seems
appropriate to handle extensions the same way, since like PLs they are SQL
objects that are not in any particular schema. Per complaint from Adrian
Schreyer.
For the --help output and reference pages of pg_dump, pg_dumpall,
pg_restore, put the options in some consistent, mostly alphabetical,
and consistent order, rather than newest option last or something like
that.
pg_dump has some heuristic rules for whether to dump casts and procedural
languages, since it's not all that easy to distinguish built-in ones from
user-defined ones. However, we should not apply those rules to objects
that belong to an extension, but just use the perfectly well-defined rules
for what to do with extension member objects. Otherwise we might
mistakenly lose extension member objects during a binary upgrade (which is
the only time that we'd want to dump extension members).
This commit fixes psql, pg_dump, and the information schema to be
consistent with the backend changes which I made as part of commit
be90032e0d, and also includes a
related documentation tweak.
Shigeru Hanada, with slight adjustment.
Discard any collation aliases that match the built-in pg_collation entries
(ie, "default", "C", "POSIX"). Such aliases would be refused by a CREATE
COLLATION command, but since initdb is injecting them via a simple INSERT,
it has to make the corresponding check for itself. Per Martin Pitt's
report of funny behavior in a machine that had a bogus "C.UTF-8" locale.
Also, use E'' syntax for the output of escape_quotes, as per its header
comment.
The style is set to "printf" for backwards compatibility everywhere except
on Windows, where it is set to "gnu_printf", which eliminates hundreds of
false error messages from modern versions of gcc arising from %m and %ll{d,u}
formats.
Instead of dumping them as CREATE TABLE ... OF, dump them as normal
tables with the usual special processing for dropped columns, and then
attach them to the type afterward, using ALTER TABLE ... OF. This is
analogous to the existing handling of inherited tables.
This code was accidentally part of the patch, it's only
needed for the code that's for 9.2. Not needing the timeline
also removes the need to call IDENTIFY_SYSTEM.
Noted by Peter E.
"People's Republic of China" locale on Windows was causing initdb to fail.
This fixes bug #5818 reported by yulei. On master, this makes the mapping
of "People's Republic of China" to just "China" obsolete. In 9.0 and 8.4,
just fix the escaping. Earlier versions didn't have locale names in bki
file.
In \d, be more careful to print collation only if it's not the default for
the column's data type. Avoid assuming that the name "default" is magic.
Fix \d on a composite type so that it will print per-column collations.
It's no longer the case that a composite type cannot have modifiers.
(In consequence, the expected outputs for composite-type regression tests
change.)
Fix \dD so that it will print collation for a domain, again only if it's
not the same as the base type's collation.
apostrophes or dots. There isn't much hope of Microsoft fixing it any time
soon, it's been like that for ages, so we better work around it. So, map a
few common Windows locale names known to cause problems to aliases that work.
server-encoding, fall back to UTF-8. It happens at least with the Chinese
locale, which implies BIG5. This is safe, because on Windows all locales
are compatible with UTF-8.
This warning is new in gcc 4.6 and part of -Wall. This patch cleans
up most of the noise, but there are some still warnings that are
trickier to remove.
"Unusable" collations are those not matching the current database's
encoding. The former behavior inconsistently showed such collations
some of the time, depending on the details of the pattern argument.
Per discussion, the original behavior seems too noisy. But if things
are so broken that none of the locales reported by "locale -a" are usable,
that's probably worth warning about.
The original coding supposed that a dump TOC file could never contain lines
longer than 1K. The folly of that was exposed by a recent report from
Per-Olov Esgard. We only really need to see the first dozen or two bytes
of each line, since we're just trying to read off the numeric ID at the
start of the line; so there's no need for a particularly huge buffer.
What there is a need for is logic to not process continuation bufferloads.
Back-patch to all supported branches, since it's always been like this.
This is needed only in 9.1 because only 9.0 had this and no one is
upgrading from a 9.0 beta to 9.0 anymore. We basically don't backpatch
9.0 beta fixes at this point.
While putting such entries into pg_collation is harmless (since backends
will ignore entries that don't match the database encoding), it's also
useless.
Install just one instance of the "C" and "POSIX" collations into
pg_collation, rather than one per encoding. Make these instances exist
and do something useful even in machines without locale_t support: to wit,
it's now possible to force comparisons and case-folding functions to use C
locale in an otherwise non-C database, whether or not the platform has
support for using any additional collations.
Fix up severely broken upper/lower/initcap functions, too: the C/POSIX
fastpath now does what it is supposed to, and non-default collations are
handled correctly in single-byte database encodings.
Merge the two separate collation hashtables that were being maintained in
pg_locale.c, and be more wary of the possibility that we fail partway
through filling a cache entry.
This removes an overloading of two authentication options where
one is very secure (peer) and one is often insecure (ident). Peer
is also the name used in libpq from 9.1 to specify the same type
of authentication.
Also make initdb select peer for local connections when ident is
chosen, and ident for TCP connections when peer is chosen.
ident keyword in pg_hba.conf is still accepted and maps to peer
authentication.
The last version in which these options were documented is now EOL, so
it's time to get rid of them for real. We now use GNU-style long
options instead.
This was required in pre-8.4 versions to allow the specification of
"ident sameuser", but sameuser is no longer required. It could be extended
to allow all parameters in the future, but should then apply to all
methods and not just ident.
Use collencoding = -1 to represent such a collation in pg_collation.
We need this to make the "default" entry work sanely, and a later
patch will fix the C/POSIX entries to be represented this way instead
of duplicating them across all encodings. All lookup operations now
search first for an entry that's database-encoding-specific, and then
for the same name with collencoding = -1.
Also some incidental code cleanup in collationcmds.c and pg_collation.c.
Including collation in the behavior of that function promotes a world view
we do not want. Moreover, it was producing the wrong behavior for pg_dump
anyway: what we want is to dump a COLLATE clause on attributes whose
attcollation is different from the underlying type, and likewise for
domains, and the function cannot do that for us. Doing it the hard way
in pg_dump is a bit more tedious but produces more correct output.
In passing, fix initdb so that the initial entry in pg_collation is
properly pinned. It was droppable before :-(
In createlang this is a one-line change. In droplang there's a whole
lot of cruft that can be discarded since the extension mechanism now
manages removal of the language's support functions.
Also, add deprecation notices to these two programs' reference pages,
since per discussion we may toss them overboard altogether in a release
or two.
This mostly just involves creating control, install, and
update-from-unpackaged scripts for them. However, I had to adjust plperl
and plpython to not share the same support functions between variants,
because we can't put the same function into multiple extensions.
catversion bump forced due to new contents of pg_pltemplate, and because
initdb now installs plpgsql as an extension not a bare language.
Add support for regression testing these as extensions not bare
languages.
Fix a couple of other issues that popped up while testing this: my initial
hack at pg_dump binary-upgrade support didn't work right, and we don't want
an extra schema permissions test after all.
Documentation changes still to come, but I'm committing now to see
whether the MSVC build scripts need work (likely they do).
Remove the unconditional superuser permissions check in CREATE EXTENSION,
and instead define a "superuser" extension property, which when false
(not the default) skips the superuser permissions check. In this case
the calling user only needs enough permissions to execute the commands
in the extension's installation script. The superuser property is also
enforced in the same way for ALTER EXTENSION UPDATE cases.
In other ALTER EXTENSION cases and DROP EXTENSION, test ownership of
the extension rather than superuserness. ALTER EXTENSION ADD/DROP needs
to insist on ownership of the target object as well; to do that without
duplicating code, refactor comment.c's big switch for permissions checks
into a separate function in objectaddress.c.
I also removed the superuserness checks in pg_available_extensions and
related functions; there's no strong reason why everybody shouldn't
be able to see that info.
Also invent an IF NOT EXISTS variant of CREATE EXTENSION, and use that
in pg_dump, so that dumps won't fail for installed-by-default extensions.
We don't have any of those yet, but we will soon.
This is all per discussion of wrapping the standard procedural languages
into extensions. I'll make those changes in a separate commit; this is
just putting the core infrastructure in place.
Instead of manually maintaining the "implementation of XXX operator"
comments in pg_proc.h, delete all those entries and let initdb create
them via a join. To let initdb figure out which name to use when there
is a conflict, change the comments for deprecated operators to say they
are deprecated --- which seems like a good thing to do anyway.
Historically, we've not had separate comments for built-in pg_operator
entries, but relied on the comments for the underlying functions. The
trouble with this approach is that there isn't much of anything to suggest
to users that they'd be better off using the operators instead. So, move
all the relevant comments into pg_operator, and give each underlying
function a comment that just says "implementation of XXX operator".
There are only about half a dozen cases where it seems reasonable to use
the underlying function interchangeably with the operator; in these cases
I left the same comment in place on the function as on the operator.
While at it, establish a policy that every built-in function and operator
entry should have a comment: there are now queries in the opr_sanity
regression test that will complain if one doesn't. This only required
adding a dozen or two more entries than would have been there anyway.
I also spent some time trying to eliminate gratuitous inconsistencies in
the style of the comments, though it's hopeless to suppose that more won't
creep in soon enough.
Per my proposal of 2010-10-15.
- ALTER FOREIGN DATA WRAPPER with HANDLER
- ALTER TABLE VALIDATE CONSTRAINT
- ALTER TYPE ADD VALUE
- COPY with ENCODING and FORCE NOT NULL
- CREATE FOREIGN DATA WRAPPER with HANDLER
- CREATE TRIGGER ... INSTEAD OF
Add a new libpq connection option client_encoding (which includes the
existing PGCLIENTENCODING environment variable), which besides an
encoding name accepts a special value "auto" that tries to determine
the encoding from the locale in the client's environment, using the
mechanisms that have been in use in initdb.
psql sets this new connection option to "auto" when running from a
terminal and not overridden by setting PGCLIENTENCODING.
original code by Heikki Linnakangas, with subsequent contributions by
Jaime Casanova, Peter Eisentraut, Stephen Frost, Ibrar Ahmed
Add a fdwhandler column to pg_foreign_data_wrapper, plus HANDLER options
in the CREATE FOREIGN DATA WRAPPER and ALTER FOREIGN DATA WRAPPER commands,
plus pg_dump support for same. Also invent a new pseudotype fdw_handler
with properties similar to language_handler.
This is split out of the "FDW API" patch for ease of review; it's all stuff
we will certainly need, regardless of any other details of the FDW API.
FDW handler functions will not actually get called yet.
In passing, fix some omissions and infelicities in foreigncmds.c.
Shigeru Hanada, Jan Urbanski, Heikki Linnakangas
The previous coding would try to process all SECTION_NONE items in the
initial sequential-restore pass, which failed if they were dependencies of
not-yet-restored items. Fix by postponing such items into the parallel
processing pass once we have skipped any non-PRE_DATA item.
Back-patch into 9.0; the original parallel-restore coding in 8.4 did not
have this bug, so no need to change it.
Report and diagnosis by Arnd Hannemann.
Normally, pg_dump summarily excludes functions in pg_catalog from
consideration. However, some extensions may create functions in pg_catalog
(adminpack already does that, and extensions for procedural languages will
likely do it too). In binary-upgrade mode, we have to dump such functions,
or the extension will be incomplete after upgrading. Per experimentation
with adminpack.
The original design of pg_available_extensions did not consider the
possibility of version-specific control files. Split it into two views:
pg_available_extensions shows information that is generic about an
extension, while pg_available_extension_versions shows all available
versions together with information that could be version-dependent.
Also, add an SRF pg_extension_update_paths() to assist in checking that
a collection of update scripts provide sane update path sequences.
- collowner field
- CREATE COLLATION
- ALTER COLLATION
- DROP COLLATION
- COMMENT ON COLLATION
- integration with extensions
- pg_dump support for the above
- dependency management
- psql tab completion
- psql \dO command
This follows recent discussions, so it's quite a bit different from
Dimitri's original. There will probably be more changes once we get a bit
of experience with it, but let's get it in and start playing with it.
This is still just core code. I'll start converting contrib modules
shortly.
Dimitri Fontaine and Tom Lane
This follows my proposal of yesterday, namely that we try to recreate the
previous state of the extension exactly, instead of allowing CREATE
EXTENSION to run a SQL script that might create some entirely-incompatible
on-disk state. In --binary-upgrade mode, pg_dump won't issue CREATE
EXTENSION at all, but instead uses a kluge function provided by
pg_upgrade_support to recreate the pg_extension row (and extension-level
pg_depend entries) without creating any member objects. The member objects
are then restored in the same way as if they weren't members, in particular
using pg_upgrade's normal hacks to preserve OIDs that need to be preserved.
Then, for each member object, ALTER EXTENSION ADD is issued to recreate the
pg_depend entry that marks it as an extension member.
In passing, fix breakage in pg_upgrade's enum-type support: somebody didn't
fix it when the noise word VALUE got added to ALTER TYPE ADD. Also,
rationalize parsetree representation of COMMENT ON DOMAIN and fix
get_object_address() to allow OBJECT_DOMAIN.
My original idea of doing extension member identification during
getDependencies() didn't work correctly: we have to mark member tables as
not-to-be-dumped rather earlier than that, else their subsidiary objects
like indexes get dumped anyway. Rearrange code to mark them early enough.
This patch adds the server infrastructure to support extensions.
There is still one significant loose end, namely how to make it play nice
with pg_upgrade, so I am not yet committing the changes that would make
all the contrib modules depend on this feature.
In passing, fix a disturbingly large amount of breakage in
AlterObjectNamespace() and callers.
Dimitri Fontaine, reviewed by Anssi Kääriäinen,
Itagaki Takahiro, Tom Lane, and numerous others
This adds collation support for columns and domains, a COLLATE clause
to override it per expression, and B-tree index support.
Peter Eisentraut
reviewed by Pavel Stehule, Itagaki Takahiro, Robert Haas, Noah Misch
FK constraints that are marked NOT VALID may later be VALIDATED, which uses an
ShareUpdateExclusiveLock on constraint table and RowShareLock on referenced
table. Significantly reduces lock strength and duration when adding FKs.
New state visible from psql.
Simon Riggs, with reviews from Marko Tiikkaja and Robert Haas
Until now, our Serializable mode has in fact been what's called Snapshot
Isolation, which allows some anomalies that could not occur in any
serialized ordering of the transactions. This patch fixes that using a
method called Serializable Snapshot Isolation, based on research papers by
Michael J. Cahill (see README-SSI for full references). In Serializable
Snapshot Isolation, transactions run like they do in Snapshot Isolation,
but a predicate lock manager observes the reads and writes performed and
aborts transactions if it detects that an anomaly might occur. This method
produces some false positives, ie. it sometimes aborts transactions even
though there is no anomaly.
To track reads we implement predicate locking, see storage/lmgr/predicate.c.
Whenever a tuple is read, a predicate lock is acquired on the tuple. Shared
memory is finite, so when a transaction takes many tuple-level locks on a
page, the locks are promoted to a single page-level lock, and further to a
single relation level lock if necessary. To lock key values with no matching
tuple, a sequential scan always takes a relation-level lock, and an index
scan acquires a page-level lock that covers the search key, whether or not
there are any matching keys at the moment.
A predicate lock doesn't conflict with any regular locks or with another
predicate locks in the normal sense. They're only used by the predicate lock
manager to detect the danger of anomalies. Only serializable transactions
participate in predicate locking, so there should be no extra overhead for
for other transactions.
Predicate locks can't be released at commit, but must be remembered until
all the transactions that overlapped with it have completed. That means that
we need to remember an unbounded amount of predicate locks, so we apply a
lossy but conservative method of tracking locks for committed transactions.
If we run short of shared memory, we overflow to a new "pg_serial" SLRU
pool.
We don't currently allow Serializable transactions in Hot Standby mode.
That would be hard, because even read-only transactions can cause anomalies
that wouldn't otherwise occur.
Serializable isolation mode now means the new fully serializable level.
Repeatable Read gives you the old Snapshot Isolation level that we have
always had.
Kevin Grittner and Dan Ports, reviewed by Jeff Davis, Heikki Linnakangas and
Anssi Kääriäinen
Add the current xlog insert location to the response of
IDENTIFY_SYSTEM, and adds result sets containing start
and stop location of backups to BASE_BACKUP responses.