Opening a catcache's index could require reading from that cache's own
catalog, which of course would acquire AccessShareLock on the catalog.
So the original coding here risks locking index before heap, which could
deadlock against another backend trying to get exclusive locks in the
normal order. Because InitCatCachePhase2 is only called when a backend
has to start up without a relcache init file, the deadlock was seldom seen
in the field. (And by the same token, there's no need to worry about any
performance disadvantage; so not much point in trying to distinguish
exactly which catalogs have the risk.)
Bug report, diagnosis, and patch by Nikhil Sontakke. Additional commentary
by me. Back-patch to all supported branches.
Instead of playing cute games with pathkeys, just build a direct
representation of the intended sub-select, and feed it through
query_planner to get a Path for the index access. This is a bit slower
than 9.1's previous method, since we'll duplicate most of the overhead of
query_planner; but since the whole optimization only applies to rather
simple single-table queries, that probably won't be much of a problem in
practice. The advantage is that we get to do the right thing when there's
a partial index that needs the implicit IS NOT NULL clause to be usable.
Also, although this makes planagg.c be a bit more closely tied to the
ordering of operations in grouping_planner, we can get rid of some coupling
to lower-level parts of the planner. Per complaint from Marti Raudsepp.
ensure that they use different checkpoints as the starting point. We use
the checkpoint redo location as a unique identifier for the base backup in
the end-of-backup record, and in the backup history file name.
Bug spotted by Fujii Masao.
The local variable "sock" can be unused depending on compilation flags.
But there seems no particular need for it, since the kernel calls can
just as easily say port->sock instead.
Install just one instance of the "C" and "POSIX" collations into
pg_collation, rather than one per encoding. Make these instances exist
and do something useful even in machines without locale_t support: to wit,
it's now possible to force comparisons and case-folding functions to use C
locale in an otherwise non-C database, whether or not the platform has
support for using any additional collations.
Fix up severely broken upper/lower/initcap functions, too: the C/POSIX
fastpath now does what it is supposed to, and non-default collations are
handled correctly in single-byte database encodings.
Merge the two separate collation hashtables that were being maintained in
pg_locale.c, and be more wary of the possibility that we fail partway
through filling a cache entry.
All expression nodes now have an explicit output-collation field, unless
they are known to only return a noncollatable data type (such as boolean
or record). Also, nodes that can invoke collation-aware functions store
a separate field that is the collation value to pass to the function.
This avoids confusion that arises when a function has collatable inputs
and noncollatable output type, or vice versa.
Also, replace the parser's on-the-fly collation assignment method with
a post-pass over the completed expression tree. This allows us to use
a more complex (and hopefully more nearly spec-compliant) assignment
rule without paying for it in extra storage in every expression node.
Fix assorted bugs in the planner's handling of collations by making
collation one of the defining properties of an EquivalenceClass and
by converting CollateExprs into discardable RelabelType nodes during
expression preprocessing.
This removes an overloading of two authentication options where
one is very secure (peer) and one is often insecure (ident). Peer
is also the name used in libpq from 9.1 to specify the same type
of authentication.
Also make initdb select peer for local connections when ident is
chosen, and ident for TCP connections when peer is chosen.
ident keyword in pg_hba.conf is still accepted and maps to peer
authentication.
When adding an inheritance parent to a table, an AccessShareLock on the
parent isn't strong enough to prevent trouble, so take
ShareUpdateExclusiveLock instead. Since this is a behavior change,
albeit a fairly unobtrusive one, and since we have only one report
from the field, no back-patch.
Report by Jon Nelson, analysis by Alvaro Herrera, fix by me.
This is advantageous because the BG writer is alive until much later in
the shutdown sequence than WAL writer; we want to make sure that it's
possible to shut off synchronous replication during a smart shutdown,
else it might not be possible to complete the shutdown at all.
Per very reasonable gripes from Fujii Masao and Simon Riggs.
The last version in which these options were documented is now EOL, so
it's time to get rid of them for real. We now use GNU-style long
options instead.
The maximum value of deadlock_timeout, max_standby_archive_delay,
max_standby_streaming_delay, log_min_duration_statement, and
log_autovacuum_min_duration was INT_MAX/1000 milliseconds, which is
about 35min, which is too short for some practical uses. Raise the
maximum value to INT_MAX; the code that uses the parameters already
supports that just fine.
1. Don't ignore query cancel interrupts. Instead, if the user asks to
cancel the query after we've already committed it, but before it's on
the standby, just emit a warning and let the COMMIT finish.
2. Don't ignore die interrupts (pg_terminate_backend or fast shutdown).
Instead, emit a warning message and close the connection without
acknowledging the commit. Other backends will still see the effect of
the commit, but there's no getting around that; it's too late to abort
at this point, and ignoring die interrupts altogether doesn't seem like
a good idea.
3. If synchronous_standby_names becomes empty, wake up all backends
waiting for synchronous replication to complete. Without this, someone
attempting to shut synchronous replication off could easily wedge the
entire system instead.
4. Avoid depending on the assumption that if a walsender updates
MyProc->syncRepState, we'll see the change even if we read it without
holding the lock. The window for this appears to be quite narrow (and
probably doesn't exist at all on machines with strong memory ordering)
but protecting against it is practically free, so do that.
5. Remove useless state SYNC_REP_MUST_DISCONNECT, which isn't needed and
doesn't actually do anything.
There's still some further work needed here to make the behavior of fast
shutdown plausible, but that looks complex, so I'm leaving it for a
separate commit. Review by Fujii Masao.
This has been broken for years, and I'm not sure why it has not been
noticed before, but now a very modern Cygwin breaks on it, and the fix
is clearly correct. Backpatching to all live branches.
This patch causes unknown-type Consts to be coerced to the resolved output
type of the set operation at parse time. Formerly such Consts were left
alone until late in the planning stage. The disadvantage of that approach
is that it disables some optimizations, because the planner sees the set-op
leaf query as having different output column types than the overall set-op.
We saw an example of that in a recent performance gripe from Claudio
Freire.
Fixing such a Const requires scribbling on the leaf query in
transformSetOperationTree, but that should be all right since if the leaf
query's semantics depended on that output column, it would already have
resolved the unknown to something else.
Most of the bulk of this patch is a simple adjustment of
transformSetOperationTree's API so that upper levels can get at the
TargetEntry containing a Const to be replaced: it now returns a list of
TargetEntries, instead of just the bare expressions.
This was required in pre-8.4 versions to allow the specification of
"ident sameuser", but sameuser is no longer required. It could be extended
to allow all parameters in the future, but should then apply to all
methods and not just ident.
We have a test that verifies that max(anyarray) will cope if the array
column elements aren't all the same array type. However, it's now possible
for that to produce a collation-related error message instead of the
expected one, if the first two column elements happen to be of the same
type and it's one that expects to be given collation info. Tweak the test
to ensure this doesn't happen. Per buildfarm member pika.
While this will give wrong answers when estimating selectivity for a
comparison operator that's using a non-default collation, the estimation
error probably won't be large; and anyway the former approach created
estimation errors of its own by trying to use a histogram that might have
been computed with some other collation. So we'll adopt this simplified
approach for now and perhaps improve it sometime in the future.
This patch incorporates changes from Andres Freund to make sure that
selfuncs.c passes a valid collation OID to any datatype-specific function
it calls, in case that function wants collation information. Said OID will
now always be DEFAULT_COLLATION_OID, but at least we won't get errors.
Add dummy returns before every potential division-by-zero in int8.c,
because apparently further "improvements" in gcc's optimizer have
enabled it to break functions that weren't broken before.
Aurelien Jarno, via Martin Pitt
CollateClause is now used only in raw grammar output, and CollateExpr after
parse analysis. This is for clarity and to avoid carrying collation names
in post-analysis parse trees: that's both wasteful and possibly misleading,
since the collation's name could be changed while the parsetree still
exists.
Also, clean up assorted infelicities and omissions in processing of the
node type.
Use collencoding = -1 to represent such a collation in pg_collation.
We need this to make the "default" entry work sanely, and a later
patch will fix the C/POSIX entries to be represented this way instead
of duplicating them across all encodings. All lookup operations now
search first for an entry that's database-encoding-specific, and then
for the same name with collencoding = -1.
Also some incidental code cleanup in collationcmds.c and pg_collation.c.
variable hiding. A constant is not a variable. It worked in most cases by
accident, because we add constants to the global list of variables (why?),
but float constants like 1.23 were interpreted as struct field references,
and not found.
Backpatch to 9.0, where the test for variable hiding was added.
Including collation in the behavior of that function promotes a world view
we do not want. Moreover, it was producing the wrong behavior for pg_dump
anyway: what we want is to dump a COLLATE clause on attributes whose
attcollation is different from the underlying type, and likewise for
domains, and the function cannot do that for us. Doing it the hard way
in pg_dump is a bit more tedious but produces more correct output.
In passing, fix initdb so that the initial entry in pg_collation is
properly pinned. It was droppable before :-(
It's not a good idea to kill the postmaster just because someone muffs
this, and it's not consistent with what we do for other, similar GUCs.
Fujii Masao, with a bit more hacking by me
SyncRepRequested() must check not only the value of the
synchronous_replication GUC but also whether max_wal_senders > 0.
Otherwise, we might end up waiting for sync rep even when there's no
possibility of a standby ever managing to connect. There are some
existing cross-checks to prevent this, but they're not quite sufficient:
the user can start the server with max_wal_senders=0,
synchronous_standby_names='', and synchronous_replication=off and then
subsequent make synchronous_standby_names not empty using pg_ctl reload,
and then SET synchronous_standby=on, leading to an indefinite hang.
Along the way, rename the global variable for the synchronous_replication
GUC to match the name of the GUC itself, for clarity.
Report by Fujii Masao, though I didn't use his patch.
In earlier versions of the sync rep patch, waiters removed themselves from
the queue, but now walsender removes them before doing the wakeup.
Report by Fujii Masao.
Tom Lane pointed out that it was giving a warning: "-s option given but
default rule can be matched". That was because there was no rule to handle
newline in a quoted string. I made that throw an error.
Also, line number tracking was broken, giving incorrect line number on
error. Fixed that too.
At least two recent commits have apparently imagined that a comment in
a Makefile stating that something would be included in the distribution
tarball was sufficient to make it so. They hadn't bothered to hook
into the upper maintainer-clean targets either. Per bug #5923 from
Charles Johnson, in which it emerged that the 9.1alpha4 tarballs are
short a few files that should be there.
The initial collations patch treated a COLLATE spec as part of a TypeName,
following what can only be described as brain fade on the part of the SQL
committee. It's a lot more reasonable to treat COLLATE as a syntactically
separate object, so that it can be added in only the productions where it
actually belongs, rather than needing to reject it in a boatload of places
where it doesn't belong (something the original patch mostly failed to do).
In addition this change lets us meet the spec's requirement to allow
COLLATE anywhere in the clauses of a ColumnDef, and it avoids unfriendly
behavior for constructs such as "foo::type COLLATE collation".
To do this, pull collation information out of TypeName and put it in
ColumnDef instead, thus reverting most of the collation-related changes in
parse_type.c's API. I made one additional structural change, which was to
use a ColumnDef as an intermediate node in AT_AlterColumnType AlterTableCmd
nodes. This provides enough room to get rid of the "transform" wart in
AlterTableCmd too, since the ColumnDef can carry the USING expression
easily enough.
Also fix some other minor bugs that have crept in in the same areas,
like failure to copy recently-added fields of ColumnDef in copyfuncs.c.
While at it, document the formerly secret ability to specify a collation
in ALTER TABLE ALTER COLUMN TYPE, ALTER TYPE ADD ATTRIBUTE, and
ALTER TYPE ALTER ATTRIBUTE TYPE; and correct some misstatements about
what the default collation selection will be when COLLATE is omitted.
BTW, the three-parameter form of format_type() should go away too,
since it just contributes to the confusion in this area; but I'll do
that in a separate patch.
Formerly, any member of a role could change the role's comment, as of
course could superusers; but holders of CREATEROLE privilege could not,
unless they were also members. This led to the odd situation that a
CREATEROLE holder could create a role but then could not comment on it.
It also seems a bit dubious to let an unprivileged user change his own
comment, let alone those of group roles he belongs to. So, change the
rule to be "you must be superuser to comment on a superuser role, or
hold CREATEROLE to comment on non-superuser roles". This is the same
as the privilege check for creating/dropping roles, and thus fits much
better with the rule for other object types, namely that only the owner
of an object can comment on it.
In passing, clean up the documentation for COMMENT a little bit.
Per complaint from Owen Jacobson and subsequent discussion.
In addition to the
all-foo-recurse: all-bar-recurse
dependencies that constraint the order of the rule execution, we need
install-foo-recurse: install-bar-recurse
dependencies in case one runs make install without a make all first,
as some people apparently do.
We really need an automated check for this ... and did VALIDATE really
need to become a keyword at all, rather than picking some other syntax
using existing keywords?
race condition where SummarizeOldestCommittedSxact() is called even though
another backend already cleared out all finished sxact entries. That's OK,
RegisterSerializableTransactionInt() can just retry getting a news xact
slot from the available-list when that happens.
Reported by YAMAMOTO Takashi, bug #5918.
contains newly-inserted tuples that according to our OldestXmin are not
yet visible to everyone. The value returned by GetOldestXmin() is conservative,
and it can move backwards on repeated calls, so if we see that contradiction
between the PD_ALL_VISIBLE flag and status of tuples on the page, we have to
assume it's because an earlier vacuum calculated a higher OldestXmin value,
and all the tuples really are visible to everyone.
We have received several reports of this bug, with the "PD_ALL_VISIBLE flag
was incorrectly set in relation ..." warning appearing in logs. We were
finally able to hunt it down with David Gould's help to run extra diagnostics
in an environment where this happened frequently.
Also reword the warning, per Robert Haas' suggestion, to not imply that the
PD_ALL_VISIBLE flag is necessarily at fault, as it might also be a symptom
of corruption on a tuple header.
Backpatch to 8.4, where the PD_ALL_VISIBLE flag was introduced.
than doing it aggressively whenever the tail-XID pointer is advanced, because
this way we don't need to do it while holding SerializableXactHashLock.
This also fixes bug #5915 spotted by YAMAMOTO Takashi, and removes an
obsolete comment spotted by Kevin Grittner.
This improves reporting, as the error string now includes the actual
Python exception. As a side effect, this no longer sets the errcode to
ERRCODE_DATA_EXCEPTION, which might be considered a feature, as it's
not documented and not clear why iterator errors should be treated
differently.
Jan Urbański
periodically rescan the archive for new timelines, while waiting for new WAL
segments to arrive. This allows you to set up a standby server that follows
the TLI change if another standby server is promoted to master. Before this,
you had to restart the standby server to make it notice the new timeline.
This patch only scans the archive for TLI changes, it won't follow a TLI
change in streaming replication. That is much needed too, but it would be a
much bigger patch than I dare to sneak in this late in the release cycle.
There was discussion on improving the sanity checking of the WAL segments so
that the system would notice more reliably if the new timeline isn't an
ancestor of the current one, but that is not included in this patch.
Reviewed by Fujii Masao.
Seen with an older gcc version. I'm not sure these represent any real
risk factor, but still a bit scary. Anyway we have lots of other
volatile-marked variables in this code, so a couple more won't hurt.
If a standby is broadcasting reply messages and we have named
one or more standbys in synchronous_standby_names then allow
users who set synchronous_replication to wait for commit, which
then provides strict data integrity guarantees. Design avoids
sending and receiving transaction state information so minimises
bookkeeping overheads. We synchronize with the highest priority
standby that is connected and ready to synchronize. Other standbys
can be defined to takeover in case of standby failure.
This version has very strict behaviour; more relaxed options
may be added at a later date.
Simon Riggs and Fujii Masao, with reviews by Yeb Havinga, Jaime
Casanova, Heikki Linnakangas and Robert Haas, plus the assistance
of many other design reviewers.
Since this field is after a variable-length field, it can't simply be
accessed via the C struct for pg_index. Fortunately, the relcache already
did the dirty work of pulling the information out to where it can be
accessed easily, so this is a one-line fix.
Andres Freund
The original scheme for this was to symlink plpython.$DLSUFFIX to
plpython2.$DLSUFFIX, but that doesn't work on Windows, and only
accidentally failed to fail because of the way that CREATE LANGUAGE created
or didn't create new C functions. My changes of yesterday exposed the
weakness of that approach. To fix, get rid of the symlink and make
pg_pltemplate show what's really going on.
In createlang this is a one-line change. In droplang there's a whole
lot of cruft that can be discarded since the extension mechanism now
manages removal of the language's support functions.
Also, add deprecation notices to these two programs' reference pages,
since per discussion we may toss them overboard altogether in a release
or two.
This mostly just involves creating control, install, and
update-from-unpackaged scripts for them. However, I had to adjust plperl
and plpython to not share the same support functions between variants,
because we can't put the same function into multiple extensions.
catversion bump forced due to new contents of pg_pltemplate, and because
initdb now installs plpgsql as an extension not a bare language.
Add support for regression testing these as extensions not bare
languages.
Fix a couple of other issues that popped up while testing this: my initial
hack at pg_dump binary-upgrade support didn't work right, and we don't want
an extra schema permissions test after all.
Documentation changes still to come, but I'm committing now to see
whether the MSVC build scripts need work (likely they do).
It is possible that an expression ends up with a collatable type but
without a collation. CREATE TABLE AS could then create a table based
on that. But such a column cannot be dumped with valid SQL syntax, so
we disallow creating such a column.
per test report from Noah Misch
Remove the unconditional superuser permissions check in CREATE EXTENSION,
and instead define a "superuser" extension property, which when false
(not the default) skips the superuser permissions check. In this case
the calling user only needs enough permissions to execute the commands
in the extension's installation script. The superuser property is also
enforced in the same way for ALTER EXTENSION UPDATE cases.
In other ALTER EXTENSION cases and DROP EXTENSION, test ownership of
the extension rather than superuserness. ALTER EXTENSION ADD/DROP needs
to insist on ownership of the target object as well; to do that without
duplicating code, refactor comment.c's big switch for permissions checks
into a separate function in objectaddress.c.
I also removed the superuserness checks in pg_available_extensions and
related functions; there's no strong reason why everybody shouldn't
be able to see that info.
Also invent an IF NOT EXISTS variant of CREATE EXTENSION, and use that
in pg_dump, so that dumps won't fail for installed-by-default extensions.
We don't have any of those yet, but we will soon.
This is all per discussion of wrapping the standard procedural languages
into extensions. I'll make those changes in a separate commit; this is
just putting the core infrastructure in place.
Instead, just act as though the certificate file(s) are not present.
There is only one case where this need be a hard failure condition: when
sslmode is verify-ca or verify-full, not having a root cert file is an
error. Change the logic so that we complain only in that case, and
otherwise fall through cleanly. This is how it used to behave pre-9.0,
but my patch 4ed4b6c54e of 2010-05-26 broke
the case. Per report from Christian Kastner.
Instead of manually maintaining the "implementation of XXX operator"
comments in pg_proc.h, delete all those entries and let initdb create
them via a join. To let initdb figure out which name to use when there
is a conflict, change the comments for deprecated operators to say they
are deprecated --- which seems like a good thing to do anyway.
This works around the problem noted by Yamamoto Takashi in bug #5906,
that there were code paths whereby we could reach AtCleanup_Portals
with a portal's cleanup hook still unexecuted. The changes I made
a few days ago were intended to prevent that from happening, and
I think that on balance it's still a good thing to avoid, so I don't
want to remove the Assert in AtCleanup_Portals. Hence do this instead.
Historically, we've not had separate comments for built-in pg_operator
entries, but relied on the comments for the underlying functions. The
trouble with this approach is that there isn't much of anything to suggest
to users that they'd be better off using the operators instead. So, move
all the relevant comments into pg_operator, and give each underlying
function a comment that just says "implementation of XXX operator".
There are only about half a dozen cases where it seems reasonable to use
the underlying function interchangeably with the operator; in these cases
I left the same comment in place on the function as on the operator.
While at it, establish a policy that every built-in function and operator
entry should have a comment: there are now queries in the opr_sanity
regression test that will complain if one doesn't. This only required
adding a dozen or two more entries than would have been there anyway.
I also spent some time trying to eliminate gratuitous inconsistencies in
the style of the comments, though it's hopeless to suppose that more won't
creep in soon enough.
Per my proposal of 2010-10-15.
it a lot more useful for determining which standby is most up-to-date,
for example. There was long discussions on whether overwriting existing
existing WAL makes sense to begin with, and whether we should do some more
extensive variable renaming, but this change nevertheless seems quite
uncontroversial.
Fujii Masao, reviewed by Jeff Janes, Robert Haas, Stephen Frost.
Change the way UPDATEs are handled. Instead of maintaining a chain of
tuple-level locks in shared memory, copy any existing locks on the old
tuple to the new tuple at UPDATE. Any existing page-level lock needs to
be duplicated too, as a lock on the new tuple. That was neglected
previously.
Store xmin on tuple-level predicate locks, to distinguish a lock on an old
already-recycled tuple from a new tuple at the same physical location.
Failure to distinguish them caused loops in the tuple-lock chains, as
reported by YAMAMOTO Takashi. Although we don't use the chain representation
of UPDATEs anymore, it seems like a good idea to store the xmin to avoid
some false positives if no other reason.
CheckSingleTargetForConflictsIn now correctly handles the case where a lock
that's being held is not reflected in the local lock table. That happens
if another backend acquires a lock on our behalf due to an UPDATE or a page
split.
PredicateLockPageCombine now retains locks for the page that is being
removed, rather than removing them. This prevents a potentially dangerous
false-positive inconsistency where the local lock table believes that a lock
is held, but it is actually not.
Dan Ports and Kevin Grittner
Per discussion, this seems important for plans involving writable CTEs,
since there can now be more than one ModifyTable node in the plan.
To retain the same formatting as for target tables of scan nodes, we
show only one target table, which will be the parent table in case of
an UPDATE or DELETE on an inheritance tree. Individual child tables
can be determined by inspecting the child plan trees if needed.
Without this patch, when wal_receiver_status_interval=0, indicating that no
status messages should be sent, Hot Standby feedback messages are instead sent
extremely frequently.
Fujii Masao, with documentation changes by me.
With this patch, portals, SQL functions, and SPI all agree that there
should be only a CommandCounterIncrement between the queries that are
generated from a single SQL command by rule expansion. Fetching a whole
new snapshot now happens only between original queries. This is equivalent
to the existing behavior of EXPLAIN ANALYZE, and it was judged to be the
best choice since it eliminates one source of concurrency hazards for
rules. The patch should also make things marginally faster by reducing the
number of snapshot push/pop operations.
The patch removes pg_parse_and_rewrite(), which is no longer used anywhere.
There was considerable discussion about more aggressive refactoring of the
query-processing functions exported by postgres.c, but for the moment
nothing more has been done there.
I also took the opportunity to refactor snapmgr.c's API slightly: the
former PushUpdatedSnapshot() has been split into two functions.
Marko Tiikkaja, reviewed by Steve Singer and Tom Lane
This provides a separate exception class for each error code that the
backend defines, as well as the ability to get the SQLSTATE from the
exception object.
Jan Urbański, reviewed by Steve Singer
Adds a context manager, obtainable by plpy.subtransaction(), to run a
group of statements in a subtransaction.
Jan Urbański, reviewed by Steve Singer, additional scribbling by me
We don't have complete expected coverage for Python 2.2 anyway, so it
doesn't seem worth keeping this one around that no one appears to be
updating anyway. Visual inspection of the differences ought to be
good enough for those few who care about this obsolete Python version.
The originally committed patch for modifying CTEs didn't interact well
with EXPLAIN, as noted by myself, and also had corner-case problems with
triggers, as noted by Dean Rasheed. Those problems show it is really not
practical for ExecutorEnd to call any user-defined code; so split the
cleanup duties out into a new function ExecutorFinish, which must be called
between the last ExecutorRun call and ExecutorEnd. Some Asserts have been
added to these functions to help verify correct usage.
It is no longer necessary for callers of the executor to call
AfterTriggerBeginQuery/AfterTriggerEndQuery for themselves, as this is now
done by ExecutorStart/ExecutorFinish respectively. If you really need to
suppress that and do it for yourself, pass EXEC_FLAG_SKIP_TRIGGERS to
ExecutorStart.
Also, refactor portal commit processing to allow for the possibility that
PortalDrop will invoke user-defined code. I think this is not actually
necessary just yet, since the portal-execution-strategy logic forces any
non-pure-SELECT query to be run to completion before we will consider
committing. But it seems like good future-proofing.
This allows functions with multiple OUT parameters returning both one
or multiple records (RECORD or SETOF RECORD).
Jan Urbański, reviewed by Hitoshi Harada
We need ExecutorEnd to run the ModifyTable nodes to completion in
reverse order of initialization, not forward order. Easily done
by constructing the list back-to-front.
This patch implements data-modifying WITH queries according to the
semantics that the updates all happen with the same command counter value,
and in an unspecified order. Therefore one WITH clause can't see the
effects of another, nor can the outer query see the effects other than
through the RETURNING values. And attempts to do conflicting updates will
have unpredictable results. We'll need to document all that.
This commit just fixes the code; documentation updates are waiting on
author.
Marko Tiikkaja and Hitoshi Harada
Emit a log message when creating a named restore point, and improve
documentation for pg_create_restore_point().
Euler Taveira de Oliveira, per suggestions from Thom Brown, with some
additional wordsmithing by me.
- ALTER FOREIGN DATA WRAPPER with HANDLER
- ALTER TABLE VALIDATE CONSTRAINT
- ALTER TYPE ADD VALUE
- COPY with ENCODING and FORCE NOT NULL
- CREATE FOREIGN DATA WRAPPER with HANDLER
- CREATE TRIGGER ... INSTEAD OF
The recent additions for FDW support required checking foreign-table-ness
in several places in the parse/plan chain. While it's not clear whether
that would really result in a noticeable slowdown, it seems best to avoid
any performance risk by keeping a copy of the relation's relkind in
RangeTblEntry. That might have some other uses later, anyway.
Per discussion.
Add functions plpy.quote_ident, plpy.quote_literal,
plpy.quote_nullable, which wrap the equivalent SQL functions.
To be able to propagate char * constness properly, make the argument
of quote_literal_cstr() const char *. This also makes it more
consistent with quote_identifier().
Jan Urbański, reviewed by Hitoshi Harada, some refinements by Peter
Eisentraut
"SELECT ... INTO UNLOGGED tabname" works, but wasn't documented; CREATE
UNLOGGED SEQUENCE and CREATE UNLOGGED VIEW failed an assertion, instead
of throwing a sensible error.
Latter issue reported by Itagaki Takahiro; patch review by Tom Lane.
void_send is useful for the same reason that void_out doesn't throw error,
namely that someone might do "select void_returning_func(...)" from a
client that prefers to operate in binary mode. The void_recv function may
or may not have any practical use, but we provide it for symmetry.
Radosław Smogura
This was a leftover from the pre-8.1 design of junkfilters. It doesn't
seem to have any reason to live, since it's merely a combination of two
easy function calls, and not a well-designed combination at that (it
encourages callers to leak the result tuple).
ExecUpdate checked for whether ExecBRUpdateTriggers had returned a new
tuple value by seeing if the returned tuple was pointer-equal to the old
one. But the "old one" was in estate->es_junkFilter's result slot, which
would be scribbled on if we had done an EvalPlanQual update in response to
a concurrent update of the target tuple; therefore we were comparing a
dangling pointer to a live one. Given the right set of circumstances we
could get a false match, resulting in not forcing the tuple to be stored in
the slot we thought it was stored in. In the case reported by Maxim Boguk
in bug #5798, this led to "cannot extract system attribute from virtual
tuple" failures when trying to do "RETURNING ctid". I believe there is a
very-low-probability chance of more serious errors, such as generating
incorrect index entries based on the original rather than the
trigger-modified version of the row.
In HEAD, change all of ExecBRInsertTriggers, ExecIRInsertTriggers,
ExecBRUpdateTriggers, and ExecIRUpdateTriggers so that they continue to
have similar APIs. In the back branches I just changed
ExecBRUpdateTriggers, since there is no bug in the ExecBRInsertTriggers
case.
File encodings can be specified separately from client encoding.
If not specified, client encoding is used for backward compatibility.
Cases when the encoding doesn't match client encoding are slower
than matched cases because we don't have conversion procs for other
encodings. Performance improvement would be be a future work.
Original patch by Hitoshi Harada, and modified by me.
This is both very useful in its own right, and an important test case
for the core FDW support.
This commit includes a small refactoring of copy.c to expose its option
checking code as a separately callable function. The original patch
submission duplicated hundreds of lines of that code, which seemed pretty
unmaintainable.
Shigeru Hanada, reviewed by Itagaki Takahiro and Tom Lane
This commit provides the core code and documentation needed. A contrib
module test case will follow shortly.
Shigeru Hanada, Jan Urbanski, Heikki Linnakangas
Add a new libpq connection option client_encoding (which includes the
existing PGCLIENTENCODING environment variable), which besides an
encoding name accepts a special value "auto" that tries to determine
the encoding from the locale in the client's environment, using the
mechanisms that have been in use in initdb.
psql sets this new connection option to "auto" when running from a
terminal and not overridden by setting PGCLIENTENCODING.
original code by Heikki Linnakangas, with subsequent contributions by
Jaime Casanova, Peter Eisentraut, Stephen Frost, Ibrar Ahmed
Add a fdwhandler column to pg_foreign_data_wrapper, plus HANDLER options
in the CREATE FOREIGN DATA WRAPPER and ALTER FOREIGN DATA WRAPPER commands,
plus pg_dump support for same. Also invent a new pseudotype fdw_handler
with properties similar to language_handler.
This is split out of the "FDW API" patch for ease of review; it's all stuff
we will certainly need, regardless of any other details of the FDW API.
FDW handler functions will not actually get called yet.
In passing, fix some omissions and infelicities in foreigncmds.c.
Shigeru Hanada, Jan Urbanski, Heikki Linnakangas
The previous coding would try to process all SECTION_NONE items in the
initial sequential-restore pass, which failed if they were dependencies of
not-yet-restored items. Fix by postponing such items into the parallel
processing pass once we have skipped any non-PRE_DATA item.
Back-patch into 9.0; the original parallel-restore coding in 8.4 did not
have this bug, so no need to change it.
Report and diagnosis by Arnd Hannemann.
They share the same locking namespace with the existing session-level
advisory locks, but they are automatically released at the end of the
current transaction and cannot be released explicitly via unlock
functions.
Marko Tiikkaja, reviewed by me.
More generally, arrays are turned in Perl array references, and row and
composite types are turned into Perl hash references. This is done
recursively, in a way that's natural to every Perl programmer.
To avoid a backwards compatibility hit, the string representation of
each structure is also available if the function requests it.
Authors: Alexey Klyukin and Alex Hunsaker.
Some code cleanups by me.
ts_typanalyze.c computes MCE statistics as fractions of the non-null rows,
which seems fairly reasonable, and anyway changing it in released versions
wouldn't be a good idea. But then ts_selfuncs.c has to account for that.
Failure to do so results in overestimates in columns with a significant
fraction of null documents. Back-patch to 8.4 where this stuff was
introduced.
Jesper Krogh
That function was supposing that indexoid == 0 for a hypothetical index,
but that is not likely to be true in any non-toy implementation of an index
adviser, since assigning a fake OID is the only way to know at EXPLAIN time
which hypothetical index got selected. Fix by adding a flag to
IndexOptInfo to mark hypothetical indexes. Back-patch to 9.0 where
get_actual_variable_range() was added.
Gurjeet Singh
These are needed to support reloading dumps of 9.0 installations containing
contrib/intarray or contrib/tsearch2. Since not only regular dump/reload
but binary upgrade would fail, it seems worth the trouble to carry these
stubs for awhile. Note that the contrib opclasses referencing these
functions will still work fine, since GIN doesn't actually pay any
attention to the declared signature of a support function.
Standby optionally sends back information about oldestXmin of queries
which is then checked and applied to the WALSender's proc->xmin.
GetOldestXmin() is modified slightly to agree with GetSnapshotData(),
so that all backends on primary include WALSender within their snapshots.
Note this does nothing to change the snapshot xmin on either master or
standby. Feedback piggybacks on the standby reply message.
vacuum_defer_cleanup_age is no longer used on standby, though parameter
still exists on primary, since some use cases still exist.
Simon Riggs, review comments from Fujii Masao, Heikki Linnakangas, Robert Haas
(I'm not entirely sure that we've finished bikeshedding the syntax details,
but the functionality seems OK.)
Pavel Stehule, reviewed by Stephen Frost and Tom Lane
They are expected to be used by extension modules like file_fdw.
There are no user-visible changes.
Itagaki Takahiro
Reviewed and tested by Kevin Grittner and Noah Misch.
Normally, pg_dump summarily excludes functions in pg_catalog from
consideration. However, some extensions may create functions in pg_catalog
(adminpack already does that, and extensions for procedural languages will
likely do it too). In binary-upgrade mode, we have to dump such functions,
or the extension will be incomplete after upgrading. Per experimentation
with adminpack.
Recent releases had a check on rel->rd_refcnt in heap_drop_with_catalog,
but failed to cover the possibility of pending trigger events at DROP time.
(Before 8.4 we didn't even check the refcnt.) When the trigger events were
eventually fired, you'd get "could not open relation with OID nnn" errors,
as in recent report from strk. Better to throw a suitable error when the
DROP is attempted.
Also add a similar check in DROP INDEX.
Back-patch to all supported branches.
though must not update the last transaction timestamp.
Plus comment and message cleanup for recent named restore point.
Fujii Masao, minor changes by me
The original design of pg_available_extensions did not consider the
possibility of version-specific control files. Split it into two views:
pg_available_extensions shows information that is generic about an
extension, while pg_available_extension_versions shows all available
versions together with information that could be version-dependent.
Also, add an SRF pg_extension_update_paths() to assist in checking that
a collection of update scripts provide sane update path sequences.
This allows us to have an unambiguous rule for deconstructing the names
of script files and secondary control files, without having to forbid
extension and version names from containing any dashes. We do have to
forbid them from containing double dashes or leading/trailing dashes,
but neither restriction is likely to bother anyone in practice.
Per discussion, this seems like a better solution overall than the
original design.
This change causes a multi-step update sequence to behave exactly as if the
updates had been commanded one at a time, including updating the "requires"
dependencies afresh at each step. The initial implementation took the
shortcut of examining only the final target version's "requires" and
changing the catalog entry but once. But on reflection that's a bad idea,
since it could lead to executing old update scripts under conditions
different than they were designed/tested for. Better to expend a few extra
cycles and avoid any surprises.
In the same spirit, if a CREATE EXTENSION FROM operation involves applying
a series of update files, it will act as though the CREATE had first been
done using the initial script's target version and then the additional
scripts were invoked with ALTER EXTENSION UPDATE.
I also removed the restriction about not changing encoding in secondary
control files. The new rule is that a script is assumed to be in whatever
encoding the control file(s) specify for its target version. Since this
reimplementation causes us to read each intermediate version's control
file, there's no longer any uncertainty about which encoding setting would
get applied.
relative, by creating a function path_is_relative_and_below_cwd() to
check for specific requirements. It is unclear if this fixes a security
problem or not but the new code is more robust.
- collowner field
- CREATE COLLATION
- ALTER COLLATION
- DROP COLLATION
- COMMENT ON COLLATION
- integration with extensions
- pg_dump support for the above
- dependency management
- psql tab completion
- psql \dO command
When the old type is binary coercible to the new type and the using
clause does not change the column contents, we can avoid a full table
rewrite, though any indexes on the affected columns will still need
to be rebuilt. This applies, for example, when changing a varchar
column to be of type text.
The prior coding assumed that the set of operations that force a
rewrite is identical to the set of operations that must be propagated
to tables making use of the affected table's rowtype. This is
no longer true: even though the tuples in those tables wouldn't
need to be modified, the data type change invalidate indexes built
using those composite type columns. Indexes on the table we're
actually modifying can be invalidated too, of course, but the
existing machinery is sufficient to handle that case.
Along the way, add some debugging messages that make it possible
to understand what operations ALTER TABLE is actually performing
in these cases.
Noah Misch and Robert Haas
Arrange for the control files to be in $SHAREDIR/extension not
$SHAREDIR/contrib, since we're generally trying to deprecate the term
"contrib" and this is a once-in-many-moons opportunity to get rid of it in
install paths. Fix PGXS to install the $EXTENSION file into that directory
no matter what MODULEDIR is set to; a nondefault MODULEDIR should only
affect the script and secondary extension files. Fix the control file
directory parameter to be interpreted relative to $SHAREDIR, to avoid a
surprising disconnect between how you specify that and what you set
MODULEDIR to.
Per discussion with David Wheeler.
This follows recent discussions, so it's quite a bit different from
Dimitri's original. There will probably be more changes once we get a bit
of experience with it, but let's get it in and start playing with it.
This is still just core code. I'll start converting contrib modules
shortly.
Dimitri Fontaine and Tom Lane
Per discussion with Noah Misch, the previous coding, introduced by
my commit 65377e0b9c on 2011-02-06,
was really an abuse of RELKIND_COMPOSITE_TYPE, since the caller in
typecmds.c is actually passing the name of a domain. So go back
having a type name argument, but make the first argument a Relation
rather than just a string so we can tell whether it's a table or
a foreign table and emit the proper error message.
the standby has written, flushed, and applied the WAL. At the moment, this
is for informational purposes only, the values are only shown in
pg_stat_replication system view, but in the future they will also be needed
for synchronous replication.
Extracted from Simon riggs' synchronous replication patch by Robert Haas, with
some tweaking by me.
Tracks one counter for each database, which is reset whenever
the statistics for any individual object inside the database is
reset, and one counter for the background writer.
Tomas Vondra, reviewed by Greg Smith
Flattening of subquery range tables during setrefs.c could lead to the
rangetable indexes in PlanRowMark nodes not matching up with the column
names previously assigned to the corresponding resjunk ctid (resp. tableoid
or wholerow) columns. Typical symptom would be either a "cannot extract
system attribute from virtual tuple" error or an Assert failure. This
wasn't a problem before 9.0 because we didn't support FOR UPDATE below the
top query level, and so the final flattening could never renumber an RTE
that was relevant to FOR UPDATE. Fix by using a plan-tree-wide unique
number for each PlanRowMark to label the associated resjunk columns, so
that the number need not change during flattening.
Per report from David Johnston (though I'm darned if I can see how this got
past initial testing of the relevant code). Back-patch to 9.0.
This follows my proposal of yesterday, namely that we try to recreate the
previous state of the extension exactly, instead of allowing CREATE
EXTENSION to run a SQL script that might create some entirely-incompatible
on-disk state. In --binary-upgrade mode, pg_dump won't issue CREATE
EXTENSION at all, but instead uses a kluge function provided by
pg_upgrade_support to recreate the pg_extension row (and extension-level
pg_depend entries) without creating any member objects. The member objects
are then restored in the same way as if they weren't members, in particular
using pg_upgrade's normal hacks to preserve OIDs that need to be preserved.
Then, for each member object, ALTER EXTENSION ADD is issued to recreate the
pg_depend entry that marks it as an extension member.
In passing, fix breakage in pg_upgrade's enum-type support: somebody didn't
fix it when the noise word VALUE got added to ALTER TYPE ADD. Also,
rationalize parsetree representation of COMMENT ON DOMAIN and fix
get_object_address() to allow OBJECT_DOMAIN.
My original idea of doing extension member identification during
getDependencies() didn't work correctly: we have to mark member tables as
not-to-be-dumped rather earlier than that, else their subsidiary objects
like indexes get dumped anyway. Rearrange code to mark them early enough.
This is an essential component of making the extension feature usable;
first because it's needed in the process of converting an existing
installation containing "loose" objects of an old contrib module into
the extension-based world, and second because we'll have to use it
in pg_dump --binary-upgrade, as per recent discussion.
Loosely based on part of Dimitri Fontaine's ALTER EXTENSION UPGRADE
patch.
Specifying this option makes the server not wait for the
xlog to be archived, or emit a warning that it can't,
instead leaving the responsibility with the client.
This is useful when the log is being streamed using
the streaming protocol in parallel with the backup,
without having log archiving enabled.
Older versions of gcc tend to throw "variable might be clobbered by
`longjmp' or `vfork'" warnings whenever a variable is assigned in more than
one place and then used after the end of a PG_TRY block. That's reasonably
easy to work around in execute_extension_script, and the overhead of
unconditionally saving/restoring the GUC variables seems unlikely to be a
serious concern.
Also clean up logic in ATExecValidateConstraint to make it easier to read
and less likely to provoke "variable might be used uninitialized in this
function" warnings.
This patch adds the server infrastructure to support extensions.
There is still one significant loose end, namely how to make it play nice
with pg_upgrade, so I am not yet committing the changes that would make
all the contrib modules depend on this feature.
In passing, fix a disturbingly large amount of breakage in
AlterObjectNamespace() and callers.
Dimitri Fontaine, reviewed by Anssi Kääriäinen,
Itagaki Takahiro, Tom Lane, and numerous others
This adds collation support for columns and domains, a COLLATE clause
to override it per expression, and B-tree index support.
Peter Eisentraut
reviewed by Pavel Stehule, Itagaki Takahiro, Robert Haas, Noah Misch
new recovery.conf parameter recovery_target_name allows PITR to
specify named points as recovery targets.
Jaime Casanova, reviewed by Euler Taveira de Oliveira, plus minor edits
If the standby was streaming when trigger file arrives, check also in the
archive for additional WAL files. This is a corner case since it is
unlikely that we would trigger a failover while the master is still
available and sending data to standby, while at the same time running in
archive mode and also while the streaming standby has fallen behind archive.
Someone would eventually be unlucky; we must plug all gaps however small.
Fujii Masao
FK constraints that are marked NOT VALID may later be VALIDATED, which uses an
ShareUpdateExclusiveLock on constraint table and RowShareLock on referenced
table. Significantly reduces lock strength and duration when adding FKs.
New state visible from psql.
Simon Riggs, with reviews from Marko Tiikkaja and Robert Haas
Waiting for relation locks can lead to starvation - it pins down an
autovacuum worker for as long as the lock is held. But if we're doing
an anti-wraparound vacuum, then we still wait; maintenance can no longer
be put off.
To assist with troubleshooting, if log_autovacuum_min_duration >= 0,
we log whenever an autovacuum or autoanalyze is skipped for this reason.
Per a gripe by Josh Berkus, and ensuing discussion.
Until now, our Serializable mode has in fact been what's called Snapshot
Isolation, which allows some anomalies that could not occur in any
serialized ordering of the transactions. This patch fixes that using a
method called Serializable Snapshot Isolation, based on research papers by
Michael J. Cahill (see README-SSI for full references). In Serializable
Snapshot Isolation, transactions run like they do in Snapshot Isolation,
but a predicate lock manager observes the reads and writes performed and
aborts transactions if it detects that an anomaly might occur. This method
produces some false positives, ie. it sometimes aborts transactions even
though there is no anomaly.
To track reads we implement predicate locking, see storage/lmgr/predicate.c.
Whenever a tuple is read, a predicate lock is acquired on the tuple. Shared
memory is finite, so when a transaction takes many tuple-level locks on a
page, the locks are promoted to a single page-level lock, and further to a
single relation level lock if necessary. To lock key values with no matching
tuple, a sequential scan always takes a relation-level lock, and an index
scan acquires a page-level lock that covers the search key, whether or not
there are any matching keys at the moment.
A predicate lock doesn't conflict with any regular locks or with another
predicate locks in the normal sense. They're only used by the predicate lock
manager to detect the danger of anomalies. Only serializable transactions
participate in predicate locking, so there should be no extra overhead for
for other transactions.
Predicate locks can't be released at commit, but must be remembered until
all the transactions that overlapped with it have completed. That means that
we need to remember an unbounded amount of predicate locks, so we apply a
lossy but conservative method of tracking locks for committed transactions.
If we run short of shared memory, we overflow to a new "pg_serial" SLRU
pool.
We don't currently allow Serializable transactions in Hot Standby mode.
That would be hard, because even read-only transactions can cause anomalies
that wouldn't otherwise occur.
Serializable isolation mode now means the new fully serializable level.
Repeatable Read gives you the old Snapshot Isolation level that we have
always had.
Kevin Grittner and Dan Ports, reviewed by Jeff Davis, Heikki Linnakangas and
Anssi Kääriäinen
String are converted to UTF8 on the way into perl and to the
database encoding on the way back. This avoids a number of
observed anomalies, and ensures Perl a consistent view of the
world.
Some minor code cleanups are also accomplished.
Alex Hunsaker, reviewed by Andy Colson.
If the foreign table's rowtype is being used as the type of a column in
another table, we can't just up and change its data type. This was
already checked for composite types and ordinary tables, but we
previously failed to enforce it for foreign tables.
Make sure it's clear that the prohibition on adding a column with a default
when the rowtype is used elsewhere is intentional, and be a bit more
explicit about the other cases where we perform this check.
This fixes make distprep, and seems more robust in other ways as well.
Some special handling is required because errcodes.txt is needed by
some stuff in src/port, but just by src/backend as is the case for the
other generated headers.
While I'm at it, fix a few other things that were overlooked in the
original patch.
src/pl/plpgsql/src/plerrcodes.h, src/include/utils/errcodes.h, and a
big chunk of errcodes.sgml are now automatically generated from a single
file, src/backend/utils/errcodes.txt.
Jan Urbański, reviewed by Tom Lane.
Add the current xlog insert location to the response of
IDENTIFY_SYSTEM, and adds result sets containing start
and stop location of backups to BASE_BACKUP responses.
Prior to 9.0, restartpoints never created, deleted, or recycled WAL
files, but now they can. This code makes log_checkpoints treat
checkpoints and restartpoints symmetrically. It also adjusts up
the documentation of the parameter to mention restartpoints.
Fujii Masao. Docs by me, as suggested by Itagaki Takahiro.
This allows the language-specific try/catch construct to catch and
handle exceptions arising from SPI calls, matching the behavior of
other PLs.
As an additional bonus you no longer get all the ugly "unrecognized
error in PLy_spi_execute_query" errors.
Jan Urbański, reviewed by Steve Singer
This is an efficiency change, and means we now no longer have to run
"out $_TD; local $_TD = shift;", which was especially pointless in the case of
non-trigger functions where the passed value was always undef anyway.
A tiny open issue is whether we should get rid of the $prolog argument of
mkfunc, and the corresponding pushed value, which is now just a constant "false".
Tim Bunce, reviewed by Alex Hunsaker.
New versions of libintl redefine setlocale() to a macro
which causes problems when the backend and libintl are
linked against different versions of the runtime, which
is often the case in msvc builds.
Hiroshi Inoue, slightly updated comment by me
Previously reported as ERRCODE_ADMIN_SHUTDOWN, this case is now
reported as ERRCODE_T_R_DATABASE_DROPPED. No message text change.
Unlikely to happen on most servers, so low impact change to allow
session poolers to correctly handle this situation.
Tatsuo Ishii, edits by me, review by Robert Haas
All retryable conflict errors now have an error code that indicates that
a retry is possible, correcting my incomplete fix of 2010/05/12
Tatsuo Ishii and Simon Riggs, input from Robert Haas and Florian Pflug
With this patch, pg_basebackup doesn't write a backup_label file in the
data directory, so it doesn't interfere with a pg_start/stop_backup() based
backup anymore. backup_label is still included in the backup, but it is
injected directly into the tar stream.
Heikki Linnakangas, reviewed by Fujii Masao and Magnus Hagander.
This can be used to build 64 bit Windows binaries, not only on 64 bit
Windows but on supported cross-compiling hosts including 32 bit Windows,
Cygwin, Darwin and Linux.
reduce_outer_joins() mistakenly treated a semijoin like a left join for
purposes of deciding whether not-null constraints created by the join's
quals could be passed down into the join's left-hand side (possibly
resulting in outer-join simplification there). Actually, semijoin works
like inner join for this purpose, ie, we do not need to see any rows that
can't possibly satisfy the quals. Hence, two-line fix to treat semi and
inner joins alike. Per observation by Andres Freund about a performance
gripe from Yazan Suleiman.
Back-patch to 8.4, since this oversight has been there since the current
handling of semijoins was implemented.
When included, this makes the base backup a complete working
"clone" of the initial database, ready to have a postmaster
started against it without the need to set up any log archiving
or similar.
Magnus Hagander, reviewed by Fujii Masao and Heikki Linnakangas
When we need to insert a new entry and the queue is full, compact the
entire queue in the hopes of making room for the new entry. Doing this
on every insertion might worsen contention on BgWriterCommLock, but
when the queue it's full, it's far better than allowing the backend to
perform its own fsync, per testing by Greg Smith as reported in
http://archives.postgresql.org/pgsql-hackers/2011-01/msg02665.php
Original idea from Greg Smith. Patch by me. Review by Chris Browne
and Greg Smith
We only need that header when compiling with icc, since the gcc variant of
ia64_get_bsp() uses in-line assembly code. Per report from Frank Brendel,
the header doesn't exist on all IA64 platforms; so don't include it unless
we need it.
This reverts commit a06e41deeb of 2011-01-26.
Per discussion, this behavior is not wanted, as it would need to change if
we ever made composite types support DEFAULT.
In the case where the initial call of systable_getnext_ordered() returned
NULL, this function would nonetheless call it again. That's undefined
behavior that only by chance failed to not give visibly incorrect results.
Put an if-test around the final loop to prevent that, and in passing
improve some comments. No back-patch since there's no actual failure.
Per report from YAMAMOTO Takashi.
Synchronize pg_config.h.in with configure.in (someone must have
forgotten to run autoheader or autoreconf), and clean up some spurious
change in configure introduced by the last commit there.
Use the built-in TypeError, not SPIError, for errors having to do with
argument counts or types. Use SPIError, not simply plpy.Error, for
errors in PLy_spi_execute_plan. Finally, do not set a Python
exception if PyArg_ParseTuple failed, as it already sets the correct
exception.
Jan Urbański
The previous coding prevented ALTER TABLE .. ADD COLUMN from being used
with a non-NULL default in situations where the table's rowtype was being
used elsewhere. But this is a completely arbitrary restriction since
you could do the same operation in multiple steps (add the column, add
the default, update the table).
Inspired by a patch from Noah Misch, though I didn't use his code.
There isn't any need to track this state on a table-wide basis, and trying
to do so introduces undesirable semantic fuzziness. Move the flag to
pg_index, where it clearly describes just a single index and can be
immutable after index creation.
This feature allows a unique or pkey constraint to be created using an
already-existing unique index. While the constraint isn't very
functionally different from the bare index, it's nice to be able to do that
for documentation purposes. The main advantage over just issuing a plain
ALTER TABLE ADD UNIQUE/PRIMARY KEY is that the index can be created with
CREATE INDEX CONCURRENTLY, so that there is not a long interval where the
table is locked against updates.
On the way, refactor some of the code in DefineIndex() and index_create()
so that we don't have to pass through those functions in order to create
the index constraint's catalog entries. Also, in parse_utilcmd.c, pass
around the ParseState pointer in struct CreateStmtContext to save on
notation, and add error location pointers to some error reports that didn't
have one before.
Gurjeet Singh, reviewed by Steve Singer and Tom Lane
This is still pretty rough - among other things, the documentation
needs work, and the messages need a visit from the style police -
but this gets the basic framework in place.
KaiGai Kohei
While doing this, also move base backup options into
a struct instead of increasing the number of parameters
to multiple functions for each new option.
The target of sscanf with a %o format had better be of integer width,
but "mode_t" conceivably isn't that. Another compiler warning seen
only on some platforms; this one I think is potentially a real bug
and not just a warning.
Include the lefttype/righttype columns explicitly (instead of assuming
the reader can deduce them from the operator or function description),
and move the operator or function description to the end of the string,
to make it clearer that it's a referenced object and not the amop or
amproc item itself. Per extensive discussion of Andreas Karlsson's
original patch.
Andreas Karlsson, Tom Lane
This reverts commit 740e54ca84, which seems
to have tickled an optimization bug in gcc 4.5.x, as reported upstream at
https://bugzilla.redhat.com/show_bug.cgi?id=671899
Since this patch had no purpose beyond code beautification, it's not
worth expending a lot of effort to look for another workaround.
This tool makes it possible to do the pg_start_backup/
copy files/pg_stop_backup step in a single command.
There are still some steps to be done before this is a
complete backup solution, such as the ability to stream
the required WAL logs, but it's still usable, and
could do with some buildfarm coverage.
In passing, make the checkpoint request optionally
fast instead of hardcoding it.
Magnus Hagander, reviewed by Fujii Masao and Dimitri Fontaine
As in commit fb4c5d2798 on 2011-01-21,
this avoids spurious debug messages and allows idempotent changes at
any time. Along the way, make assign_XactIsoLevel allow idempotent
changes even when not within a subtransaction, to be consistent with
the new coding of assign_transaction_read_only and because there's
no compelling reason to do otherwise.
Kevin Grittner, with some adjustments.
It's not clear to me what should happen to the other plpython_unicode
variant expected files, but this patch gets things passing on my own
machines and at least some of the buildfarm.
If wal_buffers is initially set to -1 (which is now the default), it's
replaced by 1/32nd of shared_buffers, with a minimum of 8 (the old default)
and a maximum of the XLOG segment size. The allowed range for manual
settings is still from 4 up to whatever will fit in shared memory.
Greg Smith, with implementation correction by me.
Not sure why I'm seeing this on Fedora 14 and not earlier versions.
Seems like a regression that gcc no longer knows that DIE() doesn't return.
Still, adding a dummy return is harmless enough.
It appears that gcc 4.5 can issue such warnings for whole structs, not
just scalar variables as in the past. Refactor some pg_dump code slightly
so that the OutputContext local variables are always initialized, even
if they won't be used. It's cheap enough to not be worth worrying about.
Global error handling led to confusion and was hard to manage. With
this change, errors from PostgreSQL are immediately reported to Python
as exceptions. This requires setting a Python exception after
reporting the caught PostgreSQL error as a warning, because PLy_elog
destroys the Python exception state.
Ideally, all places where PostgreSQL errors need to be reported back
to Python should be wrapped in subtransactions, to make going back to
Python from a longjmp safe. This will be handled in a separate patch.
Jan Urbański
The previous coding treated anything that wasn't an autovacuum launcher
as a normal backend, which is wrong now that we also have WAL senders.
Fujii Masao, reviewed by Robert Haas, Alvaro Herrera, Tom Lane,
and Bernd Helmle.
The new coding avoids a spurious debug message when a transaction
that has changed the isolation level has been rolled back. It also
allows the property to be freely changed to the current value within
a subtransaction.
Kevin Grittner, with one small change by me.
The way the exception types where added to the module was wrong for
Python 3. Exception classes were not actually available from plpy.
Fix that by factoring out code that is responsible for defining new
Python exceptions and make it work with Python 3. New regression test
makes sure the plpy module has the expected contents.
Jan Urbanśki, slightly revised by me
Failure to do so can lead to constraint violations. This was broken by
commit 1ddc2703a9 on 2010-02-07, so
back-patch to 9.0.
Noah Misch. Regression test by me.
Pay attention to the attisdropped field and skip over TupleDesc fields
that have it set. Not a real problem until we get table returning
functions, but it's the right thing to do anyway.
Jan Urbański
If the function using yield to return rows fails halfway, the iterator
stays open and subsequent calls to the function will resume reading
from it. The fix is to unref the iterator and set it to NULL if there
has been an error.
Jan Urbański
We can get the length of a compressed or out-of-line datum without actually
detoasting it. If the lengths of two strings are unequal, we can then
conclude they are unequal without detoasting. That saves considerable work
in an admittedly less-common case, without costing anything much when the
optimization doesn't apply.
Noah Misch
Two separate hash tables are used for regular procedures and for
trigger procedures, since the way trigger procedures work is quite
different from normal stored procedures. Change the signatures of
PLy_procedure_{get,create} to accept the function OID and a Boolean
flag indicating whether it's a trigger. This should make implementing
a PL/Python validator easier.
Using HTABs instead of Python dictionaries makes error recovery
easier, and allows for procedures to be cached based on their OIDs,
not their names. It also allows getting rid of the PyCObject field
that used to hold a pointer to PLyProcedure, since PyCObjects are
deprecated in Python 2.7 and replaced by Capsules in Python 3.
Jan Urbański
If the slice to be assigned to was before the existing array lower bound
(requiring at least one null element to spring into existence to fill the
gap), the code miscalculated how many entries needed to be copied from
the old array's null bitmap. This could result in trashing the array's
data area (as seen in bug #5840 from Karsten Loesing), or worse.
This has been broken since we first allowed the behavior of assigning to
non-adjacent slices, in 8.2. Back-patch to all affected versions.
Otherwise WAL recovery will replay the un-flushed WAL after walreceiver has
exited, which can lead to a non-recoverable standby if the system crashes hard
at that point.
This closes a race condition where if a tablespace was created
after the enumeration happened but before the do_pg_start_backup()
was called, the backup would be incomplete. Now that it's done
while we are in backup mode, WAL replay will recreate it during
restore.
Noted by Heikki.
backend, as far as the postmaster shutdown logic is concerned. That means,
fast shutdown will wait for WAL sender processes to exit before signaling
bgwriter to finish. This avoids race conditions between a base backup stopping
or starting, and bgwriter writing the shutdown checkpoint WAL record. We don't
want e.g the end-of-backup WAL record to be written after the shutdown
checkpoint.
Makes it easier to parse mainly the BASE_BACKUP command
with it's options, and avoids having to manually deal
with quoted identifiers in the label (previously broken),
and makes it easier to add new commands and options in
the future.
In passing, refactor the case statement in the walsender
to put each command in it's own function.
When the exit waits until the whole backup completes, it may take
a very long time.
In passing, add back an error check in the main loop so we detect
clients that disconnect much earlier if the backup is large.
Fix broken test for pre-existing postmaster, caused by wrong code for
appending lines to the lockfile; don't write a failed listen_address
setting into the lockfile; don't arbitrarily change the location of the
data directory in the lockfile compared to previous releases; provide more
consistent and useful definitions of the socket path and listen_address
entries; avoid assuming that pg_ctl has the same DEFAULT_PGSOCKET_DIR as
the postmaster; assorted code style improvements.
This reverts commit d1001a78ce of 2010-12-05,
which was broken as reported by Jeff Davis. The problem is that the
individual planning steps may have side-effects on substructures of
PlannerGlobal, not only the current PlannerInfo root. Arranging to keep
all such side effects in the main planning context is probably possible,
but it would change this from a quick local hack into a wide-ranging and
rather fragile endeavor. Which it's not worth.
that can be read without blocking. It used to conclude that there isn't, even
though there was data in the socket receive buffer. That lead walreceiver to
flush the WAL after every received chunk, potentially causing big performance
issues.
Backpatch to 9.0, because the performance impact can be very significant.
Changing a file two directory levels deep under src/backend/ would not
cause the postgres binary to be rebuilt. This change fixes it, but no
one knows why.
Instead, run them in the encoding that the locale selects, which is
more representative of real use.
Also document how locale and encoding for regression test runs can be
selected.
In an inherited UPDATE/DELETE, each target table has its own subplan,
because it might have a column set different from other targets. This
means that the resjunk columns we add to support EvalPlanQual might be
at different physical column numbers in each subplan. The EvalPlanQual
rewrite I did for 9.0 failed to account for this, resulting in possible
misbehavior or even crashes during concurrent updates to the same row,
as seen in a recent report from Gordon Shannon. Revise the data structure
so that we track resjunk column numbers separately for each subplan.
I also chose to move responsibility for identifying the physical column
numbers back to executor startup, instead of assuming that numbers derived
during preprocess_targetlist would stay valid throughout subsequent
massaging of the plan. That's a bit slower, so we might want to consider
undoing it someday; but it would complicate the patch considerably and
didn't seem justifiable in a bug fix that has to be back-patched to 9.0.
Some versions of gcc complain about "variable `tablespaces' might be
clobbered by `longjmp' or `vfork'" with the original coding. Fix by
moving the PG_TRY block into a separate subroutine.
Per my note of a couple days ago, create_index_paths would refuse to
consider any path at all for GIN indexes if the selectivity estimate came
out as 1.0; not even if you tried to force it with enable_seqscan. While
this isn't really a bad outcome in practice, it could be annoying for
testing purposes. Adjust the test for "is this path only useful for
sorting" so that it doesn't fire on paths with nil pathkeys, which will
include all GIN paths.
When in streaming mode we can never get out, so it will never
be required, but after a base backup (or other operations)
we can get back to the loop, so the title needs to be cleared.
Add BASE_BACKUP command to walsender, allowing it to stream a
base backup to the client (in tar format). The syntax is still
far from ideal, that will be fixed in the switch to use a proper
grammar for walsender.
No client included yet, will come as a separate commit.
Magnus Hagander and Heikki Linnakangas
Move the actual functionality into a separate function that's
easier to call internally, and change the SQL-callable function
to be a wrapper calling this.
Also create a pg_abort_backup() function, only callable internally,
that does only the most vital parts of pg_stop_backup(), making it
safe(r) to call from error handlers.
This applies the fix for bug #5784 to remaining places where we wish
to reject nulls in user-supplied arrays. In all these places, there's
no reason not to allow a null bitmap to be present, so long as none of
the current elements are actually null.
I did not change some other places where we are looking at system catalog
entries or aggregate transition values, as the presence of a null bitmap
in such an array would be suspicious.
Add support for reading back information about the symbolic
links we've created with pgsymlink(), which are actually
Junction Points. Just like pgsymlink() can only create directory
symlinks, pgreadlink() can only read directory symlinks.
This will support fixing contrib/intarray (and probably other places)
so that they don't have to fail on arrays that contain a null bitmap
but no live null entries.
The only reason this wasn't crashing while testing the core anyarray
operators was that it was disabled for those cases because of passing the
wrong type information to get_opfamily_proc :-(. So fix that too, and make
it insist on finding the support proc --- in hindsight, silently doing
nothing is not as sane a coping mechanism as all that.
The only use we have had for amindexnulls is in determining whether an
index is safe to cluster on; but since the addition of the amclusterable
flag, that usage is pretty redundant.
In passing, clean up assorted sloppiness from the last patch that touched
pg_am.h: Natts_pg_am was wrong, and ambuildempty was not documented.
The original coding could combine duplicate entries only when they
originated from the same qual condition. In particular it could not
combine cases where multiple qual conditions all give rise to full-index
scan requests, which is an expensive case well worth optimizing. Refactor
so that duplicates are recognized across all the quals.
which is stored in pg_largeobject_metadata.
No backpatch to 9.0 because you can't migrate from 9.0 to 9.0 with the
same catversion (because of tablespace conflict), and a pre-9.0
migration to 9.0 has not large object permissions to migrate.
Toast tables have identical pg_class.oid and pg_class.relfilenode, but
for clarity it is good to preserve the pg_class.oid.
Update comments regarding what is preserved, and do some
variable/function renaming for clarity.
Add more "internal" arguments so that these pg_proc entries reflect the
current preferred API. This is purely a cosmetic change, since GIN doesn't
actually consult the pg_proc entry when calling a support function.
Accordingly, no catversion bump.
Per my recent proposal(s). Null key datums can now be returned by
extractValue and extractQuery functions, and will be stored in the index.
Also, placeholder entries are made for indexable items that are NULL or
contain no keys according to extractValue. This means that the index is
now always complete, having at least one entry for every indexed heap TID,
and so we can get rid of the prohibition on full-index scans. A full-index
scan is implemented much the same way as partial-match scans were already:
we build a bitmap representing all the TIDs found in the index, and then
drive the results off that.
Also, introduce a concept of a "search mode" that can be requested by
extractQuery when the operator requires matching to empty items (this is
just as cheap as matching to a single key) or requires a full index scan
(which is not so cheap, but it sure beats failing or giving wrong answers).
The behavior remains backward compatible for opclasses that don't return
any null keys or request a non-default search mode.
Using these features, we can now make the GIN index opclass for anyarray
behave in a way that matches the actual anyarray operators for &&, <@, @>,
and = ... which it failed to do before in assorted corner cases.
This commit fixes the core GIN code and ginarrayprocs.c, updates the
documentation, and adds some simple regression test cases for the new
behaviors using the array operators. The tsearch and contrib GIN opclass
support functions still need to be looked over and probably fixed.
Another thing I intend to fix separately is that this is pretty inefficient
for cases where more than one scan condition needs a full-index search:
we'll run duplicate GinScanEntrys, each one of which builds a large bitmap.
There is some existing logic to merge duplicate GinScanEntrys but it needs
refactoring to make it work for entries belonging to different scan keys.
Note that most of gin.h has been split out into a new file gin_private.h,
so that gin.h doesn't export anything that's not supposed to be used by GIN
opclasses or the rest of the backend. I did quite a bit of other code
beautification work as well, mostly fixing comments and choosing more
appropriate names for things.
This can be overriden by using NOREPLICATION on the CREATE ROLE
statement, but by default they will have it, making it backwards
compatible and "less surprising" (given that superusers normally
override all checks).
Add new function pg_sequence_parameters that returns a sequence's start,
minimum, maximum, increment, and cycle values, and use that in the view.
(bug #5662; design suggestion by Tom Lane)
Also slightly adjust the view's column order and permissions after review of
SQL standard.
Foreign tables are a core component of SQL/MED. This commit does
not provide a working SQL/MED infrastructure, because foreign tables
cannot yet be queried. Support for foreign table scans will need to
be added in a future patch. However, this patch creates the necessary
system catalog structure, syntax support, and support for ancillary
operations such as COMMENT and SECURITY LABEL.
Shigeru Hanada, heavily revised by Robert Haas
This is advantageous first because it allows us to hash the smaller table
regardless of the outer-join type, and second because hash join can be more
flexible than merge join in dealing with arbitrary join quals in a FULL
join. For merge join all the join quals have to be mergejoinable, but hash
join will work so long as there's at least one hashjoinable qual --- the
others can be any condition. (This is true essentially because we don't
keep per-inner-tuple match flags in merge join, while hash join can do so.)
To do this, we need a has-it-been-matched flag for each tuple in the
hashtable, not just one for the current outer tuple. The key idea that
makes this practical is that we can store the match flag in the tuple's
infomask, since there are lots of bits there that are of no interest for a
MinimalTuple. So we aren't increasing the size of the hashtable at all for
the feature.
To write this without turning the hash code into even more of a pile of
spaghetti than it already was, I rewrote ExecHashJoin in a state-machine
style, similar to ExecMergeJoin. Other than that decision, it was pretty
straightforward.
Instead, declare a public wrapper of the sole function using it for
external callers, so that they don't have to always pass a NULL
argument.
Author: Kevin Grittner
The contents of an unlogged table are WAL-logged; thus, they are not
available on standby servers and are truncated whenever the database
system enters recovery. Indexes on unlogged tables are also unlogged.
Unlogged GiST indexes are not currently supported.
This privilege is required to do Streaming Replication, instead of
superuser, making it possible to set up a SR slave that doesn't
have write permissions on the master.
Superuser privileges do NOT override this check, so in order to
use the default superuser account for replication it must be
explicitly granted the REPLICATION permissions. This is backwards
incompatible change, in the interest of higher default security.
The "date" type supports a wider range of dates than int64 timestamps do.
However, there is pre-int64-timestamp code in the planner that assumes that
all date values can be converted to timestamp with impunity. Fortunately,
what we really need out of the conversion is always a double (float8)
value; so even when the date is out of timestamp's range it's possible to
produce a sane answer. All we need is a code path that doesn't try to
force the result into int64. Per trouble report from David Rericha.
Back-patch to all supported versions. Although this is surely a corner
case, there's not much point in advertising a date range wider than
timestamp's if we will choke on such values in unexpected places.
This is to avoid use of the C++ keywords "bitand" and "bitor" in
the header file utils/varbit.h. Note the functions' SQL-level
names are not changed, only their C-level names.
In passing, make some comments in varbit.c conform to project-standard
layout.
This is slower than the original coding but avoids the problem of
including files in an unpredictable order. Aside from being more
trustworthy, we can get rid of some exclusions that were formerly
made for what turn out to be ordering or re-inclusion problems.
I also modified it to include libpq's exported files in the check.
ecpg should be included as well, but I'm unclear on which ecpg .h
files are meant to be included by clients.
"private" is a keyword in C++, so this breaks the poorly-enforced policy
that header files should be include-able in C++ code. Per report from
Craig Ringer and some investigation with cpluspluscheck.
My previous commit, 85cff3ce7f on
2010-12-25, failed to update errcodes.sgml or plerrcodes.h. This patch
corrects that oversight, per a gripe from Tom Lane, and also corrects
a typographical error.
string". This is not really needed because the string gets copied to the output
untranslated anyway, but by adding this rule the lexer stays in sync with the
backend lexer.
cleanup stage to finish incomplete inserts or splits anymore. There was two
reasons for the cleanup step:
1. When a new tuple was inserted to a leaf page, the downlink in the parent
needed to be updated to contain (ie. to be consistent with) the new key.
Updating the parent in turn might require recursively updating the parent of
the parent. We now handle that by updating the parent while traversing down
the tree, so that when we insert the leaf tuple, all the parents are already
consistent with the new key, and the tree is consistent at every step.
2. When a page is split, we need to insert the downlink for the new right
page(s), and update the downlink for the original page to not include keys
that moved to the right page(s). We now handle that by setting a new flag,
F_FOLLOW_RIGHT, on the non-rightmost pages in the split. When that flag is
set, scans always follow the rightlink, regardless of the NSN mechanism used
to detect concurrent page splits. That way the tree is consistent right after
split, even though the downlink is still missing. This is very similar to the
way B-tree splits are handled. When the downlink is inserted in the parent,
the flag is cleared. To keep the insertion algorithm simple, when an
insertion sees an incomplete split, indicated by the F_FOLLOW_RIGHT flag, it
finishes the split before doing anything else.
These changes allow removing the whole "invalid tuple" mechanism, but I
retained the scan code to still follow invalid tuples correctly. While we
don't create any such tuples anymore, we want to handle them gracefully in
case you pg_upgrade a GiST index that has them. If we encounter any on an
insert, though, we just throw an error saying that you need to REINDEX.
The issue that got me into doing this is that if you did a checkpoint while
an insert or split was in progress, and the checkpoint finishes quickly so
that there is no WAL record related to the insert between RedoRecPtr and the
checkpoint record, recovery from that checkpoint would not know to finish
the incomplete insert. IOW, we have the same issue we solved with the
rm_safe_restartpoint mechanism during normal operation too. It's highly
unlikely to happen in practice, and this fix is far too large to backpatch,
so we're just going to live with in previous versions, but this refactoring
fixes it going forward.
With this patch, you don't get the annoying
'index "FOO" needs VACUUM or REINDEX to finish crash recovery' notices
anymore if you crash at an unfortunate moment.
This function is like the PQserverVersion() function except
it returns the version of libpq, making it possible for a client
program or driver to determine which version of libpq is in
use at runtime, and not just at link time.
Suggested by Harald Armin Massa and several others.
It appears that this will be faster for all but the shortest strings;
at least one some platforms, memcmp() can use word-at-a-time comparisons.
Noah Misch, somewhat pared down.
On MacOS X, and apparently also on other BSD-derived systems, attaching
a debugger causes getppid() to return the pid of the debugging process
rather than the actual parent PID. As a result, debugging the
autovacuum launcher, startup process, or WAL sender on such systems
causes it to exit, because the previous coding of PostmasterIsAlive()
detects postmaster death by testing whether getppid() == PostmasterPid.
Work around that behavior by checking the return value of getppid()
more carefully. If it's PostmasterPid, the postmaster must be alive;
if it's 1, assume the postmaster is dead. If it's any other value,
assume we've been debugged and fall through to the less-reliable
kill() test.
Review by Tom Lane.
This case can arise if a transaction has written data, but only to
temporary tables. Loss of the commit record in case of a crash won't
matter, because the temporary tables will be lost anyway.
Reviewed by Heikki Linnakangas and Simon Riggs.
Since we're not multithreaded it only provides marginally useful
information, and it does require a newer version of the Platform SDK
than we target. We may want to reconsider this in the future along
with a fix for MinGW.
eval_const_expressions() can replace CaseTestExprs with constants when
the surrounding CASE's test expression is a constant. This confuses
ruleutils.c's heuristic for deparsing simple-form CASEs, leading to
Assert failures or "unexpected CASE WHEN clause" errors. I had put in
a hack solution for that years ago (see commit
514ce7a331 of 2006-10-01), but bug #5794
from Peter Speck shows that that solution failed to cover all cases.
Fortunately, there's a much better way, which came to me upon reflecting
that Peter's "CASE TRUE WHEN" seemed pretty redundant: we can "simplify"
the simple-form CASE to the general form of CASE, by simply omitting the
constant test expression from the rebuilt CASE construct. This is
intuitively valid because there is no need for the executor to evaluate
the test expression at runtime; it will never be referenced, because any
CaseTestExprs that would have referenced it are now replaced by constants.
This won't save a whole lot of cycles, since evaluating a Const is pretty
cheap, but a cycle saved is a cycle earned. In any case it beats kluging
ruleutils.c still further. So this patch improves const-simplification
and reverts the previous change in ruleutils.c.
Back-patch to all supported branches. The bug exists in 8.1 too, but it's
out of warranty.
After parsing a parenthesized subexpression, we must pop all pending
ANDs and NOTs off the stack, just like the case for a simple operand.
Per bug #5793.
Also fix clones of this routine in contrib/intarray and contrib/ltree,
where input of types query_int and ltxtquery had the same problem.
Back-patch to all supported versions.
Add support for collecting "minidump" style crash dumps on
Windows, by setting up an exception handling filter. Crash
dumps will be generated in PGDATA/crashdumps if the directory
is created (the existance of the directory is used as on/off
switch for the generation of the dumps).
Craig Ringer and Magnus Hagander
This prevents the word "waiting" from briefly disappearing from the ps
status line when ResolveRecoveryConflictWithVirtualXIDs begins a new
iteration of the outer loop.
Along the way, remove some useless pgstat_report_waiting() calls;
the startup process doesn't appear in pg_stat_activity.
Fujii Masao
These comments were not updated when we added the EXEC_BACKEND
mechanism for Windows, even though it rendered them inaccurate.
Also unify two unnecessarily-separate #ifdef __alpha code blocks.
We don't actually need optreset, because we can easily fix the code to
ensure that it's cleanly restartable after having completed a scan over the
argv array; which is the only case we need to restart in. Getting rid of
it avoids a class of interactions with the system libraries and allows
reversion of my change of yesterday in postmaster.c and postgres.c.
Back-patch to 8.4. Before that the getopt code was a bit different anyway.