of this are to centralise the conflict code to allow further change,
as well as to allow passing through the full reason for the conflict
through to the conflicting backends. Backend state alters how we
can handle different types of conflict so this is now required.
As originally suggested by Heikki, no longer optional.
underlying catalog not only the index itself. Otherwise, if the cache
load process touches the catalog (which will happen for many though not
all of these indexes), we are locking index before parent table, which can
result in a deadlock against processes that are trying to lock them in the
normal order. Per today's failure on buildfarm member gothic_moth; it's
surprising the problem hadn't been identified before.
Back-patch to 8.2. Earlier releases didn't have the issue because they
didn't try to lock these indexes during load (instead assuming that they
couldn't change schema at all during multiuser operation).
especially not ROLLBACK. ROLLBACK might need to be executed in an already
aborted transaction, when there is no safe way to revalidate the plan. But
in general there's no point in marking utility statements invalid, since
they have no plans in the normal sense of the word; so we might as well
work a bit harder here to avoid future revalidation cycles.
Back-patch to 8.4, where the bug was introduced.
in the parameter array. Noted while experimenting with an example
from Pavel. This wouldn't come up in normal use, but it ought to honor
the specification that a parameter array can have unused slots.
occurring during a reload, such as query-cancel. Instead of zeroing out
an existing relcache entry and rebuilding it in place, build a new relcache
entry, then swap its contents with the old one, then free the new entry.
This avoids problems with code believing that a previously obtained pointer
to a cache entry must still reference a valid entry, as seen in recent
failures on buildfarm member jaguar. (jaguar is using CLOBBER_CACHE_ALWAYS
which raises the probability of failure substantially, but the problem
could occur in the field without that.) The previous design was okay
when it was made, but subtransactions and the ResourceOwner mechanism
make it unsafe now.
Also, make more use of the already existing rd_isvalid flag, so that we
remember that the entry requires rebuilding even if the first attempt fails.
Back-patch as far as 8.2. Prior versions have enough issues around relcache
reload anyway (due to inadequate locking) that fixing this one doesn't seem
worthwhile.
can upgrade clusters without renaming the tablespace directories. New
directory structure format is, e.g.:
$PGDATA/pg_tblspc/20981/PG_8.5_201001061/719849/83292814
rowtype contains dropped columns. Sometimes the input tuple will be formed
from a select targetlist in which dropped columns are filled with a NULL
of an arbitrary type (the planner typically uses INT4, since it can't tell
what type the dropped column really was). So we need to relax the rowtype
compatibility check to not insist on physical compatibility if the actual
column value is NULL.
In principle we might need to do this for functions returning composite
types, too (see tupledesc_match()). In practice there doesn't seem to be
a bug there, probably because the function will be using the same cached
rowtype descriptor as the caller. Fixing that code path would require
significant rearrangement, so I left it alone for now.
Per complaint from Filip Rembialkowski.
Previously we only cancelled sessions that were in-transaction.
Simple fix is to just cancel all sessions without waiting. Doing
it this way avoids complicating common code paths, which would
not be worth the trouble to cover this rare case.
Problem report and fix by Andres Freund, edited somewhat by me
This silences some warnings on Win64. Not using the proper SOCKET datatype
was actually wrong on Win32 as well, but didn't cause any warnings there.
Also create define PGINVALID_SOCKET to indicate an invalid/non-existing
socket, instead of using a hardcoded -1 value.
Previously, fastgetattr() and heap_getattr() tested their fourth argument
against a null pointer, but any attempt to use them with a literal-NULL
fourth argument evaluated to *(void *)0, resulting in a compiler error.
Remove these NULL tests to avoid leading future readers of this code to
believe that this has a chance of working. Also clean up related legacy
code in nocachegetattr(), heap_getsysattr(), and nocache_index_getattr().
The new coding standard is that any code which calls a getattr-type
function or macro which takes an isnull argument MUST pass a valid
boolean pointer. Per discussion with Bruce Momjian, Tom Lane, Alvaro
Herrera.
extract a system column, and remove a couple of lines that are useless
in light of the fact that we aren't ever going to support this case. There
isn't much point in trying to make this work because a tuple Datum does
not carry many of the system columns. Per experimentation with a case
reported by Dean Rasheed; we'll have to fix his problem somewhere else.
deletion, so that we attempt to unlink the correct filepath. unlink()
errors are ignorable there, so lack of a DatabasePath initialization step
did not cause visible problems until a related bug showed up on Solaris.
Code refactored from xact_redo_commit() to
ProcessCommittedInvalidationMessages() in inval.c. Recovery may replay
shared invalidation messages for many databases, so we cannot
SetDatabasePath() once as we do in normal backends. Read the databaseid
from the shared invalidation messages, then set DatabasePath
temporarily before calling RelationCacheInitFileInvalidate().
Problem report by Robert Treat, analysis and fix by me.
someone else has just updated it, we have to set priorXmax to that tuple's
xmax (ie, the XID of the other xact that updated it) before looping back to
examine the next tuple. Obviously, the next tuple in the update chain should
have that XID as its xmin, not the same xmin as the preceding tuple that we
had been trying to lock. The mismatch would cause the EvalPlanQual logic to
decide that the tuple chain ended in a deletion, when actually there was a
live tuple that should have been found.
I inserted this error when recently adding logic to EvalPlanQual to make it
lock tuples before returning them (as opposed to the old method in which the
lock would occur much later, causing a great deal of work to be wasted if we
only then discover someone else updated it). Sigh. Per today's report from
Takahiro Itagaki of inconsistent results during pgbench runs.
of the string". The previous coding treated only -1 that way, and would
produce an invalid result value for other negative values.
We ought to fix it so that 2-parameter bit substring() is a different C
function and the 3-parameter form throws error for negative length, but
that takes a pg_proc change which is impractical in the back branches;
and in any case somebody might be relying on -1 working this way.
So just do this as a back-patchable fix.
It is absolutely not okay to throw an ereport(ERROR) in any random place in
the code just because DoingCommandRead is set; interrupting, say, OpenSSL
in the midst of its activities is guaranteed to result in heartache.
Instead of that, undo the original optimizations that threw away
QueryCancelPending anytime we were starting or finishing a command read, and
instead discard the cancel request within ProcessInterrupts if we find that
there is no HS reason for forcing a cancel and we are DoingCommandRead.
In passing, may I once again condemn the practice of changing the code
and not fixing the adjacent comment that you just turned into a lie?
we're not going to support that anymore.
I did keep the 64-bit-CRC-with-32-bit-arithmetic code, since it has a
performance excuse to live. It's a bit moot since that's all ifdef'd
out, of course.
Add missing varlena header to TableSpaceOpts structure. And, per
Tom Lane, instead of calling tablespace_reloptions in CacheMemoryContext,
call it in the caller's memory context and copy the value over
afterwards, to reduce the chances of a session-lifetime memory leak.
VACUUM FULL was renamed to VACUUM FULL INPLACE. Also added a new
option -i, --inplace for vacuumdb to perform FULL INPLACE vacuuming.
Since the new VACUUM FULL uses CLUSTER infrastructure, we cannot
use it for system tables. VACUUM FULL for system tables always
fall back into VACUUM FULL INPLACE silently.
Itagaki Takahiro, reviewed by Jeff Davis and Simon Riggs.
peculiar variant of UNION ALL, and so wouldn't likely get written directly
as-is, it's possible for it to arise as a result of simplification of
less-obviously-silly queries. In particular, now that we can do flattening
of subqueries that have constant outputs and are underneath an outer join,
it's possible for the case to result from simplification of queries of the
type exhibited in bug #5263. Back-patch to 8.4 to avoid a functionality
regression for this type of query.
This patch only supports seq_page_cost and random_page_cost as parameters,
but it provides the infrastructure to scalably support many more.
In particular, we may want to add support for effective_io_concurrency,
but I'm leaving that as future work for now.
Thanks to Tom Lane for design help and Alvaro Herrera for the review.
files when they haven't changed. This confuses make because the build fails
to update the file timestamps, and so it keeps on doing the action over again.
pg_attribute, by having genbki.pl derive the information from the various
catalog header files. This greatly simplifies modification of the
"bootstrapped" catalogs.
This patch finally kills genbki.sh and Gen_fmgrtab.sh; we now rely entirely on
Perl scripts for those build steps. To avoid creating a Perl build dependency
where there was not one before, the output files generated by these scripts
are now treated as distprep targets, ie, they will be built and shipped in
tarballs. But you will need a reasonably modern Perl (probably at least
5.6) if you want to build from a CVS pull.
The changes to the MSVC build process are untested, and may well break ---
we'll soon find out from the buildfarm.
John Naylor, based on ideas from Robert Haas and others
recovery instead of reading the backup history file. This is more robust,
as it stops you from prematurely starting up an inconsisten cluster if the
backup history file is lost for some reason, or if the base backup was
never finished with pg_stop_backup().
This also paves the way for a simpler streaming replication patch, which
doesn't need to care about backup history files anymore.
The backup history file is still created and archived as before, but it's
not used by the system anymore. It's just for informational purposes now.
Bump PG_CONTROL_VERSION as the location of the backup startpoint is now
written to a new field in pg_control, and catversion because initdb is
required
Original patch by Fujii Masao per Simon's idea, with further fixes by me.
"column < constant", and the comparison value is in the first or last
histogram bin or outside the histogram entirely, try to fetch the actual
column min or max value using an index scan (if there is an index on the
column). If successful, replace the lower or upper histogram bound with
that value before carrying on with the estimate. This limits the
estimation error caused by moving min/max values when the comparison
value is close to the min or max. Per a complaint from Josh Berkus.
It is tempting to consider using this mechanism for mergejoinscansel as well,
but that would inject index fetches into main-line join estimation not just
endpoint cases. I'm refraining from that until we can get a better handle
on the costs of doing this type of lookup.
indexscans would do the wrong thing if index_rescan() was called with a
NULL instead of a new set of scankeys and the index was DESC order,
because sk_strategy would not get flipped a second time. I think
that those provisions for a NULL argument are dead code now as far as the
core backend goes, but possibly somebody somewhere is still using it.
In any case, this refactoring seems clearer, and it's definitely shorter.
This is needed to avoid unwanted interference with SUBSTRING behavior,
as per bug #5257 from Roman Kononov. Also, add some basic intelligence
about character classes (bracket expressions) since we now have several
behaviors that aren't appropriate inside a character class.
As with the previous patch in this area, I'm reluctant to back-patch
since it might affect applications that are relying on the prior
behavior.
expressions: FormIndexDatum requires the estate's scantuple to already point
at the tuple the values are supposedly being extracted from. Adjust test
case so that this type of confusion will be exposed.
Per report from hubert depesz lubaczewski.
8.2beta but never carried out. This avoids repetitive tests of whether the
argument is of scalar or composite type. Also, be a bit more paranoid about
composite arguments in some places where we previously weren't checking.
to be just a minor extension of the previous patch that made "x IS NULL"
indexable, because we can treat the IS NOT NULL condition as if it were
"x < NULL" or "x > NULL" (depending on the index's NULLS FIRST/LAST option),
just like IS NULL is treated like "x = NULL". Aside from any possible
usefulness in its own right, this is an important improvement for
index-optimized MAX/MIN aggregates: it is now reliably possible to get
a column's min or max value cheaply, even when there are a lot of nulls
cluttering the interesting end of the index.
This is more in keeping with modern practice, and is a first step towards
porting to Win64 (which has sizeof(pointer) > sizeof(long)).
Tsutomu Yamada, Magnus Hagander, Tom Lane
decisions about when to auto-analyze.
The previous code depended on n_live_tuples + n_dead_tuples - last_anl_tuples,
where all three of these numbers could be bad estimates from ANALYZE itself.
Even worse, in the presence of a steady flow of HOT updates and matching
HOT-tuple reclamations, auto-analyze might never trigger at all, even if all
three numbers are exactly right, because n_dead_tuples could hold steady.
To fix, replace last_anl_tuples with an accurately tracked count of the total
number of committed tuple inserts + updates + deletes since the last ANALYZE
on the table. This can still be compared to the same threshold as before, but
it's much more trustworthy than the old computation. Tracking this requires
one more intra-transaction counter per modified table within backends, but no
additional memory space in the stats collector. There probably isn't any
measurable speed difference; if anything it might be a bit faster than before,
since I was able to eliminate some per-tuple arithmetic operations in favor of
adding sums once per (sub)transaction.
Also, simplify the logic around pgstat vacuum and analyze reporting messages
by not trying to fold VACUUM ANALYZE into a single pgstat message.
The original thought behind this patch was to allow scheduling of analyzes
on parent tables by artificially inflating their changes_since_analyze count.
I've left that for a separate patch since this change seems to stand on its
own merit.
at least in some Windows versions, these functions are capable of returning
a failure indication without setting errno. That puts us into an infinite
loop if the previous value happened to be EINTR. Per report from Brendan
Hill.
Back-patch to 8.2. We could take it further back, but since this is only
known to be an issue on Windows and we don't support Windows before 8.2,
it does not seem worth the trouble.
Since the int2vector type is intended only for internal use, this patch doesn't
worry about prettifying the error messages, which has the fringe benefit of
avoiding creating additional translatable strings. For a type intended to be
used by end-users, we would want to do better, but the approach taken here
seems like the correct trade-off for this case.
Caleb Welton
find_inheritance_children(). This is a complete no-op in databases without
any inheritance. In databases where there are just a few entries in
pg_inherits, it could conceivably be a small loss. However, in databases with
many inheritance parents, it can be a big win.
and teach ANALYZE to compute such stats for tables that have subclasses.
Per my proposal of yesterday.
autovacuum still needs to be taught about running ANALYZE on parent tables
when their subclasses change, but the feature is useful even without that.
PL/pgSQL function within an exception handler. Make sure we use the right
resource owner when we create the tuplestore to hold returned tuples.
Simplify tuplestore API so that the caller doesn't need to be in the right
memory context when calling tuplestore_put* functions. tuplestore.c
automatically switches to the memory context used when the tuplestore was
created. Tuplesort was already modified like this earlier. This patch also
removes the now useless MemoryContextSwitch calls from callers.
Report by Aleksei on pgsql-bugs on Dec 22 2009. Backpatch to 8.1, like
the previous patch that broke this.
The temporary hash tables made by pgstat_collect_oids should be allocated
in a short-term memory context, which is not the default behavior of
hash_create. Noted while looking through hash_create calls in connection
with Robert Haas' recent complaint.
This is a pre-existing bug, but it doesn't seem important enough to
back-patch. The hash table is not so large that it would matter unless this
happened many times within a session, which seems quite unlikely.
probably got there via blind copy-and-paste from one of the legitimate
callers, so rearrange and comment that code a bit to make it clearer that
this isn't a necessary prerequisite to hash_create. Per observation
from Robert Haas.
Modify pg_dump --binary-upgrade and add backend support routines to
support the preservation of pg_type oids when doing a binary upgrade.
This allows user-defined composite types and arrays to be binary
upgraded.
index page split. This would result in index corruption, or even more likely
an error during WAL replay, if we were unlucky enough to crash during
end-of-recovery cleanup after having completed an incomplete GIST insertion.
Yoichi Hirai
choose an index name the same as it would do for an unnamed index constraint.
(My recent changes to the index naming logic have helped to ensure that this
will be a reasonable choice.) Per a suggestion from Peter.
A necessary side-effect is to promote CONCURRENTLY to type_func_name_keyword
status, ie, it can't be a table/column/index name anymore unless quoted.
This is not all bad, since we have heard more than once of people typing
CREATE INDEX CONCURRENTLY ON foo (...) and getting a normal index build of
an index named "concurrently", which was not what they wanted. Now this
syntax will result in a concurrent build of an index with system-chosen
name; which they can rename afterwards if they want something else.
their underlying table columns. That code was not bright enough to cope with
collision situations (ie, new name conflicts with some other column of the
index). Since there is no functional reason to do this at all, trying to
upgrade the logic to be bulletproof doesn't seem worth the trouble.
This change means that both the index name and the column names of an index
are set when it's created, and won't be automatically changed when the
underlying table columns are renamed. Neatnik DBAs are still free to rename
them manually, of course.
CREATE FOREIGN DATA WRAPPER. Arguably it wasn't a bug because the
documentation said that it's passed the catalog ID or zero, but surely
we should provide it when it's known. And there isn't currently any
scenario where it's not known, and I can't imagine having one in the
future either, so better remove the "or zero" escape hatch and always
pass a valid catalog ID. Backpatch to 8.4.
Martin Pihlak
Index expression columns are now named after the FigureColname result for
their expressions, rather than always being "pg_expression_N". Digits are
appended to this name if needed to make the column name unique within the
index. (That happens for regular columns too, thus fixing the old problem
that CREATE INDEX fooi ON foo (f1, f1) fails. Before exclusion indexes
there was no real reason to do such a thing, but now maybe there is.)
Default names for indexes and associated constraints now include the column
names of all their columns, not only the first one as in previous practice.
(Of course, this will be truncated as needed to fit in NAMEDATALEN. Also,
pkey indexes retain the historical behavior of not naming specific columns
at all.)
An example of the results:
regression=# create table foo (f1 int, f2 text,
regression(# exclude (f1 with =, lower(f2) with =));
NOTICE: CREATE TABLE / EXCLUDE will create implicit index "foo_f1_lower_exclusion" for table "foo"
CREATE TABLE
regression=# \d foo_f1_lower_exclusion
Index "public.foo_f1_lower_exclusion"
Column | Type | Definition
--------+---------+------------
f1 | integer | f1
lower | text | lower(f2)
btree, for table "public.foo"
and composite types, which are the only relkinds for which pg_dump support
exists for dumping column comments. There is no obvious usefulness for
comments on columns of sequences or toast tables; and while comments on
index columns might have some value, it's not worth the risk of compatibility
problems due to possible changes in the algorithm for assigning names to
index columns. Per discussion.
In consequence, remove now-dead code for copying such comments in CREATE TABLE
LIKE.
Rewrite or adjust various comments for clarity. Remove one bogus comment that
doesn't reflect what the code actually does. Improve the description of the
lo_compat_privileges option.
have hard-wired knowledge of the rules for naming index columns. It can
just look at the actual names in the source index, instead. Do some minor
formatting cleanup too.
Enabled by recovery_connections = on (default) and forcing archive recovery using a recovery.conf. Recovery processing now emulates the original transactions as they are replayed, providing full locking and MVCC behaviour for read only queries. Recovery must enter consistent state before connections are allowed, so there is a delay, typically short, before connections succeed. Replay of recovering transactions can conflict and in some cases deadlock with queries during recovery; these result in query cancellation after max_standby_delay seconds have expired. Infrastructure changes have minor effects on normal running, though introduce four new types of WAL record.
New test mode "make standbycheck" allows regression tests of static command behaviour on a standby server while in recovery. Typical and extreme dynamic behaviours have been checked via code inspection and manual testing. Few port specific behaviours have been utilised, though primary testing has been on Linux only so far.
This commit is the basic patch. Additional changes will follow in this release to enhance some aspects of behaviour, notably improved handling of conflicts, deadlock detection and query cancellation. Changes to VACUUM FULL are also required.
Simon Riggs, with significant and lengthy review by Heikki Linnakangas, including streamlined redesign of snapshot creation and two-phase commit.
Important contributions from Florian Pflug, Mark Kirkwood, Merlin Moncure, Greg Stark, Gianni Ciolli, Gabriele Bartolini, Hannu Krosing, Robert Haas, Tatsuo Ishii, Hiroyuki Yamada plus support and feedback from many other community members.
This was possibly linked to a deadlock-like situation in glibc syslog code
invoked by the ereport call in quickdie(). In any case, a signal handler
should not unblock its own signal unless there is a specific reason to.
presented with an UNKNOWN-type Var, which can happen in cases where an
unknown literal appeared in a subquery. While many such cases will fail
later on anyway in the planner, there are some cases where the planner is
able to flatten the query and replace the Var by the constant before it has
to coerce the union column to the final type. I had added this check in 8.4
to provide earlier/better error detection, but it causes a regression for
some cases that worked OK before. Fix by not making the check if the input
node is UNKNOWN type and not a Const or Param. If it isn't going to work,
it will fail anyway at plan time, with the only real loss being inability to
provide an error cursor. Per gripe from Britt Piehler.
In passing, rename a couple of variables to remove confusion from an
inner scope masking the same variable names in an outer scope.
ExplainSeparatePlans() was busted for both JSON and YAML output - the present
code is a holdover from the original version of my machine-readable explain
patch, which didn't have the grouping_stack machinery. Also, fix an odd
distribution of labor between ExplainBeginGroup() and ExplainYAMLLineStarting()
when marking lists with "- ", with each providing one character. This broke
the output format for multi-query statements. Also, fix ExplainDummyGroup()
for the YAML output format.
Along the way, make the YAML format use escape_yaml() in situations where the
JSON format uses escape_json(). Right now, it doesn't matter because all the
values are known not to need escaping, but it seems safer this way. Finally,
I added some comments to better explain what the YAML output format is doing.
Greg Sabino Mullane reported the issues with multi-query statements.
Analysis and remaining cleanups by me.
For long source strings the copying results in O(N^2) behavior, and the
multiplier can be significant if wide-char conversion is involved.
Andres Freund, reviewed by Kevin Grittner.
non-kluge method for controlling the order in which values are fed to an
aggregate function. At the same time eliminate the old implementation
restriction that DISTINCT was only supported for single-argument aggregates.
Possibly release-notable behavioral change: formerly, agg(DISTINCT x)
dropped null values of x unconditionally. Now, it does so only if the
agg transition function is strict; otherwise nulls are treated as DISTINCT
normally would, ie, you get one copy.
Andrew Gierth, reviewed by Hitoshi Harada
This patch also removes buffer-usage statistics from the track_counts
output, since this (or the global server statistics) is deemed to be a better
interface to this information.
Itagaki Takahiro, reviewed by Euler Taveira de Oliveira.
we have to cope with the possibility that the declared result rowtype contains
dropped columns. This fails in 8.4, as per bug #5240.
While at it, be more paranoid about inserting binary coercions when inlining.
The pre-8.4 code did not really need to worry about that because it could not
inline at all in any case where an added coercion could change the behavior
of the function's statement. However, when inlining a SRF we allow sorting,
grouping, and set-ops such as UNION. In these cases, modifying one of the
targetlist entries that the sort/group/setop depends on could conceivably
change the behavior of the function's statement --- so don't inline when
such a case applies.
does a search for the user in the directory first, and then binds with
the DN found for this user.
This allows for LDAP logins in scenarios where the DN of the user cannot
be determined simply by prefix and suffix, such as the case where different
users are located in different containers.
The old way of authentication can be significantly faster, so it's kept
as an option.
Robert Fleming and Magnus Hagander
correctly when the output bit width is wider than the given integer by
something other than a multiple of 8 bits.
This has been wrong since I first wrote that code for 8.0 :-(. Kudos to
Roman Kononov for being the first to notice, though I didn't use his
patch. Per bug #5237.
Without these functions, anyone outside of explain.c can't actually use
ExplainPrintPlan, because the ExplainState won't be initialized properly.
The user-visible result of this was a crash when using auto_explain with
the JSON output format.
Report by Euler Taveira de Oliveira. Analysis by Tom Lane. Patch by me.
before we zap the input tuple. Otherwise, pass-by-reference columns of
the result slot are likely to contain just references to the input
tuple, leading to big trouble if the pfree'd space is reused. Per
trouble report from Jaime Casanova. This is a new bug in the recent
rewrite of EvalPlanQual, so nothing to back-patch.
an allegedly immutable index function. It was previously recognized that
we had to prevent such a function from executing SET/RESET ROLE/SESSION
AUTHORIZATION, or it could trivially obtain the privileges of the session
user. However, since there is in general no privilege checking for changes
of session-local state, it is also possible for such a function to change
settings in a way that might subvert later operations in the same session.
Examples include changing search_path to cause an unexpected function to
be called, or replacing an existing prepared statement with another one
that will execute a function of the attacker's choosing.
The present patch secures VACUUM, ANALYZE, and CREATE INDEX/REINDEX against
these threats, which are the same places previously deemed to need protection
against the SET ROLE issue. GUC changes are still allowed, since there are
many useful cases for that, but we prevent security problems by forcing a
rollback of any GUC change after completing the operation. Other cases are
handled by throwing an error if any change is attempted; these include temp
table creation, closing a cursor, and creating or deleting a prepared
statement. (In 7.4, the infrastructure to roll back GUC changes doesn't
exist, so we settle for rejecting changes of "search_path" in these contexts.)
Original report and patch by Gurjeet Singh, additional analysis by
Tom Lane.
Security: CVE-2009-4136
attacks where an attacker would put <attack>\0<propername> in the field and
trick the validation code that the certificate was for <attack>.
This is a very low risk attack since it reuqires the attacker to trick the
CA into issuing a certificate with an incorrect field, and the common
PostgreSQL deployments are with private CAs, and not external ones. Also,
default mode in 8.4 does not do any name validation, and is thus also not
vulnerable - but the higher security modes are.
Backpatch all the way. Even though versions 8.3.x and before didn't have
certificate name validation support, they still exposed this field for
the user to perform the validation in the application code, and there
is no way to detect this problem through that API.
Security: CVE-2009-4034
support any indexable commutative operator, not just equality. Two rows
violate the exclusion constraint if "row1.col OP row2.col" is TRUE for
each of the columns in the constraint.
Jeff Davis, reviewed by Robert Haas
Instead of expensive cross joins to resolve the ACL, add table-returning
function aclexplode() that expands the ACL into a useful form, and join
against that.
Also, implement the role_*_grants views as a thin layer over the respective
*_privileges views instead of essentially repeating the same code twice.
fixes bug #4596
by Joachim Wieland, with cleanup by me
in a subtransaction stays open even if the subtransaction is aborted, so
any temporary files related to it must stay alive as well. With the patch,
we use ResourceOwners to track open temporary files and don't automatically
close them at subtransaction end (though in the normal case temporary files
are registered with the subtransaction resource owner and will therefore be
closed).
At end of top transaction, we still check that there's no temporary files
marked as close-at-end-of-transaction open, but that's now just a debugging
cross-check as the resource owner cleanup should've closed them already.
to the client by the server. This might seem pretty pointless but apparently
it will help pgbouncer, and perhaps other connection poolers. Anyway it's
practically free to do so for the normal use-case where appname is only set
in the startup packet --- we're just adding a few more bytes to the initial
ParameterStatus response packet. Per comments from Marko Kreen.
locale-dependent character classification properly when the database encoding
is UTF8.
The previous coding worked okay in single-byte encodings, or in any case for
ASCII characters, but failed entirely on multibyte characters. The fix
assumes that the <wctype.h> functions use Unicode code points as the wchar
representation for Unicode, ie, wchar matches pg_wchar.
This is only a partial solution, since we're still stupid about non-ASCII
characters in multibyte encodings other than UTF8. The practical effect
of that is limited, however, since those cases are generally Far Eastern
glyphs for which concepts like case-folding don't apply anyway. Certainly
all or nearly all of the field reports of problems have been about UTF8.
A more general solution would require switching to the platform's wchar
representation for all regex operations; which is possible but would have
substantial disadvantages. Let's try this and see if it's sufficient in
practice.
by adding a requirement that build_join_rel add new join RelOptInfos to the
appropriate list immediately at creation. Per report from Robert Haas,
the list_concat_unique_ptr() calls that this change eliminates were taking
the lion's share of the runtime in larger join problems. This doesn't do
anything to fix the fundamental combinatorial explosion in large join
problems, but it should push out the threshold of pain a bit further.
Note: because this changes the order in which joinrel lists are built,
it might result in changes in selected plans in cases where different
alternatives have exactly the same costs. There is one example in the
regression tests.
be part of multixacts, so allocate a slot for each prepared transaction in
the "oldest member" array in multixact.c. On PREPARE TRANSACTION, transfer
the oldest member value from the current backends slot to the prepared xact
slot. Also save and recover the value from the 2pc state file.
The symptom of the bug was that after a transaction prepared, a shared lock
still held by the prepared transaction was sometimes ignored by other
transactions.
Fix back to 8.1, where both 2PC and multixact were introduced.
checked to determine whether the trigger should be fired.
For BEFORE triggers this is mostly a matter of spec compliance; but for AFTER
triggers it can provide a noticeable performance improvement, since queuing of
a deferred trigger event and re-fetching of the row(s) at end of statement can
be short-circuited if the trigger does not need to be fired.
Takahiro Itagaki, reviewed by KaiGai Kohei.
output filename if CSV logging was enabled and only one of the two possible
output files got rotated during a particular call (which would, in fact,
typically be the case during a size-based rotation). This would amount to
about MAXPGPATH (1KB) per rotation, and it's been there since the CSV
code was put in, so it's surprising that nobody noticed it before.
Per bug #5196 from Thomas Poindessous.
strength of database passwords, and create a sample implementation of
such a hook as a new contrib module "passwordcheck".
Laurenz Albe, reviewed by Takahiro Itagaki
adopted for EXPLAIN. This will allow additional options to be implemented
in future without having to make them fully-reserved keywords. The old syntax
remains available for existing options, however.
Itagaki Takahiro
non-Var sort/group expressions using ressortgroupref labels instead of
depending entirely on equal()-ity of the upper node's tlist expressions
to the lower node's. This avoids emitting the wrong outputs in cases
where there are textually identical volatile sort/group expressions,
as for example
select distinct random(),random() from generate_series(1,10);
Per report from Andrew Gierth.
Backpatch to 8.4. Arguably this is wrong all the way back, but the only known
case where there's an observable problem is when using hash aggregation to
implement DISTINCT, which is new as of 8.4. So for the moment I'll refrain
from backpatching further.
mergejoin to shield it from doing mark/restore and refetches. Put an explicit
flag in MergePath so we can centralize the logic that knows about this,
and add costing logic that considers using Materialize even when it's not
forced by the previously-existing considerations. This is in response to
a discussion back in August that suggested that materializing an inner
indexscan can be helpful when the refetch percentage is high enough.
but the transformed ArrayExpr claimed to have a return type of "domain",
even though the domain constraint was only checked by the enclosing
CoerceToDomain node. With this fix, the ArrayExpr is correctly labeled with
the base type of the domain. Per gripe by Tom Lane.
we need to check domain constraints. We used to do it correctly, but 8.4
introduced a separate code path for the "ARRAY[]::arraytype" case to infer
the type of an empty ARRAY construct from the cast target, and forgot to take
domains into account.
Per report from Florian G. Pflug.
User-defined consistent functions believes the check array
contains at least one true element which was not a true for
scanning pending list.
Per report from Yury Don <yura@vpcit.ru>
Per discussion, this should result in defaulting to SQL_ASCII encoding.
The original coding could not support that because it conflated selection
of SQL_ASCII encoding with not being able to determine the encoding.
Adjust pg_get_encoding_from_locale()'s API to distinguish these cases,
and fix callers appropriately. Only initdb actually changes behavior,
since the other callers were perfectly content to consider these cases
equivalent.
Per bug #5178 from Boh Yap. Not going to bother back-patching, since
no one has complained before and there's an easy workaround (namely,
specify the encoding you want).
directly. This was a lot of trouble, but should be worth it in terms of
not having to keep the plpgsql lexer in step with core anymore. In addition
the handling of keywords is significantly better-structured, allowing us to
de-reserve a number of words that plpgsql formerly treated as reserved.
The main motivation for this is that it's required for Informix compatibility
in ECPG.
This patch makes the ECPG and core grammars a bit closer to one another for
these productions.
Author: Zoltan Boszormenyi
Apple has fixed that bug in 10.6.2, and we should encourage users to
update to that version rather than trusting this cosmetic patch.
As was recently noted by Stephen Tyler, this patch was only masking
the problem in the context of DROP TABLESPACE, but the failure could
occur in other places such as pg_xlog cleanup.
In VACUUM FULL, an interrupt after the initial transaction has been recorded
as committed can cause postmaster to restart with the following error message:
PANIC: cannot abort transaction NNNN, it was already committed
This problem has been reported many times.
In lazy VACUUM, an interrupt after the table has been truncated by
lazy_truncate_heap causes other backends' relcache to still point to the
removed pages; this can cause future INSERT and UPDATE queries to error out
with the following error message:
could not read block XX of relation 1663/NNN/MMMM: read only 0 of 8192 bytes
The window to this race condition is extremely narrow, but it has been seen in
the wild involving a cancelled autovacuum process.
The solution for both problems is to inhibit interrupts in both operations
until after the respective transactions have been committed. It's not a
complete solution, because the transaction could theoretically be aborted by
some other error, but at least fixes the most common causes of both problems.
of different parsers having different YYSTYPE unions that they want to use
with it. I defined a new union core_YYSTYPE that is just the (very short)
list of semantic values returned by the core scanner. I had originally
worried that this would require an extra interface layer, but actually we can
have parser.c's base_yylex (formerly filtered_base_yylex) take care of that at
no extra cost. Names associated with the core scanner are now "core_yy_foo",
with "base_yy_foo" being used in the core Bison parser and the parser.c
interface layer.
This solves the last serious stumbling block to eliminating plpgsql's separate
lexer. One restriction that will still be present is that plpgsql and the
core will have to agree on the token numbers assigned to tokens that can be
returned by the core lexer. Since Bison doesn't seem willing to accept
external assignments of those numbers, we'll have to live with decreeing that
core and plpgsql grammars declare these tokens first and in the same order.
can be the name of a plpgsql cursor variable, which formerly was converted
to $N before the core parser saw it, but that's no longer the case.
Deal with plain name references to plpgsql variables, and add a regression
test case that exposes the failure.
it works just as well to have them be ordinary identifiers, and this gets rid
of a number of ugly special cases. Plus we aren't interfering with non-rule
usage of these names.
catversion bump because the names change internally in stored rules.
that are not handled by find_coercion_pathway, notably composite->RECORD.
Now that 8.4 supports composites as primary keys, it's worth dealing with
this case.
tarballs under 100 characters. This should avoid failures with certain
untarring tools (WinZip and Midnight Commander have been mentioned as
likely suspects). Per my proposal of yesterday.
catversion bumped since the initial contents of pg_proc change.
at the first keyword of the expression, rather than drawing a rather
artificial distinction between the ESCAPE subclause and the rest.
Per gripe from Gokulakannan Somasundaram and subsequent discusssion.
As proof of concept, modify plpgsql to use the hooks. plpgsql is still
inserting $n symbols textually, but the "back end" of the parsing process now
goes through the ParamRef hook instead of using a fixed parameter-type array,
and then execution only fetches actually-referenced parameters, using a hook
added to ParamListInfo.
Although there's a lot left to be done in plpgsql, this already cures the
"if (TG_OP = 'INSERT' and NEW.foo ...)" problem, as illustrated by the
changed regression test.
that it can scribble on scan->xs_ctup.t_self while following HOT chains,
so we can't rely on that to stay valid between hashgettuple() calls.
Introduce a private variable in HashScanOpaque, instead.
hash indexes keep entries sorted by hash value. First, the original plans for
concurrency assumed that insertions would happen only at the end of a page,
which is no longer true; this could cause scans to transiently fail to find
index entries in the presence of concurrent insertions. We can compensate
by teaching scans to re-find their position after re-acquiring read locks.
Second, neither the bucket split nor the bucket compaction logic had been
fixed to preserve hashvalue ordering, so application of either of those
processes could lead to permanent corruption of an index, in the sense
that searches might fail to find entries that are present.
This patch fixes the split and compaction logic to preserve hashvalue
ordering, but it cannot do anything about pre-existing corruption. We will
need to recommend reindexing all hash indexes in the 8.4.2 release notes.
To buy back the performance loss hereby induced in split and compaction,
fix them to use PageIndexMultiDelete instead of retail PageIndexDelete
operations. We might later want to do something with qsort'ing the
page contents rather than doing a binary search for each insertion,
but that seemed more invasive than I cared to risk in a back-patch.
Per bug #5157 from Jeff Janes and subsequent investigation.
recent proposal. As proof of concept, remove knowledge of Params from the
core parser, arranging for them to be handled entirely by parser hook
functions. It turns out we need an additional hook for that --- I had
forgotten about the code that handles inferring a parameter's type from
context.
This is a preliminary step towards letting plpgsql handle its variables
through parser hooks. Additional work remains to be done to expose the
facility through SPI, but I think this is all the changes needed in the core
parser.
The original coding ensured nbuckets and nbatch didn't exceed INT_MAX,
which while not insane on its own terms did nothing to protect subsequent
code like "palloc(nbatch * sizeof(BufFile *))". Since enormous join size
estimates might well be planner error rather than reality, it seems best
to constrain the initial sizes to be not more than work_mem/sizeof(pointer),
thus ensuring the allocated arrays don't exceed work_mem. We will allow
nbatch to get bigger than that during subsequent ExecHashIncreaseNumBatches
calls, but we should still guard against integer overflow in those palloc
requests. Per bug #5145 from Bernt Marius Johnsen.
Although the given test case only seems to fail back to 8.2, previous
releases have variants of this issue, so patch all supported branches.
adding the ModifyTable node type --- I had been thinking ModifyTable should
replace Append as a special case in push_plan(), but actually both of them
have to be special-cased.
when FOR UPDATE is propagated down into a sub-select expanded from a view.
Similar bug to parser's isLockedRel issue that I fixed yesterday; likewise
seems not quite worth the effort to back-patch.
underneath the Limit node, not atop it. This fixes the old problem that such
a query might unexpectedly return fewer rows than the LIMIT says, due to
LockRows discarding updated rows.
There is a related problem that LockRows might destroy the sort ordering
produced by earlier steps; but fixing that by pushing LockRows below Sort
would create serious performance problems that are unjustified in many
real-world applications, as well as potential deadlock problems from locking
many more rows than expected. Instead, keep the present semantics of applying
FOR UPDATE after ORDER BY within a single query level; but allow the user to
specify the other way by writing FOR UPDATE in a sub-select. To make that
work, track whether FOR UPDATE appeared explicitly in sub-selects or got
pushed down from the parent, and don't flatten a sub-select that contained an
explicit FOR UPDATE.
that it's called within an AfterTriggerBeginQuery/AfterTriggerEndQuery pair.
The RI cascade triggers suppress that overhead on the assumption that they
are always run non-deferred, so it's possible to violate the condition if
someone mistakenly changes pg_trigger to mark such a trigger deferred.
We don't really care about supporting that, but throwing an error instead
of crashing seems desirable. Per report from Marcelo Costa.
for example in
WITH w AS (SELECT * FROM foo) SELECT * FROM w, bar ... FOR UPDATE
the FOR UPDATE will now affect bar but not foo. This is more useful and
consistent than the original 8.4 behavior, which tried to propagate FOR UPDATE
into the WITH query but always failed due to assorted implementation
restrictions. Even though we are in process of removing those restrictions,
it seems correct on philosophical grounds to not let the outer query's
FOR UPDATE affect the WITH query.
In passing, fix isLockedRel which frequently got things wrong in
nested-subquery cases: "FOR UPDATE OF foo" applies to an alias foo in the
current query level, not subqueries. This has been broken for a long time,
but it doesn't seem worth back-patching further than 8.4 because the actual
consequences are minimal. At worst the parser would sometimes get
RowShareLock on a relation when it should be AccessShareLock or vice versa.
That would only make a difference if someone were using ExclusiveLock
concurrently, which no standard operation does, and anyway FOR UPDATE
doesn't result in visible changes so it's not clear that the someone would
notice any problem. Between that and the fact that FOR UPDATE barely works
with subqueries at all in existing releases, I'm not excited about worrying
about it.
those accepted by date_in(). I confused julian day numbers and number of
days since the postgres epoch 2000-01-01 in the original patch.
I just noticed that it's still easy to get such out-of-range values into
the database using to_date or +- operators, but this patch doesn't do
anything about those functions.
Per report from James Pye.
a lot of strange behaviors that occurred in join cases. We now identify the
"current" row for every joined relation in UPDATE, DELETE, and SELECT FOR
UPDATE/SHARE queries. If an EvalPlanQual recheck is necessary, we jam the
appropriate row into each scan node in the rechecking plan, forcing it to emit
only that one row. The former behavior could rescan the whole of each joined
relation for each recheck, which was terrible for performance, and what's much
worse could result in duplicated output tuples.
Also, the original implementation of EvalPlanQual could not re-use the recheck
execution tree --- it had to go through a full executor init and shutdown for
every row to be tested. To avoid this overhead, I've associated a special
runtime Param with each LockRows or ModifyTable plan node, and arranged to
make every scan node below such a node depend on that Param. Thus, by
signaling a change in that Param, the EPQ machinery can just rescan the
already-built test plan.
This patch also adds a prohibition on set-returning functions in the
targetlist of SELECT FOR UPDATE/SHARE. This is needed to avoid the
duplicate-output-tuple problem. It seems fairly reasonable since the
other restrictions on SELECT FOR UPDATE are meant to ensure that there
is a unique correspondence between source tuples and result tuples,
which an output SRF destroys as much as anything else does.
style by default. Per discussion, there seems to be hardly anything that
really relies on being able to change the regex flavor, so the ability to
select it via embedded options ought to be enough for any stragglers.
Also, if we didn't remove the GUC, we'd really be morally obligated to
mark the regex functions non-immutable, which'd possibly create performance
issues.
Per recent discussion, add_missing_from has been deprecated for long enough to
consider removing, and it's getting in the way of planned parser refactoring.
The system now always behaves as though add_missing_from were OFF.
pam_message array contains exactly one PAM_PROMPT_ECHO_OFF message.
Instead, deal with however many messages there are, and don't throw error
for PAM_ERROR_MSG and PAM_TEXT_INFO messages. This logic is borrowed from
openssh 5.2p1, which hopefully has seen more real-world PAM usage than we
have. Per bug #5121 from Ryan Douglas, which turned out to be caused by
the conv_proc being called with zero messages. Apparently that is normal
behavior given the combination of Linux pam_krb5 with MS Active Directory
as the domain controller.
Patch all the way back, since this code has been essentially untouched
since 7.4. (Surprising we've not heard complaints before.)
are named in the UPDATE's SET list.
Note: the schema of pg_trigger has not actually changed; we've just started
to use a column that was there all along. catversion bumped anyway so that
this commit is included in the history of potentially interesting changes
to system catalog contents.
Itagaki Takahiro
and SSPI athentication methods. While the old 2000 byte limit was more than
enough for Unix Kerberos implementations, tickets issued by Windows Domain
Controllers can be much larger.
Ian Turner
Also insert a couple of Asserts that check for stack overflow.
Bogus coding appears to be new in 8.4 --- older releases had a much
simpler algorithm here. Per bug #5111.
execMain.c and into a new plan node type LockRows. Like the recent change
to put table updating into a ModifyTable plan node, this increases planning
flexibility by allowing the operations to occur below the top level of the
plan tree. It's necessary in any case to restore the previous behavior of
having FOR UPDATE locking occur before ModifyTable does.
This partially refactors EvalPlanQual to allow multiple rows-under-test
to be inserted into the EPQ machinery before starting an EPQ test query.
That isn't sufficient to fix EPQ's general bogosity in the face of plans
that return multiple rows per test row, though. Since this patch is
mostly about getting some plan node infrastructure in place and not about
fixing ten-year-old bugs, I will leave EPQ improvements for another day.
Another behavioral change that we could now think about is doing FOR UPDATE
before LIMIT, but that too seems like it should be treated as a followon
patch.
* Stop escaping ? and {. As of SQL:2008, SIMILAR TO is defined to have
POSIX-compatible interpretation of ? as well as {m,n} and related constructs,
so we should allow these things through to our regex engine.
* Escape ^ and $. It appears that our regex engine will treat ^^ at the
beginning of the string the same as ^, and similarly for $$ at the end of
the string, which meant that SIMILAR TO was effectively ignoring ^ at the
start of the pattern and $ at the end. Since these are not supposed to be
metacharacters, this is a bug.
The second part of this is arguably a back-patchable bug fix, but I'm
hesitant to do that because it might break applications that are expecting
something like "col SIMILAR TO '^foo$'" to work like a POSIX pattern.
Seems safer to only change it at a major version boundary.
Per discussion of an example from Doug Gorley.
They are now handled by a new plan node type called ModifyTable, which is
placed at the top of the plan tree. In itself this change doesn't do much,
except perhaps make the handling of RETURNING lists and inherited UPDATEs a
tad less klugy. But it is necessary preparation for the intended extension of
allowing RETURNING queries inside WITH.
Marko Tiikkaja
Add a variant of pg_get_triggerdef with a second argument "pretty" that
causes the output to be formatted in the way pg_dump used to do. Use this
variant in pg_dump with server versions >= 8.5.
This insulates pg_dump from most future trigger feature additions, such as
the upcoming column triggers patch.
Author: Itagaki Takahiro <itagaki.takahiro@oss.ntt.co.jp>
friends). This code has all been ifdef'd out for many years, and doesn't
seem to have any prospect of becoming any more useful in the future.
EXPLAIN ANALYZE is what people use in practice, and I think if we did want
process-wide counters we'd be more likely to put in dtrace events for that
than try to resurrect this code. Get rid of it so as to have one less detail
to worry about while refactoring execMain.c.
Create a new catalog pg_db_role_setting where they are now stored, and better
encapsulate the code that deals with settings into its realm. The old
datconfig and rolconfig columns are removed.
psql has gained a \drds command to display the settings.
Backwards compatibility warning: while the backwards-compatible system views
still have the config columns, they no longer completely represent the
configuration for a user or database.
Catalog version bumped.
Partially revert the previous patch I installed and replace it with a more
general fix: any time a snapshot is pushed as Active, we need to ensure that it
will not be modified in the future. This means that if the same snapshot is
used as CurrentSnapshot, it needs to be copied separately. This affects
serializable transactions only, because CurrentSnapshot has already been copied
by RegisterSnapshot and so PushActiveSnapshot does not think it needs another
copy. However, CommandCounterIncrement would modify CurrentSnapshot, whereas
ActiveSnapshots must not have their command counters incremented.
I say "partially" because the regression test I added for the previous bug
has been kept.
(This restores 8.3 behavior, because before snapmgr.c existed, any snapshot set
as Active was copied.)
Per bug report from Stuart Bishop in
6bc73d4c0910042358k3d1adff3qa36f8df75198ecea@mail.gmail.com
inheritance parent tables are compared using equal(), instead of doing
strcmp() on the nodeToString representation. The old implementation was
always a tad cheesy, and it finally fails completely as of 8.4, now that the
node tree might contain syntax location information. equal() knows it's
supposed to ignore those fields, but strcmp() hardly can. Per recent
report from Scott Ribe.
the privileges that will be applied to subsequently-created objects.
Such adjustments are always per owning role, and can be restricted to objects
created in particular schemas too. A notable benefit is that users can
override the traditional default privilege settings, eg, the PUBLIC EXECUTE
privilege traditionally granted by default for functions.
Petr Jelinek
large number of SIGHUP cycles, these would have run the postmaster out
of memory. Noted while testing memory-leak scenario in postgresql.conf
configuration-change-printing patch.
settings: avoid calling superuser() in contexts where it's not defined,
don't leak the transient copies of GetConfigOption output, and avoid the
whole exercise in postmaster child processes.
I found that actually no current caller of GetConfigOption has any use for
its internal check of GUC_SUPERUSER_ONLY. But rather than just remove
that entirely, it seemed better to add a parameter indicating whether to
enforce the check.
Per report from Simon and subsequent testing.
tuple size limit. Improve the error message for index-tuple-too-large
so that it includes the actual size, the limit, and the index name.
Sync with the btree occurrences of the same error.
Back-patch to 8.4 because it appears that the out-of-sync problem
is occurring in the field.
Teodor and Tom
in CREATE OR REPLACE FUNCTION. The original code would update pg_shdepend
as if a new function was being created, even if it wasn't, with two bad
consequences: pg_shdepend might record the wrong owner for the function,
and any dependencies for roles mentioned in the function's ACL would be lost.
The fix is very easy: just don't touch pg_shdepend at all when doing a
function replacement.
Also update the CREATE FUNCTION reference page, which never explained
exactly what changes and doesn't change in a function replacement.
In passing, fix the CREATE VIEW reference page similarly; there's no
code bug there, but the docs didn't say what happens.
The old coding was using a regular snapshot, referenced elsewhere, that was
subject to having its command counter updated. Fix by creating a private copy
of the snapshot exclusively for the cursor.
Backpatch to 8.4, which is when the bug was introduced during the snapshot
management rewrite.
The original coding correctly noted that these aren't just redundancies
(they're effectively X IS NOT NULL, assuming = is strict). However, they
got treated that way if X happened to be in a single-member EquivalenceClass
already, which could happen if there was an ORDER BY X clause, for instance.
The simplest and most reliable solution seems to be to not try to process
such clauses through the EquivalenceClass machinery; just throw them back
for traditional processing. The amount of work that'd be needed to be
smarter than that seems out of proportion to the benefit.
Per bug #5084 from Bernt Marius Johnsen, and analysis by Andrew Gierth.
TupleTableSlot nodes. This eliminates the need to count in advance
how many Slots will be needed, which seems more than worth the small
increase in the amount of palloc traffic during executor startup.
The ExecCountSlots infrastructure is now all dead code, but I'll remove it
in a separate commit for clarity.
Per a comment from Robert Haas.
the strings seen during the bootstrap run. There might have been some
actual point to doing that, many years ago, but as far as I can see the only
value now is to conserve a bit of memory. Even if we cared about wasting
a megabyte or so during the initdb run, it'd be far more effective to
arrange to release memory at the end of each BKI command, instead of
intentionally hanging onto strings that might never be used again.
Not maintaining the table probably makes it faster too; but the main point
of this patch is to get rid of a couple hundred lines of unnecessary and
rather crufty code.
relation rowtype OID into the relcache entries it builds. This ensures
that catcache copies of the relation tupdescs will be fully correct.
While the deficiency doesn't seem to have any effect in the current
sources, we have been bitten by not-quite-right catcache tupdescs before,
so it seems like a good idea to maintain the rule that they should be right.
hand-assigned rowtype OIDs, even when they are not "bootstrapped" catalogs
that have handmade type rows in pg_type.h. Give pg_database such an OID.
Restore the availability of C macros for the rowtype OIDs of the bootstrapped
catalogs. (These macros are now in the individual catalogs' .h files,
though, not in pg_type.h.)
This commit doesn't do anything especially useful by itself, but it's
necessary infrastructure for reverting some ill-considered changes in
relcache.c.
possibility of shared-inval messages causing a relcache flush while it tries
to fill in missing data in preloaded relcache entries. There are actually
two distinct failure modes here:
1. The flush could delete the next-to-be-processed cache entry, causing
the subsequent hash_seq_search calls to go off into the weeds. This is
the problem reported by Michael Brown, and I believe it also accounts
for bug #5074. The simplest fix is to restart the hashtable scan after
we've read any new data from the catalogs. It appears that pre-8.4
branches have not suffered from this failure, because by chance there were
no other catalogs sharing the same hash chains with the catalogs that
RelationCacheInitializePhase2 had work to do for. However that's obviously
pretty fragile, and it seems possible that derivative versions with
additional system catalogs might be vulnerable, so I'm back-patching this
part of the fix anyway.
2. The flush could delete the *current* cache entry, in which case the
pointer to the newly-loaded data would end up being stored into an
already-deleted Relation struct. As long as it was still deleted, the only
consequence would be some leaked space in CacheMemoryContext. But it seems
possible that the Relation struct could already have been recycled, in
which case this represents a hard-to-reproduce clobber of cached data
structures, with unforeseeable consequences. The fix here is to pin the
entry while we work on it.
In passing, also change RelationCacheInitializePhase2 to Assert that
formrdesc() set up the relation's cached TupleDesc (rd_att) with the
correct type OID and hasoids values. This is more appropriate than
silently updating the values, because the original tupdesc might already
have been copied into the catcache. However this part of the patch is
not in HEAD because it fails due to some questionable recent changes in
formrdesc :-(. That will be cleaned up in a subsequent patch.
to create a function for it.
Procedural languages now have an additional entry point, namely a function
to execute an inline code block. This seemed a better design than trying
to hide the transient-ness of the code from the PL. As of this patch, only
plpgsql has an inline handler, but probably people will soon write handlers
for the other standard PLs.
In passing, remove the long-dead LANCOMPILER option of CREATE LANGUAGE.
Petr Jelinek
This is intentionally similar to the recently revised syntax for EXPLAIN
options, ie, (name value, ...). The old syntax is still supported for
backwards compatibility, but we intend that any options added in future
will be provided only in the new syntax.
Robert Haas, Emmanuel Cecchet
tests into a small common subroutine, and eliminate an unnecessary difference
in the order in which conditions are tested. Per a comment from Robert Haas.
is unique and is not referenced above the join. In this case the inner
side doesn't affect the query result and can be thrown away entirely.
Although perhaps nobody would ever write such a thing by hand, it's
a reasonably common case in machine-generated SQL.
The current implementation only recognizes the case where the inner side
is a simple relation with a unique index matching the query conditions.
This is enough for the use-cases that have been shown so far, but we
might want to try to handle other cases later.
Robert Haas, somewhat rewritten by Tom
In practice these mistakes were always masked when full_page_writes was on,
because XLogInsert would always choose to log the full page, and then
ginRedoInsertListPage wouldn't try to do anything. But with full_page_writes
off a WAL replay failure was certain.
The GIN_INSERT_LISTPAGE record type could probably be eliminated entirely
in favor of using XLOG_HEAP_NEWPAGE, but I refrained from doing that now
since it would have required a significantly more invasive patch.
In passing do a little bit of code cleanup, including making the accounting
for free space on GIN list pages more precise. (This wasn't a bug as the
errors were always in the conservative direction.)
Per report from Simon. Back-patch to 8.4 which contains the identical code.
if salt_len == 0. This seems to be mostly academic, since nearly all calling
code paths guarantee nonempty salt; the only case that doesn't is
PQencryptPassword where the caller could mistakenly pass an empty username.
So, fix it but don't bother backpatching. Per ljb.
of checkpoint. Although the checkpoint has been written to WAL at that point
already, so that all data is safe, and we'll retry removing the WAL segment at
the next checkpoint, if such a failure persists we won't be able to remove any
other old WAL segments either and will eventually run out of disk space. It's
better to treat the failure as non-fatal, and move on to clean any other WAL
segment and continue with any other end-of-checkpoint cleanup.
We don't normally expect any such failures, but on Windows it can happen with
some anti-virus or backup software that lock files without FILE_SHARE_DELETE
flag.
Also, the loop in pgrename() to retry when the file is locked was broken. If a
file is locked on Windows, you get ERROR_SHARE_VIOLATION, not
ERROR_ACCESS_DENIED, at least on modern versions. Fix that, although I left
the check for ERROR_ACCESS_DENIED in there as well (presumably it was correct
in some environment), and added ERROR_LOCK_VIOLATION to be consistent with
similar checks in pgwin32_open(). Reduce the timeout on the loop from 30s to
10s, on the grounds that since it's been broken, we've effectively had a
timeout of 0s and no-one has complained, so a smaller timeout is actually
closer to the old behavior. A longer timeout would mean that if recycling a
WAL file fails because it's locked for some reason, InstallXLogFileSegment()
will hold ControlFileLock for longer, potentially blocking other backends, so
a long timeout isn't totally harmless.
While we're at it, set errno correctly in pgrename().
Backpatch to 8.2, which is the oldest version supported on Windows. The xlog.c
changes would make sense on other platforms and thus on older versions as
well, but since there's no such locking issues on other platforms, it's not
worth it.
an explicit model of rescan costs being different from first-time costs.
The costing of Material nodes in particular now has some visible relationship
to the actual runtime behavior, where before it was essentially fantasy.
This also fixes up a couple of places where different materialized plan types
were treated differently for no very good reason (probably just oversights).
A couple of the regression tests are affected, because the planner now chooses
to put the other relation on the inside of a nestloop-with-materialize.
So far as I can see both changes are sane, and the planner is now more
consistently following the expectation that it should prefer to materialize
the smaller of two relations.
Per a recent discussion with Robert Haas.
If Apple doesn't fix that reasonably soon, we'll have to consider
back-patching a workaround; but for now, just hack it in HEAD so that
we can get buildfarm reports on HEAD from OS X machines.
Per Jan Otto.
In this case we generate two PathKey references to the expression (one for
DISTINCT and one for ORDER BY) and they really need to refer to the same
EquivalenceClass. However get_eclass_for_sort_expr was being overly paranoid
and creating two different EC's. Correct behavior is to use the SortGroupRef
index to decide whether two references to volatile expressions that are
equal() (ie textually equivalent) should be considered the same.
Backpatch to 8.4. Possibly this should be changed in 8.3 as well, but
I'll refrain in the absence of evidence of a visible failure in that branch.
Per bug #5049.
use that value when the backend is new enough to allow it. This responds
to bug report from Keh-Cheng Chu pointing out that although 2 extra digits
should be sufficient to dump and restore float8 exactly, it is possible to
need 3 extra digits for float4 values.
file handle on it, the file goes into "pending deletion" state where it
still shows up in directory listing, but isn't accessible otherwise. That
confuses RemoveOldXLogFiles(), making it think that the file hasn't been
archived yet, while it actually was, and it was deleted along with the .done
file.
Fix that by renaming the file with ".deleted" extension before deleting it.
Also check the return value of rename() and unlink(), so that if the removal
fails for any reason (e.g another process is holding the file locked), we
don't delete the .done file until the WAL file is really gone.
Backpatch to 8.2, which is the oldest version supported on Windows.
Before, PL/Python converted data between SQL and Python by going
through a C string representation. This broke for bytea in two ways:
- On input (function parameters), you would get a Python string that
contains bytea's particular external representation with backslashes
etc., instead of a sequence of bytes, which is what you would expect
in a Python environment. This problem is exacerbated by the new
bytea output format.
- On output (function return value), null bytes in the Python string
would cause truncation before the data gets stored into a bytea
datum.
This is now fixed by converting directly between the PostgreSQL datum
and the Python representation.
The required generalized infrastructure also allows for other
improvements in passing:
- When returning a boolean value, the SQL datum is now true if and
only if Python considers the value that was passed out of the
PL/Python function to be true. Previously, this determination was
left to the boolean data type input function. So, now returning
'foo' results in true, because Python considers it true, rather than
false because PostgreSQL considers it false.
- On input, we can convert the integer and float types directly to
their Python equivalents without having to go through an
intermediate string representation.
original patch by Caleb Welton, with updates by myself
code was already okay with this, but the hack that obtained the output
column types of a recursive union in advance of doing real parse analysis
of the recursive union forgot to handle the case where there was an inner
WITH clause available to the non-recursive term. Best fix seems to be to
refactor so that we don't need the "throwaway" parse analysis step at all.
Instead, teach the transformSetOperationStmt code to set up the CTE's output
column information after it's processed the non-recursive term normally.
Per report from David Fetter.
build actually attempts to advertise itself via Bonjour. Formerly it always
did so, which meant that packagers had to decide for their users whether
this behavior was wanted or not. The default is "off" to be on the safe
side, though this represents a change in the default behavior of a
Bonjour-enabled build. Per discussion.
with the not-so-deprecated DNSServiceRegister. This patch shouldn't change
any user-visible behavior, it just gets rid of a deprecation warning in
--with-bonjour builds. The new code will fail on OS X releases before 10.3,
but it seems unlikely that anyone will want to run Postgres 8.5 on 10.2.
Formerly, these message types would be discarded unless there was already
a stats hash table entry for the target table. However, the intent of
saving hash table space for unused tables was subverted by the fact that
the physical I/O done by the vacuum or analyze would result in an immediately
following tabstat message, which would create the hash table entry anyway.
All that we had left was surprising loss of statistical data, as in a recent
complaint from Jaime Casanova.
It seems unlikely that a real database would have many tables that go totally
untouched over the long haul, so the consensus is that this "optimization"
serves little purpose anyhow. Remove it, and just create the hash table
entry on demand in all cases.
input functions don't accept either. While the backend can handle such
values fine, they can cause trouble in clients and in pg_dump/restore.
This is followup to the original issue on time datatype reported by Andrew
McNamara a while ago. Like that one, none of these seem worth
back-patching.
specify an encoding explicitly, we used to treat it as being in database
encoding when we parsed it, but then perform a UTF-8 -> database encoding
conversion on it, which was completely bogus. It's now consistently treated as
UTF-8.
to unload and re-load the library.
The difficulty with unloading a library is that we haven't defined safe
protocols for doing so. In particular, there's no safe mechanism for
getting out of a "hook" function pointer unless libraries are unloaded
in reverse order of loading. And there's no mechanism at all for undefining
a custom GUC variable, so GUC would be left with a pointer to an old value
that might or might not still be valid, and very possibly wouldn't be in
the same place anymore.
While the unload and reload behavior had some usefulness in easing
development of new loadable libraries, it's of no use whatever to normal
users, so just disabling it isn't giving up that much. Someday we might
care to expend the effort to develop safe unload protocols; but even if
we did, there'd be little certainty that every third-party loadable module
was following them, so some security restrictions would still be needed.
Back-patch to 8.2; before that, LOAD was superuser-only anyway.
Security: unprivileged users could crash backend. CVE not assigned yet
functions.
This extends the previous patch that forbade SETting these variables inside
security-definer functions. RESET is equally a security hole, since it
would allow regaining privileges of the caller; furthermore it can trigger
Assert failures and perhaps other internal errors, since the code is not
expecting these variables to change in such contexts. The previous patch
did not cover this case because assign hooks don't really have enough
information, so move the responsibility for preventing this into guc.c.
Problem discovered by Heikki Linnakangas.
Security: no CVE assigned yet, extends CVE-2007-6600
to occur for division by zero, even though the code is carefully avoiding
that. All available evidence is that the only functions affected are
int24div, int48div, and int28div, so patch just those three functions to
include a "return" after the ereport() call.
Backpatch to 8.4 so that the fix can be tested in production builds.
For older branches our recommendation will continue to be to use -O1
on affected platforms (which are mostly non-mainstream anyway).
that's generated for a whole-row Var referencing the subquery, when the
subquery is in the nullable side of an outer join. The previous coding
instead put PlaceHolderVars around the elements of the RowExpr. The effect
was that when the outer join made the subquery outputs go to null, the
whole-row Var produced ROW(NULL,NULL,...) rather than just NULL. There
are arguments afoot about whether those things ought to be semantically
indistinguishable, but for the moment they are not entirely so, and the
planner needs to take care that its machinations preserve the difference.
Per bug #5025.
Making this feasible required refactoring ResolveNew() to allow more caller
control over what is substituted for a Var. I chose to make ResolveNew()
a wrapper around a new general-purpose function replace_rte_variables().
I also fixed the ancient bogosity that ResolveNew might fail to set
a query's hasSubLinks field after inserting a SubLink in it. Although
all current callers make sure that happens anyway, we've had bugs of that
sort before, and it seemed like a good time to install a proper solution.
Back-patch to 8.4. The problem can be demonstrated clear back to 8.0,
but the fix would be too invasive in earlier branches; not to mention
that people may be depending on the subtly-incorrect behavior. The
8.4 series is new enough that fixing this probably won't cause complaints,
but it might in older branches. Also, 8.4 shows the incorrect behavior
in more cases than older branches do, because it is able to flatten
subqueries in more cases.
own database's datfrozenxid, if the current value is old enough to be
forcing autovacuums or warning messages. This ensures that a bogus
value is replaced as soon as possible. Per a comment from Heikki.
Recent commits have removed the various uses it was supporting. It was a
performance bottleneck, according to bug report #4919 by Lauris Ulmanis; seems
it slowed down user creation after a billion users.
to fix the problem that SetClientEncoding needs to be done before
InitializeClientEncoding, as reported by Zdenek Kotala. We get at least
the small consolation of being able to remove the bizarre API detail that
had InitPostgres returning whether user is a superuser.
via the "flat files" facility. This requires making it enough like a backend
to be able to run transactions; it's no longer an "auxiliary process" but
more like the autovacuum worker processes. Also, its signal handling has
to be brought into line with backends/workers. In particular, since it
now has to handle procsignal.c processing, the special autovac-launcher-only
signal conditions are moved to SIGUSR2.
Alvaro, with some cleanup from Tom
XID) in checkpoint records. This eliminates the need to recompute the value
from scratch during database startup, which is one of the two remaining
reasons for the flatfile code to exist. It should also simplify life for
hot-standby operation.
To avoid bloating the checkpoint records unreasonably, I switched from
tracking the oldest database by name to tracking it by OID. This turns
out to save cycles in general (everywhere but the warning-generating
paths, which we hardly care about) and also helps us deal with the case
that the oldest database got dropped instead of being vacuumed. The prior
coding might go for a long time without updating the wrap limit in that case,
which is bad because it might result in a lot of useless autovacuum activity.