Commit Graph

5664 Commits

Author SHA1 Message Date
Tom Lane
5cfa8dd300 Use mutex hint bit in PPC LWARX instructions, where possible.
The hint bit makes for a small but measurable performance improvement
in access to contended spinlocks.

On the other hand, some PPC chips give an illegal-instruction failure.
There doesn't seem to be a completely bulletproof way to tell whether the
hint bit will cause an illegal-instruction failure other than by trying
it; but most if not all 64-bit PPC machines should accept it, so follow
the Linux kernel's lead and assume it's okay to use it in 64-bit builds.
Of course we must also check whether the assembler accepts the command,
since even with a recent CPU the toolchain could be old.

Patch by Manabu Ori, significantly modified by me.
2012-01-02 00:02:00 -05:00
Bruce Momjian
e126958c2e Update copyright notices for year 2012. 2012-01-01 18:01:58 -05:00
Simon Riggs
64233902d2 Send new protocol keepalive messages to standby servers.
Allows streaming replication users to calculate transfer latency
and apply delay via internal functions. No external functions yet.
2011-12-31 13:30:26 +00:00
Peter Eisentraut
d383c23f6f Remove support for on_exit()
All supported platforms support the C89 standard function atexit()
(SunOS 4 probably being the last one not to), and supporting both
makes the code clumsy.
2011-12-27 20:57:59 +02:00
Tom Lane
472d3935a2 Rethink representation of index clauses' mapping to index columns.
In commit e2c2c2e8b1 I made use of nested
list structures to show which clauses went with which index columns, but
on reflection that's a data structure that only an old-line Lisp hacker
could love.  Worse, it adds unnecessary complication to the many places
that don't much care which clauses go with which index columns.  Revert
to the previous arrangement of flat lists of clauses, and instead add a
parallel integer list of column numbers.  The places that care about the
pairing can chase both lists with forboth(), while the places that don't
care just examine one list the same as before.

The only real downside to this is that there are now two more lists that
need to be passed to amcostestimate functions in case they care about
column matching (which btcostestimate does, so not passing the info is not
an option).  Rather than deal with 11-argument amcostestimate functions,
pass just the IndexPath and expect the functions to extract fields from it.
That gets us down to 7 arguments which is better than 11, and it seems
more future-proof against likely additions to the information we keep
about an index path.
2011-12-24 19:03:21 -05:00
Tom Lane
e2c2c2e8b1 Improve planner's handling of duplicated index column expressions.
It's potentially useful for an index to repeat the same indexable column
or expression in multiple index columns, if the columns have different
opclasses.  (If they share opclasses too, the duplicate column is pretty
useless, but nonetheless we've allowed such cases since 9.0.)  However,
the planner failed to cope with this, because createplan.c was relying on
simple equal() matching to figure out which index column each index qual
is intended for.  We do have that information available upstream in
indxpath.c, though, so the fix is to not flatten the multi-level indexquals
list when putting it into an IndexPath.  Then we can rely on the sublist
structure to identify target index columns in createplan.c.  There's a
similar issue for index ORDER BYs (the KNNGIST feature), so introduce a
multi-level-list representation for that too.  This adds a bit more
representational overhead, but we might more or less buy that back by not
having to search for matching index columns anymore in createplan.c;
likewise btcostestimate saves some cycles.

Per bug #6351 from Christian Rudolph.  Likely symptoms include the "btree
index keys must be ordered by attribute" failure shown there, as well as
"operator MMMM is not a member of opfamily NNNN".

Although this is a pre-existing problem that can be demonstrated in 9.0 and
9.1, I'm not going to back-patch it, because the API changes in the planner
seem likely to break things such as index plugins.  The corner cases where
this matters seem too narrow to justify possibly breaking things in a minor
release.
2011-12-23 18:45:14 -05:00
Robert Haas
d5448c7d31 Add bytea_agg, parallel to string_agg.
Pavel Stehule
2011-12-23 08:40:25 -05:00
Robert Haas
99b60fc04e Catversion bump for commit 0e4611c023.
It changed the format of stored rules.
2011-12-22 17:25:35 -05:00
Robert Haas
0e4611c023 Add a security_barrier option for views.
When a view is marked as a security barrier, it will not be pulled up
into the containing query, and no quals will be pushed down into it,
so that no function or operator chosen by the user can be applied to
rows not exposed by the view.  Views not configured with this
option cannot provide robust row-level security, but will perform far
better.

Patch by KaiGai Kohei; original problem report by Heikki Linnakangas
(in October 2009!).  Review (in earlier versions) by Noah Misch and
others.  Design advice by Tom Lane and myself.  Further review and
cleanup by me.
2011-12-22 16:16:31 -05:00
Peter Eisentraut
f90dd28062 Add ALTER DOMAIN ... RENAME
You could already rename domains using ALTER TYPE, but with this new
command it is more consistent with how other commands treat domains as
a subcategory of types.
2011-12-22 22:43:56 +02:00
Robert Haas
cbe24a6dd8 Improve behavior of concurrent CLUSTER.
In the previous coding, a user could queue up for an AccessExclusiveLock
on a table they did not have permission to cluster, thus potentially
interfering with access by authorized users who got stuck waiting behind
the AccessExclusiveLock.  This approach avoids that.  cluster() has the
same permissions-checking requirements as REINDEX TABLE, so this commit
moves the now-shared callback to tablecmds.c and renames it, per
discussion with Noah Misch.
2011-12-21 15:17:28 -05:00
Robert Haas
d573e239f0 Take fewer snapshots.
When a PORTAL_ONE_SELECT query is executed, we can opportunistically
reuse the parse/plan shot for the execution phase.  This cuts down the
number of snapshots per simple query from 2 to 1 for the simple
protocol, and 3 to 2 for the extended protocol.  Since we are only
reusing a snapshot taken early in the processing of the same protocol
message, the change shouldn't be user-visible, except that the remote
possibility of the planning and execution snapshots being different is
eliminated.

Note that this change does not make it safe to assume that the parse/plan
snapshot will certainly be reused; that will currently only happen if
PortalStart() decides to use the PORTAL_ONE_SELECT strategy.  It might
be worth trying to provide some stronger guarantees here in the future,
but for now we don't.

Patch by me; review by Dimitri Fontaine.
2011-12-21 09:16:55 -05:00
Peter Eisentraut
729205571e Add support for privileges on types
This adds support for the more or less SQL-conforming USAGE privilege
on types and domains.  The intent is to be able restrict which users
can create dependencies on types, which restricts the way in which
owners can alter types.

reviewed by Yeb Havinga
2011-12-20 00:05:19 +02:00
Alvaro Herrera
05e992e90e Forgot catversion bump on previous patch
Per Tom
2011-12-19 17:45:17 -03:00
Tom Lane
8f57b064fd Rename updateNodeLink to spgUpdateNodeLink.
On reflection, the original name seems way too generic for a global
symbol.  A quick check shows this is the only exported function name
in SP-GiST that doesn't begin with "spg" or contain "SpGist", so the
rest of them seem all right.
2011-12-19 15:38:32 -05:00
Alvaro Herrera
61d81bd28d Allow CHECK constraints to be declared ONLY
This makes them enforceable only on the parent table, not on children
tables.  This is useful in various situations, per discussion involving
people bitten by the restrictive behavior introduced in 8.4.

Message-Id:
8762mp93iw.fsf@comcast.net
CAFaPBrSMMpubkGf4zcRL_YL-AERUbYF_-ZNNYfb3CVwwEqc9TQ@mail.gmail.com

Authors: Nikhil Sontakke, Alex Hunsaker
Reviewed by Robert Haas and myself
2011-12-19 17:30:23 -03:00
Tom Lane
9220362493 Teach SP-GiST to do index-only scans.
Operator classes can specify whether or not they support this; this
preserves the flexibility to use lossy representations within an index.

In passing, move constant data about a given index into the rd_amcache
cache area, instead of doing fresh lookups each time we start an index
operation.  This is mainly to try to make sure that spgcanreturn() has
insignificant cost; I still don't have any proof that it matters for
actual index accesses.  Also, get rid of useless copying of FmgrInfo
pointers; we can perfectly well use the relcache's versions in-place.
2011-12-19 14:58:41 -05:00
Tom Lane
3695a55513 Replace simple constant pg_am.amcanreturn with an AM support function.
The need for this was debated when we put in the index-only-scan feature,
but at the time we had no near-term expectation of having AMs that could
support such scans for only some indexes; so we kept it simple.  However,
the SP-GiST AM forces the issue, so let's fix it.

This patch only installs the new API; no behavior actually changes.
2011-12-18 15:50:37 -05:00
Tom Lane
5577ca5bfb Remove bogus entries in gist point_ops operator class.
These entries could never be matched to an index clause because they don't
have the index datatype on the left-hand side of the operator.  (Their
commutators are in the opclass, which is sensible, but that doesn't mean
these operators should be.)  Spotted by a test that I recently added to
opr_sanity to catch exactly this type of thinko.  AFAICT there is no code
in gistproc.c that is specifically meant to cover these cases, so nothing
to remove at that level.
2011-12-17 18:51:00 -05:00
Tom Lane
8daeb5ddd6 Add SP-GiST (space-partitioned GiST) index access method.
SP-GiST is comparable to GiST in flexibility, but supports non-balanced
partitioned search structures rather than balanced trees.  As described at
PGCon 2011, this new indexing structure can beat GiST in both index build
time and query speed for search problems that it is well matched to.

There are a number of areas that could still use improvement, but at this
point the code seems committable.

Teodor Sigaev and Oleg Bartunov, with considerable revisions by Tom Lane
2011-12-17 16:42:30 -05:00
Robert Haas
0d76b60db4 Various micro-optimizations for GetSnapshopData().
Heikki Linnakangas had the idea of rearranging GetSnapshotData to
avoid checking for sub-XIDs when no top-level XID is present.  This
patch does that plus further a bit of further, related rearrangement.
Benchmarking show a significant improvement on unlogged tables at
higher concurrency levels, and mostly indifferent result on permanent
tables (which are presumably bottlenecked elsewhere).  Most of the
benefit seems to come from using the new NormalTransactionIdPrecedes()
macro rather than the function call TransactionIdPrecedes().
2011-12-16 21:48:47 -05:00
Andrew Dunstan
6d09b2105f include_if_exists facility for config file.
This works the same as include, except that an error is not thrown
if the file is missing. Instead the fact that it's missing is
logged.

Greg Smith, reviewed by Euler Taveira de Oliveira.
2011-12-15 19:40:58 -05:00
Robert Haas
74a1d4fe7c Improve behavior of concurrent rename statements.
Previously, renaming a table, sequence, view, index, foreign table,
column, or trigger checked permissions before locking the object, which
meant that if permissions were revoked during the lock wait, we would
still allow the operation.  Similarly, if the original object is dropped
and a new one with the same name is created, the operation will be allowed
if we had permissions on the old object; the permissions on the new
object don't matter.  All this is now fixed.

Along the way, attempting to rename a trigger on a foreign table now gives
the same error message as trying to create one there in the first place
(i.e. that it's not a table or view) rather than simply stating that no
trigger by that name exists.

Patch by me; review by Noah Misch.
2011-12-15 19:02:38 -05:00
Robert Haas
f6835ea90a Fix typo. 2011-12-15 18:22:29 -05:00
Tom Lane
2dd9322ba6 Move BKP_REMOVABLE bit from individual WAL records to WAL page headers.
Removing this bit from xl_info allows us to restore the old limit of four
(not three) separate pages touched by a WAL record, which is needed for the
upcoming SP-GiST feature, and will likely be useful elsewhere in future.

When we implemented XLR_BKP_REMOVABLE in 2007, we had to do it like that
because no special WAL-visible action was taken when starting a backup.
However, now we force a segment switch when starting a backup, so a
compressing WAL archiver (such as pglesslog) that uses the state shown in
the current page header will not be fooled as to removability of backup
blocks.  The only downside is that the archiver will not return to
compressing mode for up to one WAL page after the backup is over, which is
a small price to pay for getting back the extra xl_info bit.  In any case
the archiver could look for XLOG_BACKUP_END records if it thought it was
worth the trouble to do so.

Bump XLOG_PAGE_MAGIC since this is effectively a change in WAL format.
2011-12-12 16:22:14 -05:00
Heikki Linnakangas
8409b60476 Revert the behavior of inet/cidr functions to not unpack the arguments.
I forgot to change the functions to use the PG_GETARG_INET_PP() macro,
when I changed DatumGetInetP() to unpack the datum, like Datum*P macros
usually do. Also, I screwed up the definition of the PG_GETARG_INET_PP()
macro, and didn't notice because it wasn't used.

This fixes the memory leak when sorting inet values, as reported
by Jochen Erwied and debugged by Andres Freund. Backpatch to 8.3, like
the previous patch that broke it.
2011-12-12 10:10:53 +02:00
Andrew Dunstan
8e461ca5a9 Remove define inadvertantly left over from testing. 2011-12-10 16:29:37 -05:00
Andrew Dunstan
1a0c76c32f Enable compiling with the mingw-w64 32 bit compiler.
Original patch by Lars Kanis, reviewed by Nishiyama Tomoaki and tweaked some by me.

This compiler, or at least the latest version of it, is currently broken, and
only passes the regression tests if built with -O0.
2011-12-10 15:35:41 -05:00
Peter Eisentraut
5bcf8ede45 Add ALTER FOREIGN DATA WRAPPER / RENAME and ALTER SERVER / RENAME 2011-12-09 20:42:30 +02:00
Heikki Linnakangas
9f0d2bdc88 Don't set reachedMinRecoveryPoint during crash recovery. In crash recovery,
we don't reach consistency before replaying all of the WAL. Rename the
variable to reachedConsistency, to make its intention clearer.

In master, that was an active bug because of the recent patch to
immediately PANIC if a reference to a missing page is found in WAL after
reaching consistency, as Tom Lane's test case demonstrated. In 9.1 and 9.0,
the only consequence was a misleading "consistent recovery state reached at
%X/%X" message in the log at the beginning of crash recovery (the database
is not consistent at that point yet). In 8.4, the log message was not
printed in crash recovery, even though there was a similar
reachedMinRecoveryPoint local variable that was also set early. So,
backpatch to 9.1 and 9.0.
2011-12-09 15:21:12 +02:00
Heikki Linnakangas
5d8a894e30 Cancel running query if it is detected that the connection to the client is
lost. The only way we detect that at the moment is when write() fails when
we try to write to the socket.

Florian Pflug with small changes by me, reviewed by Greg Jaskiewicz.
2011-12-09 14:21:36 +02:00
Peter Eisentraut
d5f23af6bf Add const qualifiers to node inspection functions
Thomas Munro
2011-12-07 21:46:56 +02:00
Magnus Hagander
16d8e594ac Remove spclocation field from pg_tablespace
Instead, add a function pg_tablespace_location(oid) used to return
the same information, and do this by reading the symbolic link.

Doing it this way makes it possible to relocate a tablespace when the
database is down by simply changing the symbolic link.
2011-12-07 10:37:33 +01:00
Tom Lane
c6e3ac11b6 Create a "sort support" interface API for faster sorting.
This patch creates an API whereby a btree index opclass can optionally
provide non-SQL-callable support functions for sorting.  In the initial
patch, we only use this to provide a directly-callable comparator function,
which can be invoked with a bit less overhead than the traditional
SQL-callable comparator.  While that should be of value in itself, the real
reason for doing this is to provide a datatype-extensible framework for
more aggressive optimizations, as in Peter Geoghegan's recent work.

Robert Haas and Tom Lane
2011-12-07 00:19:39 -05:00
Heikki Linnakangas
1e616f6391 During recovery, if we reach consistent state and still have entries in the
invalid-page hash table, PANIC immediately. Immediate PANIC is much better
than waiting for end-of-recovery, which is what we did before, because the
end-of-recovery might not come until months later if this is a standby
server.

Also refrain from creating a restartpoint if there are invalid-page entries
in the hash table. Restarting recovery from such a restartpoint would not
see the invalid references, and wouldn't be able to cross-check them when
consistency is reached. That wouldn't matter when things are going smoothly,
but the more sanity checks you have the better.

Fujii Masao
2011-12-02 10:49:54 +02:00
Robert Haas
2ad36c4e44 Improve table locking behavior in the face of current DDL.
In the previous coding, callers were faced with an awkward choice:
look up the name, do permissions checks, and then lock the table; or
look up the name, lock the table, and then do permissions checks.
The first choice was wrong because the results of the name lookup
and permissions checks might be out-of-date by the time the table
lock was acquired, while the second allowed a user with no privileges
to interfere with access to a table by users who do have privileges
(e.g. if a malicious backend queues up for an AccessExclusiveLock on
a table on which AccessShareLock is already held, further attempts
to access the table will be blocked until the AccessExclusiveLock
is obtained and the malicious backend's transaction rolls back).

To fix, allow callers of RangeVarGetRelid() to pass a callback which
gets executed after performing the name lookup but before acquiring
the relation lock.  If the name lookup is retried (because
invalidation messages are received), the callback will be re-executed
as well, so we get the best of both worlds.  RangeVarGetRelid() is
renamed to RangeVarGetRelidExtended(); callers not wishing to supply
a callback can continue to invoke it as RangeVarGetRelid(), which is
now a macro.  Since the only one caller that uses nowait = true now
passes a callback anyway, the RangeVarGetRelid() macro defaults nowait
as well.  The callback can also be used for supplemental locking - for
example, REINDEX INDEX needs to acquire the table lock before the index
lock to reduce deadlock possibilities.

There's a lot more work to be done here to fix all the cases where this
can be a problem, but this commit provides the general infrastructure
and fixes the following specific cases: REINDEX INDEX, REINDEX TABLE,
LOCK TABLE, and and DROP TABLE/INDEX/SEQUENCE/VIEW/FOREIGN TABLE.

Per discussion with Noah Misch and Alvaro Herrera.
2011-11-30 10:27:00 -05:00
Tom Lane
dd3bab5fd7 Ensure that whole-row junk Vars are always of composite type.
The EvalPlanQual machinery assumes that whole-row Vars generated for the
outputs of non-table RTEs will be of composite types.  However, for the
case where the RTE is a function call returning a scalar type, we were
doing the wrong thing, as a result of sharing code with a parser case
where the function's scalar output is wanted.  (Or at least, that's what
that case has done historically; it does seem a bit inconsistent.)

To fix, extend makeWholeRowVar's API so that it can support both use-cases.
This fixes Belinda Cussen's report of crashes during concurrent execution
of UPDATEs involving joins to the result of UNNEST() --- in READ COMMITTED
mode, we'd run the EvalPlanQual machinery after a conflicting row update
commits, and it was expecting to get a HeapTuple not a scalar datum from
the "wholerowN" variable referencing the function RTE.

Back-patch to 9.0 where the current EvalPlanQual implementation appeared.

In 9.1 and up, this patch also fixes failure to attach the correct
collation to the Var generated for a scalar-result case.  An example:
regression=# select upper(x.*) from textcat('ab', 'cd') x;
ERROR:  could not determine which collation to use for upper() function
2011-11-27 22:27:24 -05:00
Tom Lane
c66e4f138b Improve GiST range-contained-by searches by adding a flag for empty ranges.
In the original implementation, a range-contained-by search had to scan
the entire index because an empty range could be lurking anywhere.
Improve that by adding a flag to upper GiST entries that says whether the
represented subtree contains any empty ranges.

Also, make a simple mod to the penalty function to discourage empty ranges
from getting pushed into subtrees without any.  This needs more work, and
the picksplit function should be taught about it too, but that code can be
improved without causing an on-disk compatibility break; so we'll leave it
for another day.

Since we're breaking on-disk compatibility of range values anyway, I took
the opportunity to reorganize the range flags bits; the unused
RANGE_xB_NULL bits are now adjacent, which might open the door for using
them in some other way later.

In passing, remove the GiST range opclass entry for <>, which doesn't seem
like it can really be indexed usefully.

Alexander Korotkov, with some editorializing by Tom
2011-11-27 16:51:29 -05:00
Alvaro Herrera
9d3b502443 Improve logging of autovacuum I/O activity
This adds some I/O stats to the logging of autovacuum (when the
operation takes long enough that log_autovacuum_min_duration causes it
to be logged), so that it is easier to tune.  Notably, it adds buffer
I/O counts (hits, misses, dirtied) and read and write rate.

Authors: Greg Smith and Noah Misch
2011-11-25 16:34:32 -03:00
Robert Haas
ed0b409d22 Move "hot" members of PGPROC into a separate PGXACT array.
This speeds up snapshot-taking and reduces ProcArrayLock contention.
Also, the PGPROC (and PGXACT) structures used by two-phase commit are
now allocated as part of the main array, rather than in a separate
array, and we keep ProcArray sorted in pointer order.  These changes
are intended to minimize the number of cache lines that must be pulled
in to take a snapshot, and testing shows a substantial increase in
performance on both read and write workloads at high concurrencies.

Pavan Deolasee, Heikki Linnakangas, Robert Haas
2011-11-25 08:02:10 -05:00
Tom Lane
9ed439a9c0 Fix unsupported options in CREATE TABLE ... AS EXECUTE.
The WITH [NO] DATA option was not supported, nor the ability to specify
replacement column names; the former limitation wasn't even documented, as
per recent complaint from Naoya Anzai.  Fix by moving the responsibility
for supporting these options into the executor.  It actually takes less
code this way ...

catversion bump due to change in representation of IntoClause, which might
affect stored rules.
2011-11-24 23:21:45 -05:00
Tom Lane
74c1723fc8 Remove user-selectable ANALYZE option for range types.
It's not clear that a per-datatype typanalyze function would be any more
useful than a generic typanalyze for ranges.  What *is* clear is that
letting unprivileged users select typanalyze functions is a crash risk or
worse.  So remove the option from CREATE TYPE AS RANGE, and instead put in
a generic typanalyze function for ranges.  The generic function does
nothing as yet, but hopefully we'll improve that before 9.2 release.
2011-11-23 00:03:22 -05:00
Tom Lane
df73584431 Remove zero- and one-argument range constructor functions.
Per discussion, the zero-argument forms aren't really worth the catalog
space (just write 'empty' instead).  The one-argument forms have some use,
but they also have a serious problem with looking too much like functional
cast notation; to the point where in many real use-cases, the parser would
misinterpret what was wanted.

Committing this as a separate patch, with the thought that we might want
to revert part or all of it if we can think of some way around the cast
ambiguity.
2011-11-22 20:45:05 -05:00
Tom Lane
cddc819e45 Improve implementation of range-contains-element tests.
Implement these tests directly instead of constructing a singleton range
and then applying range-contains.  This saves a range serialize/deserialize
cycle as well as a couple of redundant bound-comparison steps, and adds
very little code on net.

Remove elem_contained_by_range from the GiST opclass: it doesn't belong
there because there is no way to use it in an index clause (where the
indexed column would have to be on the left).  Its commutator is in the
opclass, and that's what counts.
2011-11-22 17:45:37 -05:00
Tom Lane
766948bedd Still more review for range-types patch.
Per discussion, relax the range input/construction rules so that the
only hard error is lower bound > upper bound.  Cases where the lower
bound is <= upper bound, but the range nonetheless normalizes to empty,
are now permitted.

Fix core dump in range_adjacent when bounds are infinite.  Marginal
cleanup of regression test cases, some more code commenting.
2011-11-22 16:06:26 -05:00
Tom Lane
a4ffcc8e11 More code review for rangetypes patch.
Fix up some infelicitous coding in DefineRange, and add some missing error
checks.  Rearrange operator strategy number assignments for GiST anyrange
opclass so that they don't make such a mess of opr_sanity's table of
operator names associated with different strategy numbers.  Assign
hopefully-temporary selectivity estimators to range operators that didn't
have one --- poor as the estimates are, they're still a lot better than the
default 0.5 estimate, and they'll shut up the opr_sanity test that wants to
see selectivity estimators on all built-in operators.
2011-11-21 16:19:53 -05:00
Tom Lane
b985d48779 Further code review for range types patch.
Fix some bugs in coercion logic and pg_dump; more comment cleanup;
minor cosmetic improvements.
2011-11-20 23:50:27 -05:00
Tom Lane
a1a233af66 Further review of range-types patch.
Lots of documentation cleanup today, and still more type_sanity tests.
2011-11-18 18:24:32 -05:00
Tom Lane
f6438f6622 Do missed autoheader run for previous commit. 2011-11-17 22:39:14 -05:00
Robert Haas
fc6d1006bd Further consolidation of DROP statement handling.
This gets rid of an impressive amount of duplicative code, with only
minimal behavior changes.  DROP FOREIGN DATA WRAPPER now requires object
ownership rather than superuser privileges, matching the documentation
we already have.  We also eliminate the historical warning about dropping
a built-in function as unuseful.  All operations are now performed in the
same order for all object types handled by dropcmds.c.

KaiGai Kohei, with minor revisions by me
2011-11-17 21:32:34 -05:00