Commit Graph

13623 Commits

Author SHA1 Message Date
Noah Misch d41cb869aa Ignore interrupts during quickdie().
Once the administrator has called for an immediate shutdown or a backend
crash has triggered a reinitialization, no mere SIGINT or SIGTERM should
change that course.  Such derailment remains possible when the signal
arrives before quickdie() blocks signals.  That being a narrow race
affecting most PostgreSQL signal handlers in some way, leave it for
another patch.  Back-patch this to all supported versions.
2013-09-11 20:10:15 -04:00
Peter Eisentraut b34f8f409b Show schemas in information_schema.schemata that the current has access to
Before, it would only show schemas that the current user owns.  Per
discussion, the new behavior is more useful and consistent for PostgreSQL.
2013-09-09 22:25:37 -04:00
Robert Haas 71901ab6da Introduce InvalidCommandId.
This allows a 32-bit field to represent an *optional* command ID
without a separate flag bit.

Andres Freund
2013-09-09 16:25:29 -04:00
Noah Misch b8104730c8 Don't VALGRIND_PRINTF() each query string.
Doing so was helpful for some Valgrind usage and distracting for other
usage.  One can achieve the same effect by changing log_statement and
pointing both PostgreSQL and Valgrind logging to stderr.

Per gripe from Andres Freund.
2013-09-06 19:42:00 -04:00
Kevin Grittner 277607d600 Eliminate pg_rewrite.ev_attr column and related dead code.
Commit 95ef6a3448 removed the
ability to create rules on an individual column as of 7.3, but
left some residual code which has since been useless.  This cleans
up that dead code without any change in behavior other than
dropping the useless column from the catalog.
2013-09-05 14:03:43 -05:00
Heikki Linnakangas 20cb18db46 Make catalog cache hash tables resizeable.
If the hash table backing a catalog cache becomes too full (fillfactor > 2),
enlarge it. A new buckets array, double the size of the old, is allocated,
and all entries in the old hash are moved to the right bucket in the new
hash.

This has two benefits. First, cache lookups don't get so expensive when
there are lots of entries in a cache, like if you access hundreds of
thousands of tables. Second, we can make the (initial) sizes of the caches
much smaller, which saves memory.

This patch dials down the initial sizes of the catcaches. The new sizes are
chosen so that a backend that only runs a few basic queries still won't need
to enlarge any of them.
2013-09-05 20:20:03 +03:00
Jeff Davis b1892aaeaa Revert WAL posix_fallocate() patches.
This reverts commit 269e780822
and commit 5b571bb8c8.

Unfortunately, the initial patch had insufficient performance testing,
and resulted in a regression.

Per report by Thom Brown.
2013-09-04 23:43:41 -07:00
Bruce Momjian f5c2f5a8f6 Add GUC descriptions for compile-time postgresql.conf settings
Previous text was "No description available".

Tianyin Xu
2013-09-04 17:44:04 -04:00
Heikki Linnakangas 375d8526f2 Keep heavily-contended fields in XLogCtlInsert on different cache lines.
Performance testing shows that if the insertpos_lck spinlock and the fields
that it protects are on the same cache line with other variables that are
frequently accessed, the false sharing can hurt performance a lot. Keep
them apart by adding some padding.
2013-09-04 23:14:33 +03:00
Robert Haas cc52d5b33f Expose fsync_fname as a public API.
Andres Freund
2013-09-04 11:15:00 -04:00
Tom Lane 0c66a22377 Update comments concerning PGC_S_TEST.
This GUC context value was once only used by ALTER DATABASE SET and
ALTER USER SET.  That's not true anymore, though, so rewrite the
comments to be a bit more general.

Patch in HEAD only, since this is just an internal documentation issue.
2013-09-03 18:56:22 -04:00
Tom Lane 546f7c2e38 Don't fail for bad GUCs in CREATE FUNCTION with check_function_bodies off.
The previous coding attempted to activate all the GUC settings specified
in SET clauses, so that the function validator could operate in the GUC
environment expected by the function body.  However, this is problematic
when restoring a dump, since the SET clauses might refer to database
objects that don't exist yet.  We already have the parameter
check_function_bodies that's meant to prevent forward references in
function definitions from breaking dumps, so let's change CREATE FUNCTION
to not install the SET values if check_function_bodies is off.

Authors of function validators were already advised not to make any
"context sensitive" checks when check_function_bodies is off, if indeed
they're checking anything at all in that mode.  But extend the
documentation to point out the GUC issue in particular.

(Note that we still check the SET clauses to some extent; the behavior
with !check_function_bodies is now approximately equivalent to what ALTER
DATABASE/ROLE have been doing for awhile with context-dependent GUCs.)

This problem can be demonstrated in all active branches, so back-patch
all the way.
2013-09-03 18:32:20 -04:00
Tom Lane 0d3f4406df Allow aggregate functions to be VARIADIC.
There's no inherent reason why an aggregate function can't be variadic
(even VARIADIC ANY) if its transition function can handle the case.
Indeed, this patch to add the feature touches none of the planner or
executor, and little of the parser; the main missing stuff was DDL and
pg_dump support.

It is true that variadic aggregates can create the same sort of ambiguity
about parameters versus ORDER BY keys that was complained of when we
(briefly) had both one- and two-argument forms of string_agg().  However,
the policy formed in response to that discussion only said that we'd not
create any built-in aggregates with varying numbers of arguments, not that
we shouldn't allow users to do it.  So the logical extension of that is
we can allow users to make variadic aggregates as long as we're wary about
shipping any such in core.

In passing, this patch allows aggregate function arguments to be named, to
the extent of remembering the names in pg_proc and dumping them in pg_dump.
You can't yet call an aggregate using named-parameter notation.  That seems
like a likely future extension, but it'll take some work, and it's not what
this patch is really about.  Likewise, there's still some work needed to
make window functions handle VARIADIC fully, but I left that for another
day.

initdb forced because of new aggvariadic field in Aggref parse nodes.
2013-09-03 17:08:46 -04:00
Heikki Linnakangas a93bdfc711 Fix typo in comment.
Also line-wrap an over-wide line in a comment that's ignored by pgindent.
2013-09-03 13:17:09 +03:00
Peter Eisentraut 6a007fa1eb Translation updates 2013-09-02 02:43:18 -04:00
Tom Lane 8e2b71d2d0 Reset the binary heap in MergeAppend rescans.
Failing to do so can cause queries to return wrong data, error out or crash.
This requires adding a new binaryheap_reset() method to binaryheap.c,
but that probably should have been there anyway.

Per bug #8410 from Terje Elde.  Diagnosis and patch by Andres Freund.
2013-08-30 19:15:21 -04:00
Alvaro Herrera 9381cb5229 Make error wording more consistent 2013-08-29 12:42:28 -04:00
Robert Haas 090d0f2050 Allow discovery of whether a dynamic background worker is running.
Using the infrastructure provided by this patch, it's possible either
to wait for the startup of a dynamically-registered background worker,
or to poll the status of such a worker without waiting.  In either
case, the current PID of the worker process can also be obtained.
As usual, worker_spi is updated to demonstrate the new functionality.

Patch by me.  Review by Andres Freund.
2013-08-28 14:08:13 -04:00
Robert Haas c9e2e2db5c Partially restore comments discussing enum renumbering hazards.
As noted by Tom Lane, commit 813fb03155
was overly optimistic about how safe it is to concurrently change
enumsortorder values under MVCC catalog scan semantics.  Restore
some of the previous text, with hopefully-correct adjustments for
the new state of play.
2013-08-28 13:21:08 -04:00
Alvaro Herrera e246cfc95f Initialize cached OID to Invalid in new hash entries
Andres Freund; bug detected by valgrind
2013-08-27 14:53:17 -04:00
Tom Lane 2aac3399ae Account better for planning cost when choosing whether to use custom plans.
The previous coding in plancache.c essentially used 10% of the estimated
runtime as its cost estimate for planning.  This can be pretty bogus,
especially when the estimated runtime is very small, such as in a simple
expression plan created by plpgsql, or a simple INSERT ... VALUES.

While we don't have a really good handle on how planning time compares
to runtime, it seems reasonable to use an estimate based on the number of
relations referenced in the query, with a rather large multiplier.  This
patch uses 1000 * cpu_operator_cost * (nrelations + 1), so that even a
trivial query will be charged 1000 * cpu_operator_cost for planning.
This should address the problem reported by Marc Cousin and others that
9.2 and up prefer custom plans in cases where the planning time greatly
exceeds what can be saved.
2013-08-24 15:14:17 -04:00
Magnus Hagander db4ef73760 Don't crash when pg_xlog is empty and pg_basebackup -x is used
The backup will not work (without a logarchive, and that's the whole
point of -x) in this case, this patch just changes it to throw an
error instead of crashing when this happens.

Noticed and diagnosed by TAKATSUKA Haruka
2013-08-24 17:13:49 +02:00
Tom Lane fcf9ecad57 In locate_grouping_columns(), don't expect an exact match of Var typmods.
It's possible that inlining of SQL functions (or perhaps other changes?)
has exposed typmod information not known at parse time.  In such cases,
Vars generated by query_planner might have valid typmod values while the
original grouping columns only have typmod -1.  This isn't a semantic
problem since the behavior of grouping only depends on type not typmod,
but it breaks locate_grouping_columns' use of tlist_member to locate the
matching entry in query_planner's result tlist.

We can fix this without an excessive amount of new code or complexity by
relying on the fact that locate_grouping_columns only gets called when
make_subplanTargetList has set need_tlist_eval == false, and that can only
happen if all the grouping columns are simple Vars.  Therefore we only need
to search the sub_tlist for a matching Var, and we can reasonably define a
"match" as being a match of the Var identity fields
varno/varattno/varlevelsup.  The code still Asserts that vartype matches,
but ignores vartypmod.

Per bug #8393 from Evan Martin.  The added regression test case is
basically the same as his example.  This has been broken for a very long
time, so back-patch to all supported branches.
2013-08-23 17:30:53 -04:00
Tom Lane 3454876314 Fix hash table size estimation error in choose_hashed_distinct().
We should account for the per-group hashtable entry overhead when
considering whether to use a hash aggregate to implement DISTINCT.  The
comparable logic in choose_hashed_grouping() gets this right, but I think
I omitted it here in the mistaken belief that there would be no overhead
if there were no aggregate functions to be evaluated.  This can result in
more than 2X underestimate of the hash table size, if the tuples being
aggregated aren't very wide.  Per report from Tomas Vondra.

This bug is of long standing, but per discussion we'll only back-patch into
9.3.  Changing the estimation behavior in stable branches seems to carry too
much risk of destabilizing plan choices for already-tuned applications.
2013-08-21 13:38:34 -04:00
Tom Lane 20fe870753 Be more wary of unwanted whitespace in pgstat_reset_remove_files().
sscanf isn't the easiest thing to use for exact pattern checks ...
also, don't use strncmp where strcmp will do.
2013-08-19 19:36:04 -04:00
Alvaro Herrera f9b50b7c18 Fix removal of files in pgstats directories
Instead of deleting all files in stats_temp_directory and the permanent
directory on a crash, only remove those files that match the pattern of
files we actually write in them, to avoid possibly clobbering existing
unrelated contents of the temporary directory.  Per complaint from Jeff
Janes, and subsequent discussion, starting at message
CAMkU=1z9+7RsDODnT4=cDFBRBp8wYQbd_qsLcMtKEf-oFwuOdQ@mail.gmail.com

Also, fix a bug in the same routine to avoid removing files from the
permanent directory twice (instead of once from that directory and then
from the temporary directory), also per report from Jeff Janes, in
message
CAMkU=1wbk947=-pAosDMX5VC+sQw9W4ttq6RM9rXu=MjNeEQKA@mail.gmail.com
2013-08-19 17:48:17 -04:00
Heikki Linnakangas 3619a20d33 Rename the "fast_promote" file to just "promote".
This keeps the usual trigger file name unchanged from 9.2, avoiding nasty
issues if you use a pre-9.3 pg_ctl binary with a 9.3 server or vice versa.
The fallback behavior of creating a full checkpoint before starting up is now
triggered by a file called "fallback_promote". That can be useful for
debugging purposes, but we don't expect any users to have to resort to that
and we might want to remove that in the future, which is why the fallback
mechanism is undocumented.
2013-08-19 20:59:51 +03:00
Tom Lane c64de21e96 Fix qual-clause-misplacement issues with pulled-up LATERAL subqueries.
In an example such as
SELECT * FROM
  i LEFT JOIN LATERAL (SELECT * FROM j WHERE i.n = j.n) j ON true;
it is safe to pull up the LATERAL subquery into its parent, but we must
then treat the "i.n = j.n" clause as a qual clause of the LEFT JOIN.  The
previous coding in deconstruct_recurse mistakenly labeled the clause as
"is_pushed_down", resulting in wrong semantics if the clause were applied
at the join node, as per an example submitted awhile ago by Jeremy Evans.
To fix, postpone processing of such clauses until we return back up to
the appropriate recursion depth in deconstruct_recurse.

In addition, tighten the is-safe-to-pull-up checks in is_simple_subquery;
we previously missed the possibility that the LATERAL subquery might itself
contain an outer join that makes lateral references in lower quals unsafe.

A regression test case equivalent to Jeremy's example was already in my
commit of yesterday, but was giving the wrong results because of this
bug.  This patch fixes the expected output for that, and also adds a
test case for the second problem.
2013-08-19 13:19:41 -04:00
Alvaro Herrera 78e1220104 Fix pg_upgrade failure from servers older than 9.3
When upgrading from servers of versions 9.2 and older, and MultiXactIds
have been used in the old server beyond the first page (that is, 2048
multis or more in the default 8kB-page build), pg_upgrade would set the
next multixact offset to use beyond what has been allocated in the new
cluster.  This would cause a failure the first time the new cluster
needs to use this value, because the pg_multixact/offsets/ file wouldn't
exist or wouldn't be large enough.  To fix, ensure that the transient
server instances launched by pg_upgrade extend the file as necessary.

Per report from Jesse Denardo in
CANiVXAj4c88YqipsyFQPboqMudnjcNTdB3pqe8ReXqAFQ=HXyA@mail.gmail.com
2013-08-19 12:56:18 -04:00
Peter Eisentraut a2f2e902b8 Translation updates 2013-08-18 23:41:03 -04:00
Kevin Grittner 28154bb23b Remove relcache entry invalidation in REFRESH MATERIALIZED VIEW.
This was added as part of the attempt to support unlogged matviews
along with a populated status.  It got missed when unlogged
support was removed pre-commit.

Noticed by Noah Misch.  Back-patched to 9.3 branch.
2013-08-18 16:19:22 -05:00
Tom Lane f1d5fce7cf Fix thinko in comment. 2013-08-17 20:36:29 -04:00
Tom Lane 9e7e29c75a Fix planner problems with LATERAL references in PlaceHolderVars.
The planner largely failed to consider the possibility that a
PlaceHolderVar's expression might contain a lateral reference to a Var
coming from somewhere outside the PHV's syntactic scope.  We had a previous
report of a problem in this area, which I tried to fix in a quick-hack way
in commit 4da6439bd8, but Antonin Houska
pointed out that there were still some problems, and investigation turned
up other issues.  This patch largely reverts that commit in favor of a more
thoroughly thought-through solution.  The new theory is that a PHV's
ph_eval_at level cannot be higher than its original syntactic level.  If it
contains lateral references, those don't change the ph_eval_at level, but
rather they create a lateral-reference requirement for the ph_eval_at join
relation.  The code in joinpath.c needs to handle that.

Another issue is that createplan.c wasn't handling nested PlaceHolderVars
properly.

In passing, push knowledge of lateral-reference checks for join clauses
into join_clause_is_movable_to.  This is mainly so that FDWs don't need
to deal with it.

This patch doesn't fix the original join-qual-placement problem reported by
Jeremy Evans (and indeed, one of the new regression test cases shows the
wrong answer because of that).  But the PlaceHolderVar problems need to be
fixed before that issue can be addressed, so committing this separately
seems reasonable.
2013-08-17 20:22:37 -04:00
Robert Haas 2dee7998f9 Move more bgworker code to bgworker.c; also, some renaming.
Per discussion on pgsql-hackers.

Michael Paquier, slightly modified by me.  Original suggestion
from Amit Kapila.
2013-08-16 15:31:28 -04:00
Heikki Linnakangas 05cbce6f30 Fix typo in comment. 2013-08-16 16:26:22 +03:00
Kevin Grittner 3f78b1715c Don't allow ALTER MATERIALIZED VIEW ADD UNIQUE.
Was accidentally allowed, but not documented and lacked support
for rename or drop once created.

Per report from Noah Misch.
2013-08-15 13:14:48 -05:00
Peter Eisentraut 229fb58d4f Treat timeline IDs as unsigned in replication parser
Timeline IDs are unsigned ints everywhere, except the replication parser
treated them as signed ints.
2013-08-14 23:18:49 -04:00
Peter Eisentraut 32f7c0ae17 Improve error message when view is not updatable
Avoid using the term "updatable" in confusing ways.  Suggest a trigger
first, before a rule.
2013-08-14 23:02:59 -04:00
Tom Lane 1b1d3d92c3 Remove ph_may_need from PlaceHolderInfo, with attendant simplifications.
The planner logic that attempted to make a preliminary estimate of the
ph_needed levels for PlaceHolderVars seems to be completely broken by
lateral references.  Fortunately, the potential join order optimization
that this code supported seems to be of relatively little value in
practice; so let's just get rid of it rather than trying to fix it.

Getting rid of this allows fairly substantial simplifications in
placeholder.c, too, so planning in such cases should be a bit faster.

Issue noted while pursuing bugs reported by Jeremy Evans and Antonin
Houska, though this doesn't in itself fix either of their reported cases.
What this does do is prevent an Assert crash in the kind of query
illustrated by the added regression test.  (I'm not sure that the plan for
that query is stable enough across platforms to be usable as a regression
test output ... but we'll soon find out from the buildfarm.)

Back-patch to 9.3.  The problem case can't arise without LATERAL, so
no need to touch older branches.
2013-08-14 18:38:47 -04:00
Kevin Grittner e2cd368678 Remove Assert that matview is not in system schema from REFRESH.
We don't want to prevent an extension which creates a matview from
being installed in pg_catalog.

Issue was raised by Hitoshi Harada.
Backpatched to 9.3.
2013-08-14 12:36:55 -05:00
Tom Lane 3d5282c6f0 Emit a log message if output is about to be redirected away from stderr.
We've seen multiple cases of people looking at the postmaster's original
stderr output to try to diagnose problems, not realizing/remembering that
their logging configuration is set up to send log messages somewhere else.
This seems particularly likely to happen in prepackaged distributions,
since many packagers patch the code to change the factory-standard logging
configuration to something more in line with their platform conventions.

In hopes of reducing confusion, emit a LOG message about this at the point
in startup where we are about to switch log output away from the original
stderr, providing a pointer to where to look instead.  This message will
appear as the last thing in the original stderr output.  (We might later
also try to emit such link messages when logging parameters are changed
on-the-fly; but that case seems to be both noticeably harder to do nicely,
and much less frequently a problem in practice.)

Per discussion, back-patch to 9.3 but not further.
2013-08-13 15:24:52 -04:00
Peter Eisentraut 072457b360 Message punctuation and pluralization fixes 2013-08-09 08:02:44 -04:00
Peter Eisentraut 9d775d8894 Message style improvements 2013-08-07 22:48:40 -04:00
Fujii Masao 91c3613d37 Fix assertion failure by an immediate shutdown.
In PM_WAIT_DEAD_END state, checkpointer process must be dead already.
But an immediate shutdown could make postmaster's state machine
transition to PM_WAIT_DEAD_END state even if checkpointer process is
still running,  and which caused assertion failure. This bug was introduced
in commit 457d6cf049.

This patch ensures that postmaster's state machine doesn't transition to
PM_WAIT_DEAD_END state in an immediate shutdown while checkpointer
process is running.
2013-08-08 02:48:53 +09:00
Tom Lane 3ced8837db Simplify query_planner's API by having it return the top-level RelOptInfo.
Formerly, query_planner returned one or possibly two Paths for the topmost
join relation, so that grouping_planner didn't see the join RelOptInfo
(at least not directly; it didn't have any hesitation about examining
cheapest_path->parent, though).  However, correct selection of the Paths
involved a significant amount of coupling between query_planner and
grouping_planner, a problem which has gotten worse over time.  It seems
best to give up on this API choice and instead return the topmost
RelOptInfo explicitly.  Then grouping_planner can pull out the Paths it
wants from the rel's path list.  In this way we can remove all knowledge
of grouping behaviors from query_planner.

The only real benefit of the old way is that in the case of an empty
FROM clause, we never made any RelOptInfos at all, just a Path.  Now
we have to gin up a dummy RelOptInfo to represent the empty FROM clause.
That's not a very big deal though.

While at it, simplify query_planner's API a bit more by having the caller
set up root->tuple_fraction and root->limit_tuples, rather than passing
those values as separate parameters.  Since query_planner no longer does
anything with either value, requiring it to fill the PlannerInfo fields
seemed pretty arbitrary.

This patch just rearranges code; it doesn't (intentionally) change any
behaviors.  Followup patches will do more interesting things.
2013-08-05 15:01:09 -04:00
Kevin Grittner 841c29c8b3 Various cleanups for REFRESH MATERIALIZED VIEW CONCURRENTLY.
Open and lock each index before checking definition in RMVC.  The
ExclusiveLock on the related table is not viewed as sufficient to
ensure that no changes are made to the index definition, and
invalidation messages from other backends might have been missed.
Additionally, use RelationGetIndexExpressions() and check for NIL
rather than doing our own loop.

Protect against redefinition of tid and rowvar operators in RMVC.
While working on this, noticed that the fixes for bugs found during
the CF made the UPDATE statement useless, since no rows could
qualify for that treatment any more.  Ripping out code to support
the UPDATE statement simplified the operator cleanups.

Change slightly confusing local field name.

Use meaningful alias names on queries in refresh_by_match_merge().

Per concerns of raised by Andres Freund and comments and
suggestions from Noah Misch.  Some additional issues remain, which
will be addressed separately.
2013-08-05 09:57:56 -05:00
Tom Lane 221e92f64c Make sure float4in/float8in accept all standard spellings of "infinity".
The C99 and POSIX standards require strtod() to accept all these spellings
(case-insensitively): "inf", "+inf", "-inf", "infinity", "+infinity",
"-infinity".  However, pre-C99 systems might accept only some or none of
these, and apparently Windows still doesn't accept "inf".  To avoid
surprising cross-platform behavioral differences, manually check for each
of these spellings if strtod() fails.  We were previously handling just
"infinity" and "-infinity" that way, but since C99 is most of the world
now, it seems likely that applications are expecting all these spellings
to work.

Per bug #8355 from Basil Peace.  It turns out this fix won't actually
resolve his problem, because Python isn't being this careful; but that
doesn't mean we shouldn't be.
2013-08-03 12:40:27 -04:00
Alvaro Herrera 706f9dd914 Fix old visibility bug in HeapTupleSatisfiesDirty
If a tuple is locked but not updated by a concurrent transaction,
HeapTupleSatisfiesDirty would return that transaction's Xid in xmax,
causing callers to wait on it, when it is not necessary (in fact, if the
other transaction had used a multixact instead of a plain Xid to mark
the tuple, HeapTupleSatisfiesDirty would have behave differently and
*not* returned the Xmax).

This bug was introduced in commit 3f7fbf85dc, dated December 1998,
so it's almost 15 years old now.  However, it's hard to see this
misbehave, because before we had NOWAIT the only consequence of this is
that transactions would wait for slightly more time than necessary; so
it's not surprising that this hasn't been reported yet.

Craig Ringer and Andres Freund
2013-08-02 17:02:36 -04:00
Alvaro Herrera 88c556680c Fix crash in error report of invalid tuple lock
My tweak of these error messages in commit c359a1b082 contained the
thinko that a query would always have rowMarks set for a query
containing a locking clause.  Not so: when declaring a cursor, for
instance, rowMarks isn't set at the point we're checking, so we'd be
dereferencing a NULL pointer.

The fix is to pass the lock strength to the function raising the error,
instead of trying to reverse-engineer it.  The result not only is more
robust, but it also seems cleaner overall.

Per report from Robert Haas.
2013-08-02 13:18:37 -04:00
Robert Haas 05ee328d66 Fix typo in comment.
Etsuro Fujita
2013-08-02 09:15:42 -04:00
Kevin Grittner f31c149f13 Improve comments for IncrementalMaintenance DML enabling functions.
Move the static functions after the comment and expand the comment.

Per complaint from Andres Freund, although using different comment
text.
2013-08-01 14:31:09 -05:00
Robert Haas 149e38e5ee Assorted bgworker-related comment fixes.
Per gripes by Amit Kapila.
2013-08-01 12:20:31 -04:00
Robert Haas 813fb03155 Remove SnapshotNow and HeapTupleSatisfiesNow.
We now use MVCC catalog scans, and, per discussion, have eliminated
all other remaining uses of SnapshotNow, so that we can now get rid of
it.  This will break third-party code which is still using it, which
is intentional, as we want such code to be updated to do things the
new way.
2013-08-01 10:46:19 -04:00
Stephen Frost ddef1a39c6 Allow a context to be passed in for error handling
As pointed out by Tom Lane, we can allow other users of the error
handler callbacks to provide their own memory context by adding
the context to use to ErrorData and using that instead of explicitly
using ErrorContext.

This then allows GetErrorContextStack() to be called from inside
exception handlers, so modify plpgsql to take advantage of that and
add an associated regression test for it.
2013-08-01 01:07:20 -04:00
Alvaro Herrera a59516b631 Fix mis-indented lines
Per Coverity
2013-07-31 17:57:15 -04:00
Tom Lane d074b4e50d Fix regexp_matches() handling of zero-length matches.
We'd find the same match twice if it was of zero length and not immediately
adjacent to the previous match.  replace_text_regexp() got similar cases
right, so adjust this search logic to match that.  Note that even though
the regexp_split_to_xxx() functions share this code, they did not display
equivalent misbehavior, because the second match would be considered
degenerate and ignored.

Jeevan Chalke, with some cosmetic changes by me.
2013-07-31 11:31:22 -04:00
Fujii Masao c876fb4241 Fix typo in comment.
Hitoshi Harada
2013-07-31 22:53:20 +09:00
Noah Misch 16f38f72ab Restore REINDEX constraint validation.
Refactoring as part of commit 8ceb245680
had the unintended effect of making REINDEX TABLE and REINDEX DATABASE
no longer validate constraints enforced by the indexes in question;
REINDEX INDEX still did so.  Indexes marked invalid remained so, and
constraint violations arising from data corruption went undetected.
Back-patch to 9.0, like the causative commit.
2013-07-30 18:36:52 -04:00
Greg Stark c62736cc37 Add SQL Standard WITH ORDINALITY support for UNNEST (and any other SRF)
Author: Andrew Gierth, David Fetter
Reviewers: Dean Rasheed, Jeevan Chalke, Stephen Frost
2013-07-29 16:38:01 +01:00
Peter Eisentraut 626092a2e1 Message style improvements 2013-07-28 07:01:13 -04:00
Tom Lane 3d13623d75 Prevent leakage of SPI tuple tables during subtransaction abort.
plpgsql often just remembers SPI-result tuple tables in local variables,
and has no mechanism for freeing them if an ereport(ERROR) causes an escape
out of the execution function whose local variable it is.  In the original
coding, that wasn't a problem because the tuple table would be cleaned up
when the function's SPI context went away during transaction abort.
However, once plpgsql grew the ability to trap exceptions, repeated
trapping of errors within a function could result in significant
intra-function-call memory leakage, as illustrated in bug #8279 from
Chad Wagner.

We could fix this locally in plpgsql with a bunch of PG_TRY/PG_CATCH
coding, but that would be tedious, probably slow, and prone to bugs of
omission; moreover it would do nothing for similar risks elsewhere.
What seems like a better plan is to make SPI itself responsible for
freeing tuple tables at subtransaction abort.  This patch attacks the
problem that way, keeping a list of live tuple tables within each SPI
function context.  Currently, such freeing is automatic for tuple tables
made within the failed subtransaction.  We might later add a SPI call to
mark a tuple table as not to be freed this way, allowing callers to opt
out; but until someone exhibits a clear use-case for such behavior, it
doesn't seem worth bothering.

A very useful side-effect of this change is that SPI_freetuptable() can
now defend itself against bad calls, such as duplicate free requests;
this should make things more robust in many places.  (In particular,
this reduces the risks involved if a third-party extension contains
now-redundant SPI_freetuptable() calls in error cleanup code.)

Even though the leakage problem is of long standing, it seems imprudent
to back-patch this into stable branches, since it does represent an API
semantics change for SPI users.  We'll patch this in 9.3, but live with
the leakage in older branches.
2013-07-25 16:46:14 -04:00
Robert Haas ed93feb808 Change currtid functions to use an MVCC snapshot, not SnapshotNow.
This has a slight performance cost, but the only known consumers
of these functions, known at the SQL level as currtid and currtid2,
is pgsql-odbc; whose usage, we hope, is not sufficiently intensive
to make this a problem.

Per discussion.
2013-07-25 16:32:02 -04:00
Robert Haas 3483f4332d Don't use SnapshotNow in get_actual_variable_range.
Instead, use the active snapshot.  Per Tom Lane, this function is
most interested in knowing the range of tuples our scan will actually
see.

This is another step towards full removal of SnapshotNow.
2013-07-25 14:30:00 -04:00
Stephen Frost 9bd0feeba8 Improvements to GetErrorContextStack()
As GetErrorContextStack() borrowed setup and tear-down code from other
places, it was less than clear that it must only be called as a
top-level entry point into the error system and can't be called by an
exception handler (unlike the rest of the error system, which is set up
to be reentrant-safe).

Being called from an exception handler is outside the charter of
GetErrorContextStack(), so add a bit more protection against it,
improve the comments addressing why we have to set up an errordata
stack for this function at all, and add a few more regression tests.

Lack of clarity pointed out by Tom Lane; all bugs are mine.
2013-07-25 09:41:55 -04:00
Stephen Frost 8312832567 Add GET DIAGNOSTICS ... PG_CONTEXT in PL/PgSQL
This adds the ability to get the call stack as a string from within a
PL/PgSQL function, which can be handy for logging to a table, or to
include in a useful message to an end-user.

Pavel Stehule, reviewed by Rushabh Lathia and rather heavily whacked
around by Stephen Frost.
2013-07-24 18:53:27 -04:00
Tom Lane fa2fad3c06 Improve ilist.h's support for deletion of slist elements during iteration.
Previously one had to use slist_delete(), implying an additional scan of
the list, making this infrastructure considerably less efficient than
traditional Lists when deletion of element(s) in a long list is needed.
Modify the slist_foreach_modify() macro to support deleting the current
element in O(1) time, by keeping a "prev" pointer in addition to "cur"
and "next".  Although this makes iteration with this macro a bit slower,
no real harm is done, since in any scenario where you're not going to
delete the current list element you might as well just use slist_foreach
instead.  Improve the comments about when to use each macro.

Back-patch to 9.3 so that we'll have consistent semantics in all branches
that provide ilist.h.  Note this is an ABI break for callers of
slist_foreach_modify().

Andres Freund and Tom Lane
2013-07-24 17:42:34 -04:00
Tom Lane b32a25c3d5 Fix booltestsel() for case where we have NULL stats but not MCV stats.
In a boolean column that contains mostly nulls, ANALYZE might not find
enough non-null values to populate the most-common-values stats,
but it would still create a pg_statistic entry with stanullfrac set.
The logic in booltestsel() for this situation did the wrong thing for
"col IS NOT TRUE" and "col IS NOT FALSE" tests, forgetting that null
values would satisfy these tests (so that the true selectivity would
be close to one, not close to zero).  Per bug #8274.

Fix by Andrew Gierth, some comment-smithing by me.
2013-07-24 00:44:09 -04:00
Tom Lane 10a509d829 Move strip_implicit_coercions() from optimizer to nodeFuncs.c.
Use of this function has spread into the parser and rewriter, so it seems
like time to pull it out of the optimizer and put it into the more central
nodeFuncs module.  This eliminates the need to #include optimizer/clauses.h
in most of the calling files, demonstrating that this function was indeed a
bit outside the normal code reference patterns.
2013-07-23 18:21:19 -04:00
Tom Lane ef655663c5 Further hacking on ruleutils' new column-alias-assignment code.
After further thought about implicit coercions appearing in a joinaliasvars
list, I realized that they represent an additional reason why we might need
to reference the join output column directly instead of referencing an
underlying column.  Consider SELECT x FROM t1 LEFT JOIN t2 USING (x) where
t1.x is of type date while t2.x is of type timestamptz.  The merged output
variable is of type timestamptz, but it won't go to null when t2 does,
therefore neither t1.x nor t2.x is a valid substitute reference.

The code in get_variable() actually gets this case right, since it knows
it shouldn't look through a coercion, but we failed to ensure that the
unqualified output column name would be globally unique.  To fix, modify
the code that trawls for a dangerous situation so that it actually scans
through an unnamed join's joinaliasvars list to see if there are any
non-simple-Var entries.
2013-07-23 17:55:04 -04:00
Tom Lane a7cd853b75 Change post-rewriter representation of dropped columns in joinaliasvars.
It's possible to drop a column from an input table of a JOIN clause in a
view, if that column is nowhere actually referenced in the view.  But it
will still be there in the JOIN clause's joinaliasvars list.  We used to
replace such entries with NULL Const nodes, which is handy for generation
of RowExpr expansion of a whole-row reference to the view.  The trouble
with that is that it can't be distinguished from the situation after
subquery pull-up of a constant subquery output expression below the JOIN.
Instead, replace such joinaliasvars with null pointers (empty expression
trees), which can't be confused with pulled-up expressions.  expandRTE()
still emits the old convention, though, for convenience of RowExpr
generation and to reduce the risk of breaking extension code.

In HEAD and 9.3, this patch also fixes a problem with some new code in
ruleutils.c that was failing to cope with implicitly-casted joinaliasvars
entries, as per recent report from Feike Steenbergen.  That oversight was
because of an inadequate description of the data structure in parsenodes.h,
which I've now corrected.  There were some pre-existing oversights of the
same ilk elsewhere, which I believe are now all fixed.
2013-07-23 16:23:45 -04:00
Alvaro Herrera c359a1b082 Tweak FOR UPDATE/SHARE error message wording (again)
In commit 0ac5ad5134 I changed some error messages from "FOR
UPDATE/SHARE" to a rather long gobbledygook which nobody liked.  Then,
in commit cb9b66d31 I changed them again, but the alternative chosen
there was deemed suboptimal by Peter Eisentraut, who in message
1373937980.20441.8.camel@vanquo.pezone.net proposed an alternative
involving a dynamically-constructed string based on the actual locking
strength specified in the SQL command.  This patch implements that
suggestion.
2013-07-23 14:03:09 -04:00
Robert Haas 765ad89be3 Use InvalidSnapshot, now SnapshotNow, as the default snapshot.
As far as I can determine, there's no code in the core distribution
that fails to explicitly set the snapshot of a scan or executor
state.  If there is any such code, this will probably cause it to
seg fault; friendlier suggestions were discussed on pgsql-hackers,
but there was no consensus that anything more than this was
needed.

This is another step towards the hoped-for complete removal of
SnapshotNow.
2013-07-23 10:58:32 -04:00
Robert Haas 21e28e4531 Fix cache flush hazard in ExecRefreshMatView.
Andres Freund
2013-07-22 18:10:05 -04:00
Robert Haas f40a318eea Remove bgw_sighup and bgw_sigterm.
Per discussion on pgsql-hackers, these aren't really needed.  Interim
versions of the background worker patch had the worker starting with
signals already unblocked, which would have made this necessary.
But the final version does not, so we don't really need it; and it
doesn't work well with the new facility for starting dynamic background
workers, so just rip it out.

Also per discussion on pgsql-hackers, back-patch this change to 9.3.
It's best to get the API break out of the way before we do an
official release of this facility, to avoid more pain for extension
authors later.
2013-07-22 14:13:00 -04:00
Robert Haas 0518eceec3 Adjust HeapTupleSatisfies* routines to take a HeapTuple.
Previously, these functions took a HeapTupleHeader, but upcoming
patches for logical replication will introduce new a new snapshot
type under which the tuple's TID will be used to lookup (CMIN, CMAX)
for visibility determination purposes.  This makes that information
available.  Code churn is minimal since HeapTupleSatisfiesVisibility
took the HeapTuple anyway, and deferenced it before calling the
satisfies function.

Independently of logical replication, this allows t_tableOid and
t_self to be cross-checked via assertions in tqual.c.  This seems
like a useful way to make sure that all callers are setting these
values properly, which has been previously put forward as
desirable.

Andres Freund, reviewed by Álvaro Herrera
2013-07-22 13:38:44 -04:00
Alvaro Herrera 0aeb5ae204 Silence compiler warning on an unused variable
Also, tweak wording in comments (per Andres) and documentation (myself)
to point out that it's the database's default tablespace that can be
passed as 0, not DEFAULTTABLESPACE_OID.  Robert Haas noticed the bug in
the code, but didn't update the accompanying prose.
2013-07-22 13:15:13 -04:00
Robert Haas f01d1ae3a1 Add infrastructure for mapping relfilenodes to relation OIDs.
Future patches are expected to introduce logical replication that
works by decoding WAL.  WAL contains relfilenodes rather than relation
OIDs, so this infrastructure will be needed to find the relation OID
based on WAL contents.

If logical replication does not make it into this release, we probably
should consider reverting this, since it will add some overhead to DDL
operations that create new relations.  One additional index insert per
pg_class row is not a large overhead, but it's more than zero.
Another way of meeting the needs of logical replication would be to
the relation OID to WAL, but that would burden DML operations, not
only DDL.

Andres Freund, with some changes by me.  Design review, in earlier
versions, by Álvaro Herrera.
2013-07-22 11:09:10 -04:00
Peter Eisentraut ff41a5de09 Clean up new JSON API typedefs
The new JSON API uses a bit of an unusual typedef scheme, where for
example OkeysState is a pointer to okeysState.  And that's not applied
consistently either.  Change that to the more usual PostgreSQL style
where struct typedefs are upper case, and use pointers explicitly.
2013-07-20 06:38:31 -04:00
Alvaro Herrera 6737aa72ba Fix HeapTupleSatisfiesVacuum on aborted updater xacts
By using only the macro that checks infomask bits
HEAP_XMAX_IS_LOCKED_ONLY to verify whether a multixact is not an
updater, and not the full HeapTupleHeaderIsOnlyLocked, it would come to
the wrong result in case of a multixact containing an aborted update;
therefore returning the wrong result code.  This would cause predicate.c
to break completely (as in bug report #8273 from David Leverton), and
certain index builds would misbehave.  As far as I can tell, other
callers of the bogus routine would make harmless mistakes or not be
affected by the difference at all; so this was a pretty narrow case.

Also, no other user of the HEAP_XMAX_IS_LOCKED_ONLY macro is as
careless; they all check specifically for the HEAP_XMAX_IS_MULTI case,
and they all verify whether the updater is InvalidXid before concluding
that it's a valid updater.  So there doesn't seem to be any similar bug.
2013-07-19 18:47:37 -04:00
Tom Lane d9f37e6661 Add checks for valid multibyte character length in UtfToLocal, LocalToUtf.
This is mainly to suppress "uninitialized variable" warnings from very
recent versions of gcc.  But it seems like a good robustness thing anyway,
not to mention that we might someday decide to support 6-byte UTF8.

Per report from Karol Trzcionka.  No back-patch since there's no reason
at the moment to think this is more than cosmetic.
2013-07-18 21:55:38 -04:00
Tom Lane e2bd904955 Fix regex match failures for backrefs combined with non-greedy quantifiers.
An ancient logic error in cfindloop() could cause the regex engine to fail
to find matches that begin later than the start of the string.  This
function is only used when the regex pattern contains a back reference,
and so far as we can tell the error is only reachable if the pattern is
non-greedy (i.e. its first quantifier uses the ? modifier).  Furthermore,
the actual match must begin after some potential match that satisfies the
DFA but then fails the back-reference's match test.

Reported and fixed by Jeevan Chalke, with cosmetic adjustments by me.
2013-07-18 21:22:37 -04:00
Stephen Frost 4cbe3ac3e8 WITH CHECK OPTION support for auto-updatable VIEWs
For simple views which are automatically updatable, this patch allows
the user to specify what level of checking should be done on records
being inserted or updated.  For 'LOCAL CHECK', new tuples are validated
against the conditionals of the view they are being inserted into, while
for 'CASCADED CHECK' the new tuples are validated against the
conditionals for all views involved (from the top down).

This option is part of the SQL specification.

Dean Rasheed, reviewed by Pavel Stehule
2013-07-18 17:10:16 -04:00
Andrew Dunstan d26888bc4d Move checking an explicit VARIADIC "any" argument into the parser.
This is more efficient and simpler . It does mean that an untyped NULL
can no longer be used in such cases, which should be mentioned in
Release Notes, but doesn't seem a terrible loss. The workaround is to
cast the NULL to some array type.

Pavel Stehule, reviewed by Jeevan Chalke.
2013-07-18 11:52:12 -04:00
Tom Lane 405a468b02 Fix direct access to Relation->rd_indpred.
Should use RelationGetIndexPredicate(), since rd_indpred is just a cache
that is not computed until/unless demanded.  Per buildfarm failure on
CLOBBER_CACHE_ALWAYS animals; diagnosis and fix by Hitoshi Harada.
2013-07-18 01:02:18 -04:00
Heikki Linnakangas 107cbc90a7 Fix variable names mentioned in comment to match the code.
Also, in another comment, explain why holding an insertion slot is a
critical section.

Per review by Amit Kapila.
2013-07-17 23:32:32 +03:00
Heikki Linnakangas 59c02a36f0 Fix assert failure at end of recovery, broken by XLogInsert scaling patch.
Initialization of the first XLOG buffer at end-of-recovery was broken for
the case that the last read WAL record ended at a page boundary. Instead of
trying to copy the last full xlog page to the buffer cache in that case,
just set shared state so that the next page is initialized when the first
WAL record after startup is inserted. (that's what we did in earlier
version, too)

To make the shared state required for that case less surprising, replace the
XLogCtl->curridx variable, which was the index of the latest initialized
buffer, with an XLogRecPtr of how far the buffers have been initialized.
That also allows us to get rid of the XLogRecEndPtrToBufIdx macro.

While we're at it, make a similar change for XLogCtl->Write.curridx, getting
rid of that variable and calculating the next buffer to write from
XLogCtl->LogwrtResult instead.
2013-07-17 23:12:22 +03:00
Heikki Linnakangas 3f2adace1e Fix end-of-loop optimization in pglz_find_match() function.
After the recent pglz optimization patch, the next/prev pointers in the
hash table are never NULL, INVALID_ENTRY_PTR is used to represent invalid
entries instead. The end-of-loop check in pglz_find_match() function didn't
get the memo. The result was the same from a correctness point of view, but
because the NULL-check would never fail, the tiny optimization turned into
a pessimization.

Reported by Stephen Frost, using Coverity scanner.
2013-07-17 20:37:09 +03:00
Noah Misch ffcf654547 Fix systable_recheck_tuple() for MVCC scan snapshots.
Since this function assumed non-MVCC snapshots, it broke when commit
568d4138c6 switched its one caller from
SnapshotNow scans to MVCC-snapshot scans.

Reviewed by Robert Haas, Tom Lane and Andres Freund.
2013-07-16 20:16:32 -04:00
Noah Misch b560ec1b0d Implement the FILTER clause for aggregate function calls.
This is SQL-standard with a few extensions, namely support for
subqueries and outer references in clause expressions.

catversion bump due to change in Aggref and WindowFunc.

David Fetter, reviewed by Dean Rasheed.
2013-07-16 20:15:36 -04:00
Noah Misch 7a8e9f298e Comment on why planagg.c punts "MIN(x ORDER BY y)". 2013-07-16 20:14:37 -04:00
Kevin Grittner cc1965a99b Add support for REFRESH MATERIALIZED VIEW CONCURRENTLY.
This allows reads to continue without any blocking while a REFRESH
runs.  The new data appears atomically as part of transaction
commit.

Review questioned the Assert that a matview was not a system
relation.  This will be addressed separately.

Reviewed by Hitoshi Harada, Robert Haas, Andres Freund.
Merged after review with security patch f3ab5d4.
2013-07-16 12:55:44 -05:00
Robert Haas 7f7485a0cd Allow background workers to be started dynamically.
There is a new API, RegisterDynamicBackgroundWorker, which allows
an ordinary user backend to register a new background writer during
normal running.  This means that it's no longer necessary for all
background workers to be registered during processing of
shared_preload_libraries, although the option of registering workers
at that time remains available.

When a background worker exits and will not be restarted, the
slot previously used by that background worker is automatically
released and becomes available for reuse.  Slots used by background
workers that are configured for automatic restart can't (yet) be
released without shutting down the system.

This commit adds a new source file, bgworker.c, and moves some
of the existing control logic for background workers there.
Previously, there was little enough logic that it made sense to
keep everything in postmaster.c, but not any more.

This commit also makes the worker_spi contrib module into an
extension and adds a new function, worker_spi_launch, which can
be used to demonstrate the new facility.
2013-07-16 13:02:15 -04:00
Stephen Frost 4ed22e891f Check get_tle_by_resno() result before deref
When creating a sort to support a group by, we need to look up the
target entry in the target list by the resno using get_tle_by_resno().
This particular code-path didn't check the result prior to attempting
to dereference it, while all other callers did.  While I can't see a
way for this usage of get_tle_by_resno() to fail (you can't ask for
a column to be sorted on which isn't included in the group by), it's
probably best to check that we didn't end up with a NULL somehow
anyway than risk the segfault.

I'm willing to back-patch this if others feel it's necessary, but my
guess is new features are what might tickle this rather than anything
existing.

Missing check spotted by the Coverity scanner.
2013-07-15 15:04:19 -04:00
Robert Haas 42c80c696e Assert that syscache lookups don't happen outside transactions.
Andres Freund
2013-07-15 13:31:36 -04:00
Stephen Frost 273dcd1628 Ensure 64bit arithmetic when calculating tapeSpace
In tuplesort.c:inittapes(), we calculate tapeSpace by first figuring
out how many 'tapes' we can use (maxTapes) and then multiplying the
result by the tape buffer overhead for each.  Unfortunately, when
we are on a system with an 8-byte long, we allow work_mem to be
larger than 2GB and that allows maxTapes to be large enough that the
32bit arithmetic can overflow when multiplied against the buffer
overhead.

When this overflow happens, we end up adding the overflow to the
amount of space available, causing the amount of memory allocated to
be larger than work_mem.

Note that to reach this point, you have to set work mem to at least
24GB and be sorting a set which is at least that size.  Given that a
user who can set work_mem to 24GB could also set it even higher, if
they were looking to run the system out of memory, this isn't
considered a security issue.

This overflow risk was found by the Coverity scanner.

Back-patch to all supported branches, as this issue has existed
since before 8.4.
2013-07-14 16:26:16 -04:00
Peter Eisentraut 070518ddab Add session_preload_libraries configuration parameter
This is like shared_preload_libraries except that it takes effect at
backend start and can be changed without a full postmaster restart.  It
is like local_preload_libraries except that it is still only settable by
a superuser.  This can be a better way to load modules such as
auto_explain.

Since there are now three preload parameters, regroup the documentation
a bit.  Put all parameters into one section, explain common
functionality only once, update the descriptions to reflect current and
future realities.

Reviewed-by: Dimitri Fontaine <dimitri@2ndQuadrant.fr>
2013-07-12 21:23:50 -04:00
Noah Misch f3ab5d4696 Switch user ID to the object owner when populating a materialized view.
This makes superuser-issued REFRESH MATERIALIZED VIEW safe regardless of
the object's provenance.  REINDEX is an earlier example of this pattern.
As a downside, functions called from materialized views must tolerate
running in a security-restricted operation.  CREATE MATERIALIZED VIEW
need not change user ID.  Nonetheless, avoid creation of materialized
views that will invariably fail REFRESH by making it, too, start a
security-restricted operation.

Back-patch to 9.3 so materialized views have this from the beginning.

Reviewed by Kevin Grittner.
2013-07-12 18:21:22 -04:00
Noah Misch 448fee2e23 Make comments reflect that omission of SPI_gettypmod() is intentional. 2013-07-12 18:07:46 -04:00
Peter Eisentraut 8dead08c54 Fix lack of message pluralization 2013-07-09 20:49:44 -04:00
Peter Eisentraut 7888c61238 Fix bool abuse
path_encode's "closed" argument used to take three values: TRUE, FALSE,
or -1, while being of type bool.  Replace that with a three-valued enum
for more clarity.
2013-07-08 22:42:39 -04:00
Heikki Linnakangas f489470f8a Fix Windows build.
Was broken by my xloginsert scaling patch. XLogCtl global variable needs
to be initialized in each process, as it's not inherited by fork() on
Windows.
2013-07-08 17:28:48 +03:00
Heikki Linnakangas 9a20a9b21b Improve scalability of WAL insertions.
This patch replaces WALInsertLock with a number of WAL insertion slots,
allowing multiple backends to insert WAL records to the WAL buffers
concurrently. This is particularly useful for parallel loading large amounts
of data on a system with many CPUs.

This has one user-visible change: switching to a new WAL segment with
pg_switch_xlog() now fills the remaining unused portion of the segment with
zeros. This potentially adds some overhead, but it has been a very common
practice by DBA's to clear the "tail" of the segment with an external
pg_clearxlogtail utility anyway, to make the WAL files compress better.
With this patch, it's no longer necessary to do that.

This patch adds a new GUC, xloginsert_slots, to tune the number of WAL
insertion slots. Performance testing suggests that the default, 8, works
pretty well for all kinds of worklods, but I left the GUC in place to allow
others with different hardware to test that easily. We might want to remove
that before release.

Reviewed by Andres Freund.
2013-07-08 11:23:56 +03:00
Tom Lane 5372275b4b Fix planning of parameterized appendrel paths with expensive join quals.
The code in set_append_rel_pathlist() for building parameterized paths
for append relations (inheritance and UNION ALL combinations) supposed
that the cheapest regular path for a child relation would still be cheapest
when reparameterized.  Which might not be the case, particularly if the
added join conditions are expensive to compute, as in a recent example from
Jeff Janes.  Fix it to compare child path costs *after* reparameterizing.
We can short-circuit that if the cheapest pre-existing path is already
parameterized correctly, which seems likely to be true often enough to be
worth checking for.

Back-patch to 9.2 where parameterized paths were introduced.
2013-07-07 22:37:24 -04:00
Jeff Davis 5b571bb8c8 Handle posix_fallocate() errors.
On some platforms, posix_fallocate() is available but may still return
EINVAL if the underlying filesystem does not support it.  So, in case
of an error, fall through to the alternate implementation that just
writes zeros.

Per buildfarm failure and analysis by Tom Lane.
2013-07-06 13:46:04 -07:00
Noah Misch 02d2b694ee Update messages, comments and documentation for materialized views.
All instances of the verbiage lagging the code.  Back-patch to 9.3,
where materialized views were introduced.
2013-07-05 15:37:51 -04:00
Jeff Davis 269e780822 Use posix_fallocate() for new WAL files, where available.
This function is more efficient than actually writing out zeroes to
the new file, per microbenchmarks by Jon Nelson. Also, it may reduce
the likelihood of WAL file fragmentation.

Jon Nelson, with review by Andres Freund, Greg Smith and me.
2013-07-05 12:30:29 -07:00
Magnus Hagander c87ff71f37 Expose the estimation of number of changed tuples since last analyze
This value, now pg_stat_all_tables.n_mod_since_analyze, was already
tracked and used by autovacuum, but not exposed to the user.

Mark Kirkwood, review by Laurenz Albe
2013-07-05 15:10:15 +02:00
Noah Misch 79e0f87a15 Use type "int64" for memory accounting in tuplesort.c/tuplestore.c.
Commit 263865a489 switched tuplesort.c and
tuplestore.c variables representing memory usage from type "long" to
type "Size".  This was unnecessary; I thought doing so avoided overflow
scenarios on 64-bit Windows, but guc.c already limited work_mem so as to
prevent the overflow.  It was also incomplete, not touching the logic
that assumed a signed data type.  Change the affected variables to
"int64".  This is perfect for 64-bit platforms, and it reduces the need
to contemplate platform-specific overflow scenarios.  It also puts us
close to being able to support work_mem over 2 GiB on 64-bit Windows.

Per report from Andres Freund.
2013-07-04 23:13:54 -04:00
Fujii Masao 7842d41df5 Fix typo in comment.
Michael Paquier
2013-07-05 02:47:49 +09:00
Robert Haas 6bc8ef0b7f Add new GUC, max_worker_processes, limiting number of bgworkers.
In 9.3, there's no particular limit on the number of bgworkers;
instead, we just count up the number that are actually registered,
and use that to set MaxBackends.  However, that approach causes
problems for Hot Standby, which needs both MaxBackends and the
size of the lock table to be the same on the standby as on the
master, yet it may not be desirable to run the same bgworkers in
both places.  9.3 handles that by failing to notice the problem,
which will probably work fine in nearly all cases anyway, but is
not theoretically sound.

A further problem with simply counting the number of registered
workers is that new workers can't be registered without a
postmaster restart.  This is inconvenient for administrators,
since bouncing the postmaster causes an interruption of service.
Moreover, there are a number of applications for background
processes where, by necessity, the background process must be
started on the fly (e.g. parallel query).  While this patch
doesn't actually make it possible to register new background
workers after startup time, it's a necessary prerequisite.

Patch by me.  Review by Michael Paquier.
2013-07-04 11:24:24 -04:00
Fujii Masao 2ef085d0e6 Get rid of pg_class.reltoastidxid.
Treat TOAST index just the same as normal one and get the OID
of TOAST index from pg_index but not pg_class.reltoastidxid.
This change allows us to handle multiple TOAST indexes, and
which is required infrastructure for upcoming
REINDEX CONCURRENTLY feature.

Patch by Michael Paquier, reviewed by Andres Freund and me.
2013-07-04 03:24:09 +09:00
Tom Lane 5530a82643 Fix handling of auto-updatable views on inherited tables.
An INSERT into such a view should work just like an INSERT into its base
table, ie the insertion should go directly into that table ... not be
duplicated into each child table, as was happening before, per bug #8275
from Rushabh Lathia.  On the other hand, the current behavior for
UPDATE/DELETE seems reasonable: the update/delete traverses the child
tables, or not, depending on whether the view specifies ONLY or not.
Add some regression tests covering this area.

Dean Rasheed
2013-07-03 12:26:52 -04:00
Alvaro Herrera 620935ad08 Unbreak postmaster restart-after-crash sequence
In patch 82233ce7ea, AbortStartTime wasn't being reset appropriately
after the restart sequence, causing subsequent iterations through
ServerLoop to malfunction.
2013-07-03 11:08:52 -04:00
Robert Haas 3682025015 Add support for multiple kinds of external toast datums.
To that end, support tags rather than lengths for external datums.
As an example of how this can be used, add support or "indirect"
tuples which point to some externally allocated memory containing
a toast tuple.  Similar infrastructure could be used for other
purposes, including, perhaps, support for alternative compression
algorithms.

Andres Freund, reviewed by Hitoshi Harada and myself
2013-07-02 13:38:55 -04:00
Robert Haas 568d4138c6 Use an MVCC snapshot, rather than SnapshotNow, for catalog scans.
SnapshotNow scans have the undesirable property that, in the face of
concurrent updates, the scan can fail to see either the old or the new
versions of the row.  In many cases, we work around this by requiring
DDL operations to hold AccessExclusiveLock on the object being
modified; in some cases, the existing locking is inadequate and random
failures occur as a result.  This commit doesn't change anything
related to locking, but will hopefully pave the way to allowing lock
strength reductions in the future.

The major issue has held us back from making this change in the past
is that taking an MVCC snapshot is significantly more expensive than
using a static special snapshot such as SnapshotNow.  However, testing
of various worst-case scenarios reveals that this problem is not
severe except under fairly extreme workloads.  To mitigate those
problems, we avoid retaking the MVCC snapshot for each new scan;
instead, we take a new snapshot only when invalidation messages have
been processed.  The catcache machinery already requires that
invalidation messages be sent before releasing the related heavyweight
lock; else other backends might rely on locally-cached data rather
than scanning the catalog at all.  Thus, making snapshot reuse
dependent on the same guarantees shouldn't break anything that wasn't
already subtly broken.

Patch by me.  Review by Michael Paquier and Andres Freund.
2013-07-02 09:47:01 -04:00
Robert Haas 0d22987ae9 Add a convenience routine makeFuncCall to reduce duplication.
David Fetter and Andrew Gierth, reviewed by Jeevan Chalke
2013-07-01 14:46:54 -04:00
Bruce Momjian 7408c5d29b Add timezone offset output option to to_char()
Add ability for to_char() to output the timezone's UTC offset (OF).  We
already have the ability to return the timezone abbeviation (TZ/tz).
Per request from Andrew Dunstan
2013-07-01 13:40:32 -04:00
Heikki Linnakangas 031cc55bbe Optimize pglz compressor for small inputs.
The pglz compressor has a significant startup cost, because it has to
initialize to zeros the history-tracking hash table. On a 64-bit system, the
hash table was 64kB in size. While clearing memory is pretty fast, for very
short inputs the relative cost of that was quite large.

This patch alleviates that in two ways. First, instead of storing pointers
in the hash table, store 16-bit indexes into the hist_entries array. That
slashes the size of the hash table to 1/2 or 1/4 of the original, depending
on the pointer width. Secondly, adjust the size of the hash table based on
input size. For very small inputs, you don't need a large hash table to
avoid collisions.

Review by Amit Kapila.
2013-07-01 11:00:14 +03:00
Heikki Linnakangas 79ce29c734 Retry short writes when flushing WAL.
We don't normally bother retrying when the number of bytes written by
write() is short of what was requested. It is generally assumed that a
write() to disk doesn't return short, unless you run out of disk space.
While writing the WAL, however, it seems prudent to try a bit harder,
because a failure leads to PANIC. The write() is also much larger than most
write()s in the backend (up to wal_buffers), so there's more room for
surprises.

Also retry on EINTR. All signals used in the backend are flagged SA_RESTART
nowadays, so it shouldn't happen, but better to be defensive.
2013-07-01 09:36:00 +03:00
Heikki Linnakangas ee6556555b Inline ginCompareItemPointers function for speed.
ginCompareItemPointers function is called heavily in gin index scans -
inlining it speeds up some kind of queries a lot.
2013-06-29 12:55:34 +03:00
Simon Riggs d51b271059 Change errcode for lock_timeout to match NOWAIT
Set errcode to ERRCODE_LOCK_NOT_AVAILABLE

Zoltán Bsöszörményi
2013-06-29 00:57:25 +01:00
Simon Riggs f177cbfe67 ALTER TABLE ... ALTER CONSTRAINT for FKs
Allow constraint attributes to be altered,
so the default setting of NOT DEFERRABLE
can be altered to DEFERRABLE and back.

Review by Abhijit Menon-Sen
2013-06-29 00:27:30 +01:00
Simon Riggs 2f74e4ec50 Assert that ALTER TABLE subcommands have pass set 2013-06-29 00:26:46 +01:00
Alvaro Herrera 82233ce7ea Send SIGKILL to children if they don't die quickly in immediate shutdown
On immediate shutdown, or during a restart-after-crash sequence,
postmaster used to send SIGQUIT (and then abandon ship if shutdown); but
this is not a good strategy if backends don't die because of that
signal.  (This might happen, for example, if a backend gets tangled
trying to malloc() due to gettext(), as in an example illustrated by
MauMau.)  This causes problems when later trying to restart the server,
because some processes are still attached to the shared memory segment.

Instead of just abandoning such backends to their fates, we now have
postmaster hang around for a little while longer, send a SIGKILL after
some reasonable waiting period, and then exit.  This makes immediate
shutdown more reliable.

There is disagreement on whether it's best for postmaster to exit after
sending SIGKILL, or to stick around until all children have reported
death.  If this controversy is resolved differently than what this patch
implements, it's an easy change to make.

Bug reported by MauMau in message 20DAEA8949EC4E2289C6E8E58560DEC0@maumau

MauMau and Álvaro Herrera
2013-06-28 17:49:46 -04:00
Robert Haas 5893ffa79c Make the OVER keyword unreserved.
This results in a slightly less specific error message when OVER
is used in a context where we don't accept window functions, but
per discussion, it's worth it to get the benefit of not needing
to reserve this keyword any more.  This same refactoring will
also let us avoid reserving some other keywords that we expect
to add in upcoming patches (specifically, IGNORE, RESPECT, and
FILTER).

Troels Nielsen, with minor changes by me
2013-06-28 11:11:00 -04:00
Heikki Linnakangas 9e0bc7c1e8 Track spinlock delay in microsecond granularity.
On many platforms the OS will round the sleep time to millisecond
resolution, but there is no reason for us to pre-emptively round the
argument to pg_usleep.

When the delay was measured in milliseconds and started from 1 ms, it
sometimes took many attempts until the logic that increases the delay by
multiplying with a random value between 1 and 2 actually managed to bump it
from 1 ms to 2 ms. That lead to a sequence of 1 ms waits until the delay
started to increase. This wasn't really a problem but it looked odd if you
observed the waits. There is no measurable difference in performance, but
it's more readable this way.

Jeff Janes
2013-06-28 12:39:55 +03:00
Noah Misch 263865a489 Permit super-MaxAllocSize allocations with MemoryContextAllocHuge().
The MaxAllocSize guard is convenient for most callers, because it
reduces the need for careful attention to overflow, data type selection,
and the SET_VARSIZE() limit.  A handful of callers are happy to navigate
those hazards in exchange for the ability to allocate a larger chunk.
Introduce MemoryContextAllocHuge() and repalloc_huge().  Use this in
tuplesort.c and tuplestore.c, enabling internal sorts of up to INT_MAX
tuples, a factor-of-48 increase.  In particular, B-tree index builds can
now benefit from much-larger maintenance_work_mem settings.

Reviewed by Stephen Frost, Simon Riggs and Jeff Janes.
2013-06-27 14:53:57 -04:00
Noah Misch 19085116ee Cooperate with the Valgrind instrumentation framework.
Valgrind "client requests" in aset.c and mcxt.c teach Valgrind and its
Memcheck tool about the PostgreSQL allocator.  This makes Valgrind
roughly as sensitive to memory errors involving palloc chunks as it is
to memory errors involving malloc chunks.  Further client requests in
PageAddItem() and printtup() verify that all bits being added to a
buffer page or furnished to an output function are predictably-defined.
Those tests catch failures of C-language functions to fully initialize
the bits of a Datum, which in turn stymie optimizations that rely on
_equalConst().  Define the USE_VALGRIND symbol in pg_config_manual.h to
enable these additions.  An included "suppression file" silences nominal
errors we don't plan to fix.

Reviewed in earlier versions by Peter Geoghegan and Korry Douglas.
2013-06-26 20:22:25 -04:00
Noah Misch a855148a29 Refactor aset.c and mcxt.c in preparation for Valgrind cooperation.
Move some repeated debugging code into functions and store intermediates
in variables where not presently necessary.  No code-generation changes
in a production build, and no functional changes.  This simplifies and
focuses the main patch.
2013-06-26 19:56:03 -04:00
Noah Misch 1d96bb9602 Initialize pad bytes in GinFormTuple().
Every other core buffer page consumer initializes the bytes it furnishes
to PageAddItem().  For consistency, do the same here.  No back-patch;
regardless, we couldn't count on the fix so long as binary upgrade can
carry forward affected index builds.
2013-06-26 19:55:15 -04:00
Noah Misch 5f538ad004 Renovate display of non-ASCII messages on Windows.
GNU gettext selects a default encoding for the messages it emits in a
platform-specific manner; it uses the Windows ANSI code page on Windows
and follows LC_CTYPE on other platforms.  This is inconvenient for
PostgreSQL server processes, so realize consistent cross-platform
behavior by calling bind_textdomain_codeset() on Windows each time we
permanently change LC_CTYPE.  This primarily affects SQL_ASCII databases
and processes like the postmaster that do not attach to a database,
making their behavior consistent with PostgreSQL on non-Windows
platforms.  Messages from SQL_ASCII databases use the encoding implied
by the database LC_CTYPE, and messages from non-database processes use
LC_CTYPE from the postmaster system environment.  PlatformEncoding
becomes unused, so remove it.

Make write_console() prefer WriteConsoleW() to write() regardless of the
encodings in use.  In this situation, write() will invariably mishandle
non-ASCII characters.

elog.c has assumed that messages conform to the database encoding.
While usually true, this does not hold for SQL_ASCII and MULE_INTERNAL.
Introduce MessageEncoding to track the actual encoding of message text.
The present consumers are Windows-specific code for converting messages
to UTF16 for use in system interfaces.  This fixes the appearance in
Windows event logs and consoles of translated messages from SQL_ASCII
processes like the postmaster.  Note that SQL_ASCII inherently disclaims
a strong notion of encoding, so non-ASCII byte sequences interpolated
into messages by %s may yet yield a nonsensical message.  MULE_INTERNAL
has similar problems at present, albeit for a different reason: its lack
of libiconv support or a conversion to UTF8.

Consequently, one need no longer restart Windows with a different
Windows ANSI code page to broadly test backend logging under a given
language.  Changing the user's locale ("Format") is enough.  Several
accounts can simultaneously run postmasters under different locales, all
correctly logging localized messages to Windows event logs and consoles.

Alexander Law and Noah Misch
2013-06-26 11:17:33 -04:00
Alvaro Herrera 4ca50e0710 Avoid inconsistent type declaration
Clang 3.3 correctly complains that a variable of type enum
MultiXactStatus cannot hold a value of -1, which makes sense.  Change
the declared type of the variable to int instead, and apply casting as
necessary to avoid the warning.

Per notice from Andres Freund
2013-06-25 16:41:47 -04:00
Fujii Masao 985bd7d497 Support clean switchover.
In replication, when we shutdown the master, walsender tries to send
all the outstanding WAL records to the standby, and then to exit. This
basically means that all the WAL records are fully synced between
two servers after the clean shutdown of the master. So, after
promoting the standby to new master, we can restart the stopped
master as new standby without the need for a fresh backup from
new master.

But there was one problem so far: though walsender tries to send all
the outstanding WAL records, it doesn't wait for them to be replicated
to the standby. Then, before receiving all the WAL records,
walreceiver can detect the closure of connection and exit. We cannot
guarantee that there is no missing WAL in the standby after clean
shutdown of the master. In this case, backup from new master is
required when restarting the stopped master as new standby.

This patch fixes this problem. It just changes walsender so that it
waits for all the outstanding WAL records to be replicated to the
standby before closing the replication connection.

Per discussion, this is a fix that needs to get backpatched rather than
new feature. So, back-patch to 9.1 where enough infrastructure for
this exists.

Patch by me, reviewed by Andres Freund.
2013-06-26 02:14:37 +09:00
Simon Riggs 4f14c86d74 Reverting previous commit, pending investigation
of sporadic seg faults from various build farm members.
2013-06-24 21:21:18 +01:00
Simon Riggs b577a57d41 ALTER TABLE ... ALTER CONSTRAINT for FKs
Allow constraint attributes to be altered,
so the default setting of NOT DEFERRABLE
can be altered to DEFERRABLE and back.

Review by Abhijit Menon-Sen
2013-06-24 20:07:41 +01:00
Peter Eisentraut ce18b01159 Translation updates 2013-06-24 14:16:44 -04:00
Simon Riggs 1f09121b4e Ensure no xid gaps during Hot Standby startup
In some cases with higher numbers of subtransactions
it was possible for us to incorrectly initialize
subtrans leading to complaints of missing pages.

Bug report by Sergey Konoplev
Analysis and fix by Andres Freund
2013-06-23 11:05:02 +01:00
Peter Eisentraut 7dfd5cd21c Clarify terminology standalone backend vs. single-user mode
Most of the documentation uses "single-user mode", so use that in the
code as well.  Adjust the documentation to match the new error message
wording.  Also add a documentation index entry for "single-user mode".

Based-on-patch-by: Jeff Janes <jeff.janes@gmail.com>
2013-06-20 23:03:18 -04:00
Fujii Masao bab54e383d Support TB (terabyte) memory unit in GUC variables.
Patch by Simon Riggs, reviewed by Jeff Janes and me.
2013-06-20 08:17:14 +09:00
Jeff Davis b8fd1a09f3 Add buffer_std flag to MarkBufferDirtyHint().
MarkBufferDirtyHint() writes WAL, and should know if it's got a
standard buffer or not. Currently, the only callers where buffer_std
is false are related to the FSM.

In passing, rename XLOG_HINT to XLOG_FPI, which is more descriptive.

Back-patch to 9.3.
2013-06-17 08:02:12 -07:00
Tom Lane a64ca63e59 Use WaitLatch, not pg_usleep, for delaying in pg_sleep().
This avoids platform-dependent behavior wherein pg_sleep() might fail to be
interrupted by statement timeout, query cancel, SIGTERM, etc.  Also, since
there's no reason to wake up once a second any more, we can reduce the
power consumption of a sleeping backend a tad.

Back-patch to 9.3, since use of SA_RESTART for SIGALRM makes this a bigger
issue than it used to be.
2013-06-15 16:23:24 -04:00
Tom Lane 873ab97219 Use SA_RESTART for all signals, including SIGALRM.
The exclusion of SIGALRM dates back to Berkeley days, when Postgres used
SIGALRM in only one very short stretch of code.  Nowadays, allowing it to
interrupt kernel calls doesn't seem like a very good idea, since its use
for statement_timeout means SIGALRM could occur anyplace in the code, and
there are far too many call sites where we aren't prepared to deal with
EINTR failures.  When third-party code is taken into consideration, it
seems impossible that we ever could be fully EINTR-proof, so better to
use SA_RESTART always and deal with the implications of that.  One such
implication is that we should not assume pg_usleep() will be terminated
early by a signal.  Therefore, long sleeps should probably be replaced
by WaitLatch operations where practical.

Back-patch to 9.3 so we can get some beta testing on this change.
2013-06-15 15:39:51 -04:00
Tom Lane e472b92140 Avoid deadlocks during insertion into SP-GiST indexes.
SP-GiST's original scheme for avoiding deadlocks during concurrent index
insertions doesn't work, as per report from Hailong Li, and there isn't any
evident way to make it work completely.  We could possibly lock individual
inner tuples instead of their whole pages, but preliminary experimentation
suggests that the performance penalty would be huge.  Instead, if we fail
to get a buffer lock while descending the tree, just restart the tree
descent altogether.  We keep the old tuple positioning rules, though, in
hopes of reducing the number of cases where this can happen.

Teodor Sigaev, somewhat edited by Tom Lane
2013-06-14 14:26:43 -04:00
Tom Lane c62866eeaf Remove special-case treatment of LOG severity level in standalone mode.
elog.c has historically treated LOG messages as low-priority during
bootstrap and standalone operation.  This has led to confusion and even
masked a bug, because the normal expectation of code authors is that
elog(LOG) will put something into the postmaster log, and that wasn't
happening during initdb.  So get rid of the special-case rule and make
the priority order the same as it is in normal operation.  To keep from
cluttering initdb's output and the behavior of a standalone backend,
tweak the severity level of three messages routinely issued by xlog.c
during startup and shutdown so that they won't appear in these cases.
Per my proposal back in December.
2013-06-13 23:15:15 -04:00
Tom Lane f04216341d Refactor checksumming code to make it easier to use externally.
pg_filedump and other external utility programs are likely to want to be
able to check Postgres page checksums.  To avoid messy duplication of code,
move the checksumming functionality into an exported header file, much as
we did awhile back for the CRC code.

In passing, get rid of an unportable assumption that a static char[] array
will be word-aligned, and do some other minor code beautification.
2013-06-13 22:35:56 -04:00
Tom Lane 629b3e96dd Only install a portal's ResourceOwner if it actually has one.
In most scenarios a portal without a ResourceOwner is dead and not subject
to any further execution, but a portal for a cursor WITH HOLD remains in
existence with no ResourceOwner after the creating transaction is over.
In this situation, if we attempt to "execute" the portal directly to fetch
data from it, we were setting CurrentResourceOwner to NULL, leading to a
segfault if the datatype output code did anything that required a resource
owner (such as trying to fetch system catalog entries that weren't already
cached).  The case appears to be impossible to provoke with stock libpq,
but psqlODBC at least is able to cause it when working with held cursors.

Simplest fix is to just skip the assignment to CurrentResourceOwner, so
that any resources used by the data output operations will be managed by
the transaction-level resource owner instead.  For consistency I changed
all the places that install a portal's resowner as current, even though
some of them are probably not reachable with a held cursor's portal.

Per report from Joshua Berry (with thanks to Hiroshi Inoue for developing
a self-contained test case).  Back-patch to all supported versions.
2013-06-13 13:12:49 -04:00
Noah Misch 66008564f8 Avoid reading past datum end when parsing JSON.
Several loops in the JSON parser examined a byte in memory just before
checking whether its address was in-bounds, so they could read one byte
beyond the datum's allocation.  A SIGSEGV is possible.  New in 9.3, so
no back-patch.
2013-06-12 19:51:12 -04:00
Noah Misch 3a5d0c5533 Avoid reading below the start of a stack variable in tokenize_file().
We would wrongly overwrite the prior stack byte if it happened to
contain '\n' or '\r'.  New in 9.3, so no back-patch.
2013-06-12 19:50:52 -04:00
Noah Misch 813895e4ac Don't pass oidvector by value.
Since the structure ends with a flexible array, doing so truncates any
vector having more than one element.  New in 9.3, so no back-patch.
2013-06-12 19:50:37 -04:00
Noah Misch fb435f40d5 Observe array length in HaveVirtualXIDsDelayingChkpt().
Since commit f21bb9cfb5, this function
ignores the caller-provided length and loops until it finds a
terminator, which GetVirtualXIDsDelayingChkpt() never adds.  Restore the
previous loop control logic.  In passing, revert the addition of an
unused variable by the same commit, presumably a debugging relic.
2013-06-12 19:50:14 -04:00