Commit Graph

12439 Commits

Author SHA1 Message Date
Tom Lane 6980f817e8 Stop btree indexscans upon reaching nulls in either direction.
The existing scan-direction-sensitive tests were overly complex, and
failed to stop the scan in cases where it's perfectly legitimate to do so.
Per bug #6278 from Maksym Boguk.

Back-patch to 8.3, which is as far back as the patch applies easily.
Doesn't seem worth sweating over a relatively minor performance issue in
8.2 at this late date.  (But note that this was a performance regression
from 8.1 and before, so 8.2 is being left as an outlier.)
2011-10-31 16:40:04 -04:00
Tom Lane 6743a878a4 Support more locale-specific formatting options in cash_out().
The POSIX spec defines locale fields for controlling the ordering of the
value, sign, and currency symbol in monetary output, but cash_out only
supported a small subset of these options.  Fully implement p/n_sign_posn,
p/n_cs_precedes, and p/n_sep_by_space per spec.  Fix up cash_in so that
it will accept all these format variants.

Also, make sure that thousands_sep is only inserted to the left of the
decimal point, as required by spec.

Per bug #6144 from Eduard Kracmar and discussion of bug #6277.  This patch
includes some ideas from Alexander Lakhin's proposed patch, though it is
very different in detail.
2011-10-30 15:02:58 -04:00
Tom Lane eb5834d5af Further improvement of make_greater_string.
Make sure that it considers all the possibilities that the old code did,
instead of trying only one possibility per character position.  To keep the
runtime in bounds, instead tweak the character incrementers to not try
every possible multibyte character code.  Remove unnecessary logic to
restore the old character value on failure.  Additional comment and
formatting cleanup.
2011-10-30 12:22:11 -04:00
Robert Haas fae54e4a16 Update visibilitymap.c header comments.
Recent work on index-only scans left this somewhat out of date.
2011-10-29 14:46:59 -04:00
Tom Lane 7609239f3e Fix assorted bogosities in cash_in() and cash_out().
cash_out failed to handle multiple-byte thousands separators, as per bug
#6277 from Alexander Law.  In addition, cash_in didn't handle that either,
nor could it handle multiple-byte positive_sign.  Both routines failed to
support multiple-byte mon_decimal_point, which I did not think was worth
changing, but at least now they check for the possibility and fall back to
using '.' rather than emitting invalid output.  Also, make cash_in handle
trailing negative signs, which formerly it would reject.  Since cash_out
generates trailing negative signs whenever the locale tells it to, this
last omission represents a fail-to-reload-dumped-data bug.  IMO that
justifies patching this all the way back.
2011-10-29 14:32:06 -04:00
Robert Haas 78d523b633 Improve make_greater_string() with encoding-specific incrementers.
This infrastructure doesn't in any way guarantee that the character
we produce will sort before the one we incremented; but it does at least
make it much more likely that we'll end up with something that is a valid
character, which improves our chances.

Kyotaro Horiguchi, with various adjustments by me.
2011-10-29 14:22:20 -04:00
Robert Haas 53f1ca59b5 Allow hint bits to be set sooner for temporary and unlogged tables.
We need not wait until the commit record is durably on disk, because
in the event of a crash the page we're updating with hint bits will
be gone anyway.  Per off-list report from Heikki Linnakangas, this
can significantly degrade the performance of unlogged tables; I was
able to show a 2x speedup from this patch on a pgbench run with scale
factor 15.  In practice, this will mostly help small, heavily updated
tables, because on larger tables you're unlikely to run into the same
row again before the commit record makes it out to disk.
2011-10-28 17:08:09 -04:00
Heikki Linnakangas cbf65509bb Fix the number of lwlocks needed by the "fast path" lock patch. It needs
one lock per backend or auxiliary process - the need for a lock for each
aux processes was not accounted for in NumLWLocks(). No-one noticed,
because the three locks needed for the three aux processes fit into the
few extra lwlocks we allocate for 3rd party modules that don't call
RequestAddinLWLocks() (NUM_USER_DEFINED_LWLOCKS, 4 by default).
2011-10-27 22:39:58 +03:00
Tom Lane 3e4b3465b6 Improve planner's ability to recognize cases where an IN's RHS is unique.
If the right-hand side of a semijoin is unique, then we can treat it like a
normal join (or another way to say that is: we don't need to explicitly
unique-ify the data before doing it as a normal join).  We were recognizing
such cases when the RHS was a sub-query with appropriate DISTINCT or GROUP
BY decoration, but there's another way: if the RHS is a plain relation with
unique indexes, we can check if any of the indexes prove the output is
unique.  Most of the infrastructure for that was there already in the join
removal code, though I had to rearrange it a bit.  Per reflection about a
recent example in pgsql-performance.
2011-10-26 17:52:29 -04:00
Tom Lane 1e3b21dd5e Change FK trigger naming convention to fix self-referential FKs.
Use names like "RI_ConstraintTrigger_a_NNNN" for FK action triggers and
"RI_ConstraintTrigger_c_NNNN" for FK check triggers.  This ensures the
action trigger fires first in self-referential cases where the very same
row update fires both an action and a check trigger.  This change provides
a non-probabilistic solution for bug #6268, at the risk that it could break
client code that is making assumptions about the exact names assigned to
auto-generated FK triggers.  Hence, change this in HEAD only.  No need for
forced initdb since old triggers continue to work fine.
2011-10-26 13:19:42 -04:00
Tom Lane 58958726ff Change FK trigger creation order to better support self-referential FKs.
When a foreign-key constraint references another column of the same table,
row updates will queue both the PK's ON UPDATE action and the FK's CHECK
action in the same event.  The ON UPDATE action must execute first, else
the CHECK will check a non-final state of the row and possibly throw an
inappropriate error, as seen in bug #6268 from Roman Lytovchenko.

Now, the firing order of multiple triggers for the same event is determined
by the sort order of their pg_trigger.tgnames, and the auto-generated names
we use for FK triggers are "RI_ConstraintTrigger_NNNN" where NNNN is the
trigger OID.  So most of the time the firing order is the same as creation
order, and so rearranging the creation order fixes it.

This patch will fail to fix the problem if the OID counter wraps around or
adds a decimal digit (eg, from 99999 to 100000) while we are creating the
triggers for an FK constraint.  Given the small odds of that, and the low
usage of self-referential FKs, we'll live with that solution in the back
branches.  A better fix is to change the auto-generated names for FK
triggers, but it seems unwise to do that in stable branches because there
may be client code that depends on the naming convention.  We'll fix it
that way in HEAD in a separate patch.

Back-patch to all supported branches, since this bug has existed for a long
time.
2011-10-26 13:02:28 -04:00
Magnus Hagander a87b9ae161 Make event_source visible on all platforms
On non-windows platform, we just ignore any value set there.

Noted by Jaime Casanova
2011-10-25 22:40:58 +02:00
Magnus Hagander d8ea33f2c0 Support configurable eventlog application names on Windows
This allows different instances to use the eventlog with different
identifiers, by setting the event_source GUC, similar to how
syslog_ident works.

Original patch by MauMau, heavily modified by Magnus Hagander
2011-10-25 20:02:55 +02:00
Tom Lane 0f39d5050d Don't trust deferred-unique indexes for join removal.
The uniqueness condition might fail to hold intra-transaction, and assuming
it does can give incorrect query results.  Per report from Marti Raudsepp,
though this is not his proposed patch.

Back-patch to 9.0, where both these features were introduced.  In the
released branches, add the new IndexOptInfo field to the end of the struct,
to try to minimize ABI breakage for third-party code that may be examining
that struct.
2011-10-23 00:43:39 -04:00
Tom Lane bb446b689b Support synchronization of snapshots through an export/import procedure.
A transaction can export a snapshot with pg_export_snapshot(), and then
others can import it with SET TRANSACTION SNAPSHOT.  The data does not
leave the server so there are not security issues.  A snapshot can only
be imported while the exporting transaction is still running, and there
are some other restrictions.

I'm not totally convinced that we've covered all the bases for SSI (true
serializable) mode, but it works fine for lesser isolation modes.

Joachim Wieland, reviewed by Marko Tiikkaja, and rather heavily modified
by Tom Lane
2011-10-22 18:23:30 -04:00
Heikki Linnakangas b436c72f61 Fix overly-complicated usage of errcode_for_file_access().
No need to do  "errcode(errcode_for_file_access())", just
"errcode_for_file_access()" is enough. The extra errcode() call is useless
but harmless, so there's no user-visible bug here. Nevertheless, backpatch
to 9.1 where this code were added.
2011-10-22 20:19:50 +03:00
Tom Lane f9c92a5a3e Code review for pgstat_get_crashed_backend_activity patch.
Avoid possibly dumping core when pgstat_track_activity_query_size has a
less-than-default value; avoid uselessly searching for the query string
of a successfully-exited backend; don't bother putting out an ERRDETAIL if
we don't have a query to show; some other minor stylistic improvements.
2011-10-21 16:36:04 -04:00
Tom Lane 5ac5980744 More cleanup after failed reduced-lock-levels-for-DDL feature.
Turns out that use of ShareUpdateExclusiveLock or ShareRowExclusiveLock
to protect DDL changes had gotten copied into several places that were
not touched by either of Simon's original patches for the feature, and
thus neither he nor I thought to revert them.  (Indeed, it appears that
two of these uses were committed *after* the reversion, which just goes
to show that git merging is no panacea.)  Change these places to use
AccessExclusiveLock again.  If we ever manage to resurrect that feature,
we're going to have to think a bit harder about how to keep lock level
usage in sync for DDL operations that aren't within the AlterTable
infrastructure.

Two of these bugs are only in HEAD, but one is in the 9.1 branch too.
Alvaro found one of them, I found the other two.
2011-10-21 13:50:30 -04:00
Robert Haas c8e8b5a6e2 Try to log current the query string when a backend crashes.
To avoid minimize risk inside the postmaster, we subject this feature
to a number of significant limitations.  We very much wish to avoid
doing any complex processing inside the postmaster, due to the
posssibility that the crashed backend has completely corrupted shared
memory.  To that end, no encoding conversion is done; instead, we just
replace anything that doesn't look like an ASCII character with a
question mark.  We limit the amount of data copied to 1024 characters,
and carefully sanity check the source of that data.  While these
restrictions would doubtless be unacceptable in a general-purpose
logging facility, even this limited facility seems like an improvement
over the status quo ante.

Marti Raudsepp, reviewed by PDXPUG and myself
2011-10-21 13:26:40 -04:00
Robert Haas 980261929f Fix DROP OPERATOR FAMILY IF EXISTS.
Essentially, the "IF EXISTS" portion was being ignored, and an error
thrown anyway if the opfamily did not exist.

I broke this in commit fd1843ff8979c0461fb3f1a9eab61140c977e32d; so
backpatch to 9.1.X.

Report and diagnosis by KaiGai Kohei.
2011-10-21 09:12:23 -04:00
Tom Lane b4a0223d00 Simplify and improve ProcessStandbyHSFeedbackMessage logic.
There's no need to clamp the standby's xmin to be greater than
GetOldestXmin's result; if there were any such need this logic would be
hopelessly inadequate anyway, because it fails to account for
within-database versus cluster-wide values of GetOldestXmin.  So get rid of
that, and just rely on sanity-checking that the xmin is not wrapped around
relative to the nextXid counter.  Also, don't reset the walsender's xmin if
the current feedback xmin is indeed out of range; that just creates more
problems than we already had.  Lastly, don't bother to take the
ProcArrayLock; there's no need to do that to set xmin.

Also improve the comments about this in GetOldestXmin itself.
2011-10-20 19:43:31 -04:00
Robert Haas 8f3362d4b7 Fix get_object_namespace() not to think extensions are "in" a schema.
extnamespace means something altogether different in this context.
Mostly by accident, this coding error (introduced in my commit
82a4a777d9) broke the buildfarm instead
of just silently doing the wrong thing.
2011-10-20 00:07:41 -04:00
Robert Haas 1d751018d8 Add "skipping" to the NOTICE produced by DROP OPERATOR CLASS IF EXISTS.
This makes this message consistent with all the other similar notices
produced by other DROP IF EXISTS commands.

Noted by KaiGai Kohei
2011-10-19 23:45:31 -04:00
Robert Haas 82a4a777d9 Consolidate DROP handling for some object types.
This gets rid of a significant amount of duplicative code.

KaiGai Kohei, reviewed in earlier versions by Dimitri Fontaine, with
further review and cleanup by me.
2011-10-19 23:27:19 -04:00
Tom Lane aa90e148ca Suppress -Wunused-result warnings about write() and fwrite().
This is merely an exercise in satisfying pedants, not a bug fix, because
in every case we were checking for failure later with ferror(), or else
there was nothing useful to be done about a failure anyway.  Document
the latter cases.
2011-10-18 21:37:51 -04:00
Tom Lane e27f52f3a1 Reject empty pg_hba.conf files.
An empty HBA file is surely an error, since it means there is no way to
connect to the server.  We've not heard identifiable reports of people
actually doing that, but this will also close off the case Thom Brown just
complained of, namely pointing hba_file at a directory.  (On at least some
platforms with some directories, it will read as an empty file.)

Perhaps this should be back-patched, but given the lack of previous
complaints, I won't add extra work for the translators.
2011-10-18 20:09:18 -04:00
Magnus Hagander d1e25b78f9 Exclude postmaster.opts from base backups
Noted by Fujii Masao
2011-10-18 15:58:37 +02:00
Tom Lane 336c1d7a51 Avoid assuming that index-only scan data matches the index's rowtype.
In general the data returned by an index-only scan should have the
datatypes originally computed by FormIndexDatum.  If the index opclasses
use "storage" datatypes different from their input datatypes, the scan
tuple will not have the same rowtype attributed to the index; but we had
a hard-wired assumption that that was true in nodeIndexonlyscan.c.  We'd
already hacked around the issue for the one case where the types are
different in btree indexes (btree name_ops), but this would definitely
come back to bite us if we ever implement index-only scans in GiST.

To fix, require the index AM to explicitly provide the tupdesc for the
tuple it is returning.  btree can just pass back the index's tupdesc, but
GiST will have to work harder when and if it supports index-only scans.

I had previously proposed fixing this by allowing the index AM to fill the
scan tuple slot directly; but on reflection that seemed like a module
layering violation, since TupleTableSlots are creatures of the executor.
At least in the btree case, it would also be less efficient, since the
tuple deconstruction work would occur even for rows later found to be
invisible to the scan's snapshot.
2011-10-16 19:15:04 -04:00
Tom Lane 9e8da0f757 Teach btree to handle ScalarArrayOpExpr quals natively.
This allows "indexedcol op ANY(ARRAY[...])" conditions to be used in plain
indexscans, and particularly in index-only scans.
2011-10-16 15:39:24 -04:00
Tom Lane d26e1ebaf5 Fix bugs in information_schema.referential_constraints view.
This view was being insufficiently careful about matching the FK constraint
to the depended-on primary or unique key constraint.  That could result in
failure to show an FK constraint at all, or showing it multiple times, or
claiming that it depended on a different constraint than the one it really
does.  Fix by joining via pg_depend to ensure that we find only the correct
dependency.

Back-patch, but don't bump catversion because we can't force initdb in back
branches.  The next minor-version release notes should explain that if you
need to fix this in an existing installation, you can drop the
information_schema schema then re-create it by sourcing
$SHAREDIR/information_schema.sql in each database (as a superuser of
course).
2011-10-14 20:24:17 -04:00
Tom Lane e6858e6657 Measure the number of all-visible pages for use in index-only scan costing.
Add a column pg_class.relallvisible to remember the number of pages that
were all-visible according to the visibility map as of the last VACUUM
(or ANALYZE, or some other operations that update pg_class.relpages).
Use relallvisible/relpages, instead of an arbitrary constant, to estimate
how many heap page fetches can be avoided during an index-only scan.

This is pretty primitive and will no doubt see refinements once we've
acquired more field experience with the index-only scan mechanism, but
it's way better than using a constant.

Note: I had to adjust an underspecified query in the window.sql regression
test, because it was changing answers when the plan changed to use an
index-only scan.  Some of the adjacent tests perhaps should be adjusted
as well, but I didn't do that here.
2011-10-14 17:23:46 -04:00
Robert Haas 393e828e31 Avoid potential relcache leak in objectaddress.c.
Nobody using the missing_ok flag yet, but let's speculate that this will
be a better interface for future callers.

KaiGai Kohei, with some adjustments by me.
2011-10-14 11:35:40 -04:00
Bruce Momjian 0180bd6180 Remove all "traces" of trace_userlocks, because userlocks were removed
in PG 8.2.
2011-10-13 19:59:57 -04:00
Tom Lane 7b96519fe2 Don't mark auto-generated types as extension members.
Relation rowtypes and automatically-generated array types do not need to
have their own extension membership dependency entries.  If we create such
then it becomes more difficult to remove items from an extension, and it's
also harder for an extension upgrade script to make sure it duplicates the
dependencies created by the extension's regular installation script.

I changed the code in such a way that this happened in commit
988cccc620, I think because of worries about
the shell-type-replacement case; but that cure was worse than the disease.
It would only matter if one extension created a shell type that was
replaced with an auto-generated type in another extension, which seems
pretty far-fetched.  Better to make this work unsurprisingly in normal
cases.

Report and patch by Robert Haas, comment adjustments by me.
2011-10-12 18:41:49 -04:00
Bruce Momjian 484af9b376 Modify RelationGetBufferForTuple() to use a typedef, rather than a
struct, to help pgindent.
2011-10-12 16:53:54 -04:00
Tom Lane 458857cc9d Throw a useful error message if an extension script file is fed to psql.
We have seen one too many reports of people trying to use 9.1 extension
files in the old-fashioned way of sourcing them in psql.  Not only does
that usually not work (due to failure to substitute for MODULE_PATHNAME
and/or @extschema@), but if it did work they'd get a collection of loose
objects not an extension.  To prevent this, insert an \echo ... \quit
line that prints a suitable error message into each extension script file,
and teach commands/extension.c to ignore lines starting with \echo.
That should not only prevent any adverse consequences of loading a script
file the wrong way, but make it crystal clear to users that they need to
do it differently now.

Tom Lane, following an idea of Andrew Dunstan's.  Back-patch into 9.1
... there is not going to be much value in this if we wait till 9.2.
2011-10-12 15:45:03 -04:00
Tom Lane 8c8ba6d11b Add comment on why pulling data from a "name" index column can't crash.
It's been bothering me for several days that pretending that the cstring
data stored in a btree name_ops column is really a "name" Datum could lead
to reading past the end of memory.  However, given the current memory
layout used for index-only scans in the btree code, a crash is in fact not
possible.  Document that so we don't break it.  I have not thought of any
other solutions that aren't fairly ugly too, and most of them lose the
functionality of index-only scans on name columns altogether, so this seems
like the way to go.
2011-10-11 18:40:53 -04:00
Tom Lane cb6771fb32 Generate index-only scan tuple descriptor from the plan node's indextlist.
Dept. of second thoughts: as long as we've got that tlist hanging around
anyway, we can apply ExecTypeFromTL to it to get a suitable descriptor for
the ScanTupleSlot.  This is a nicer solution than the previous one because
it eliminates some hard-wired knowledge about btree name_ops, and because
it avoids the somewhat shaky assumption that we needn't set up the scan
tuple descriptor in EXPLAIN_ONLY mode.  It doesn't change what actually
happens at run-time though, and I'm still a bit nervous about that.
2011-10-11 18:12:57 -04:00
Tom Lane 600d3206d1 Consider index-only scans even when there is no matching qual or ORDER BY.
By popular demand.
2011-10-11 15:00:30 -04:00
Tom Lane a0185461dd Rearrange the implementation of index-only scans.
This commit changes index-only scans so that data is read directly from the
index tuple without first generating a faux heap tuple.  The only immediate
benefit is that indexes on system columns (such as OID) can be used in
index-only scans, but this is necessary infrastructure if we are ever to
support index-only scans on expression indexes.  The executor is now ready
for that, though the planner still needs substantial work to recognize
the possibility.

To do this, Vars in index-only plan nodes have to refer to index columns
not heap columns.  I introduced a new special varno, INDEX_VAR, to mark
such Vars to avoid confusion.  (In passing, this commit renames the two
existing special varnos to OUTER_VAR and INNER_VAR.)  This allows
ruleutils.c to handle them with logic similar to what we use for subplan
reference Vars.

Since index-only scans are now fundamentally different from regular
indexscans so far as their expression subtrees are concerned, I also chose
to change them to have their own plan node type (and hence, their own
executor source file).
2011-10-11 14:21:30 -04:00
Robert Haas fa351d5a0d Replace hardcoded switch in object_exists() with a lookup table.
There's no particular advantage to this change on its face; indeed,
it's possible that this might be slightly slower than the old way.
But it makes this information more easily accessible to other
functions, and therefore paves the way for future code consolidation.
Performance isn't critical here, so there's no need to be smart about
how we do the search.

This is a heavily cut-down version of a patch from KaiGai Kohei,
with several fixes by me.  Additional review from Dimitri Fontaine.
2011-10-11 09:14:30 -04:00
Robert Haas e76bcaba9c Repair breakage in VirtualXactLock.
I broke this in commit 84e3712677.  Report and
fix by Fujii Masao.
2011-10-11 07:39:09 -04:00
Bruce Momjian e26d5fcd94 Mark GUC external_pid_file's default as '' in postgresql.conf, rather
than '(none)'.
2011-10-10 08:17:10 -04:00
Robert Haas c0f03aae04 Fix ALTER TABLE ONLY .. DROP CONSTRAINT.
When I consolidated two copies of the HOT-chain search logic in commit
4da99ea423, I introduced a behavior
change: the old code wouldn't necessarily traverse the entire chain,
if the most recently returned tuple were updated while the HOT chain
traversal is in progress.  The new behavior seems more correct, but
unfortunately, the code here relies on a scan with SnapshotNow failing
to see its own updates.  That seems pretty shaky even with the old HOT
chain traversal behavior, since there's no guarantee that these
updates will always be HOT, but it's trivial to broke a failure with
the new HOT search logic.  Fix by updating just the first matching
pg_constraint tuple, rather than all of them, since there should be
only one anyway.  But since nobody has reproduced this failure on older
versions, no back-patch for now.

Report and test case by Alex Hunsaker; tablecmds.c changes by me.
2011-10-09 23:39:52 -04:00
Heikki Linnakangas d50e125194 Clean up a couple of box gist helper functions.
The original idea of this patch was to make box picksplit run faster, by
eliminating unnecessary palloc() overhead, but that was obsoleted by the new
double-sorting split algorithm that doesn't call these functions so heavily
anymore. Nevertheless, the code looks better this way.

Original patch by me, reviewed and tidied up after the double-sorting patch
by Kevin Grittner.
2011-10-09 18:59:34 +03:00
Tom Lane cbfa92c23c Improve index-only scans to avoid repeated access to the index page.
We copy all the matched tuples off the page during _bt_readpage, instead of
expensively re-locking the page during each subsequent tuple fetch.  This
costs a bit more local storage, but not more than 2*BLCKSZ worth, and the
reduction in LWLock traffic is certainly worth that.  What's more, this
lets us get rid of the API wart in the original patch that said an index AM
could randomly decline to supply an index tuple despite having asserted
pg_am.amcanreturn.  That will be important for future improvements in the
index-only-scan feature, since the executor will now be able to rely on
having the index data available.
2011-10-09 00:21:08 -04:00
Tom Lane b324384f6b Fix brain fade in cost estimation for index-only scans.
visibility_fraction should not be applied to regular indexscans.
Noted by Cédric Villemain.
2011-10-08 10:41:17 -04:00
Heikki Linnakangas 1ef60dab70 Don't let transform_null_equals=on affect CASE foo WHEN NULL ... constructs.
transform_null_equals is only supposed to affect "foo = NULL" expressions
given directly by the user, not the internal "foo = NULL" expression
generated from CASE-WHEN.

This fixes bug #6242, reported by Sergey. Backpatch to all supported
branches.
2011-10-08 11:17:40 +03:00
Tom Lane a2822fb933 Support index-only scans using the visibility map to avoid heap fetches.
When a btree index contains all columns required by the query, and the
visibility map shows that all tuples on a target heap page are
visible-to-all, we don't need to fetch that heap page.  This patch depends
on the previous patches that made the visibility map reliable.

There's a fair amount left to do here, notably trying to figure out a less
chintzy way of estimating the cost of an index-only scan, but the core
functionality seems ready to commit.

Robert Haas and Ibrar Ahmed, with some previous work by Heikki Linnakangas.
2011-10-07 20:14:13 -04:00
Magnus Hagander 7aeff9f4a4 Ensure walsenders can be SIGTERMed while in non-walsender code
In oder to exit on SIGTERM when in non-walsender code,
such as do_pg_stop_backup(), we need to set the interrupt
variables that are used there, and not just the walsender
local ones.
2011-10-06 21:43:14 +02:00
Bruce Momjian aaa6e1def2 Add postmaster -C option to query configuration parameters, and have
pg_ctl use that to query the data directory for config-only installs.
This fixes awkward or impossible pg_ctl operation for config-only
installs.
2011-10-06 09:38:39 -04:00
Heikki Linnakangas 7f3bd86843 Replace the "New Linear" GiST split algorithm for boxes and points with a
new double-sorting algorithm. The new algorithm produces better quality
trees, making searches faster.

Alexander Korotkov
2011-10-06 10:03:46 +03:00
Tom Lane ba6f629326 Improve and simplify CREATE EXTENSION's management of GUC variables.
CREATE EXTENSION needs to transiently set search_path, as well as
client_min_messages and log_min_messages.  We were doing this by the
expedient of saving the current string value of each variable, doing a
SET LOCAL, and then doing another SET LOCAL with the previous value at
the end of the command.  This is a bit expensive though, and it also fails
badly if there is anything funny about the existing search_path value,
as seen in a recent report from Roger Niederland.  Fortunately, there's a
much better way, which is to piggyback on the GUC infrastructure previously
developed for functions with SET options.  We just open a new GUC nesting
level, do our assignments with GUC_ACTION_SAVE, and then close the nesting
level when done.  This automatically restores the prior settings without a
re-parsing pass, so (in principle anyway) there can't be an error.  And
guc.c still takes care of cleanup in event of an error abort.

The CREATE EXTENSION code for this was modeled on some much older code in
ri_triggers.c, which I also changed to use the better method, even though
there wasn't really much risk of failure there.  Also improve the comments
in guc.c to reflect this additional usage.
2011-10-05 20:44:16 -04:00
Tom Lane 41e461d36f Improve define_custom_variable's handling of pre-existing settings.
Arrange for any problems with pre-existing settings to be reported as
WARNING not ERROR, so that we don't undesirably abort the loading of the
incoming add-on module.  The bad setting is just discarded, as though it
had never been applied at all.  (This requires a change in the API of
set_config_option.  After some thought I decided the most potentially
useful addition was to allow callers to just pass in a desired elevel.)

Arrange to restore the complete stacked state of the variable, rather than
cheesily reinstalling only the active value.  This ensures that custom GUCs
will behave unsurprisingly even when the module loading operation occurs
within nested subtransactions that have changed the active value.  Since a
module load could occur as a result of, eg, a PL function call, this is not
an unlikely scenario.
2011-10-04 19:57:21 -04:00
Tom Lane fa56a0c3e0 Fix uninitialized-variable bug. 2011-10-04 17:08:18 -04:00
Tom Lane 4bcb82a7d5 Add sourcefile/sourceline data to EXEC_BACKEND GUC transmission files.
This oversight meant that on Windows, the pg_settings view would not
display source file or line number information for values coming from
postgresql.conf, unless the backend had received a SIGHUP since starting.

In passing, also make the error detection in read_nondefault_variables a
tad more thorough, and fix it to not lose precision on float GUCs (these
changes are already in HEAD as of my previous commit).
2011-10-04 16:47:48 -04:00
Tom Lane 9f5836d224 Remember the source GucContext for each GUC parameter.
We used to just remember the GucSource, but saving GucContext too provides
a little more information --- notably, whether a SET was done by a
superuser or regular user.  This allows us to rip out the fairly dodgy code
that define_custom_variable used to use to try to infer the context to
re-install a pre-existing setting with.  In particular, it now works for
a superuser to SET a extension's SUSET custom variable before loading the
associated extension, because GUC can remember whether the SET was done as
a superuser or not.  The plperl regression tests contain an example where
this is useful.
2011-10-04 16:13:50 -04:00
Alvaro Herrera 09e196e453 Use callbacks in SlruScanDirectory for the actual action
Previously, the code assumed that the only possible action to take was
to delete files behind a certain cutoff point.  The async notify code
was already a crock: it used a different "pagePrecedes" function for
truncation than for regular operation.  By allowing it to pass a
callback to SlruScanDirectory it can do cleanly exactly what it needs to
do.

The clog.c code also had its own use for SlruScanDirectory, which is
made a bit simpler with this.
2011-10-04 14:03:23 -03:00
Tom Lane 1a00c0ef53 Remove the custom_variable_classes parameter.
This variable provides only marginal error-prevention capability (since
it can only check the prefix of a qualified GUC name), and the consensus
is that that isn't worth the amount of hassle that maintaining the setting
creates for DBAs.  So, let's just remove it.

With this commit, the system will silently accept a value for any qualified
GUC name at all, whether it has anything to do with any known extension or
not.  (Unqualified names still have to match known built-in settings,
though; and you will get a WARNING at extension load time if there's an
unrecognized setting with that extension's prefix.)

There's still some discussion ongoing about whether to tighten that up and
if so how; but if we do come up with a solution, it's not likely to look
anything like custom_variable_classes.
2011-10-04 12:36:55 -04:00
Tom Lane 76074fcaa0 ProcedureCreate neglected to record dependencies on default expressions.
Thus, an object referenced in a default expression could be dropped while
the function remained present.  This was unaccountably missed in the
original patch to add default parameters for functions.  Reported by
Pavel Stehule.
2011-10-03 12:13:15 -04:00
Tom Lane d56b3afc03 Restructure error handling in reading of postgresql.conf.
This patch has two distinct purposes: to report multiple problems in
postgresql.conf rather than always bailing out after the first one,
and to change the policy for whether changes are applied when there are
unrelated errors in postgresql.conf.

Formerly the policy was to apply no changes if any errors could be
detected, but that had a significant consistency problem, because in some
cases specific values might be seen as valid by some processes but invalid
by others.  This meant that the latter processes would fail to adopt
changes in other parameters even though the former processes had done so.

The new policy is that during SIGHUP, the file is rejected as a whole
if there are any errors in the "name = value" syntax, or if any lines
attempt to set nonexistent built-in parameters, or if any lines attempt
to set custom parameters whose prefix is not listed in (the new value of)
custom_variable_classes.  These tests should always give the same results
in all processes, and provide what seems a reasonably robust defense
against loading values from badly corrupted config files.  If these tests
pass, all processes will apply all settings that they individually see as
good, ignoring (but logging) any they don't.

In addition, the postmaster does not abandon reading a configuration file
after the first syntax error, but continues to read the file and report
syntax errors (up to a maximum of 100 syntax errors per file).

The postmaster will still refuse to start up if the configuration file
contains any errors at startup time, but these changes allow multiple
errors to be detected and reported before quitting.

Alexey Klyukin, reviewed by Andy Colson and av (Alexander ?)
with some additional hacking by Tom Lane
2011-10-02 16:50:04 -04:00
Tom Lane 5ec6b7f1b8 Improve generated column names for cases involving sub-SELECTs.
We'll now use "exists" for EXISTS(SELECT ...), "array" for ARRAY(SELECT
...), or the sub-select's own result column name for a simple expression
sub-select.  Previously, you usually got "?column?" in such cases.

Marti Raudsepp, reviewed by Kyotaro Horiugchi
2011-10-01 14:01:46 -04:00
Tom Lane d22a09dc70 Support GiST index support functions that want to cache data across calls.
pg_trgm was already doing this unofficially, but the implementation hadn't
been thought through very well and leaked memory.  Restructure the core
GiST code so that it actually works, and document it.  Ordinarily this
would have required an extra memory context creation/destruction for each
GiST index search, but I was able to avoid that in the normal case of a
non-rescanned search by finessing the handling of the RBTree.  It used to
have its own context always, but now shares a context with the
scan-lifespan data structures, unless there is more than one rescan call.
This should make the added overhead unnoticeable in typical cases.
2011-09-30 19:48:57 -04:00
Tom Lane 79edb2b1dc Fix recursion into previously planned sub-query in examine_simple_variable.
This code was looking at the sub-Query tree as seen in the parent query's
RangeTblEntry; but that's the pristine parser output, and what we need to
look at is the tree as it stands at the completion of planning.  Otherwise
we might pick up a Var that references a subquery that got flattened and
hence has no RelOptInfo in the subroot.  Per report from Peter Geoghegan.
2011-09-29 18:13:16 -04:00
Bruce Momjian 054219c907 Fix pg_upgrade for EXEC_BACKEND builds (e.g. Windows) by properly
passing the -b/binary-upgrade flag.

Backpatch to 9.1.X.
2011-09-29 17:21:34 -04:00
Tom Lane cb37c29106 Fix index matching for operators with mixed collatable/noncollatable inputs.
If an indexable operator for a non-collatable indexed datatype has a
collatable right-hand input type, any OpExpr for it will be marked with a
nonzero inputcollid (since having one collatable input is sufficient to
make that happen).  However, an index on a non-collatable column certainly
doesn't have any collation.  This caused us to fail to match such operators
to their indexes, because indxpath.c required an exact match of index
collation and clause collation.  It seems correct to allow a match when the
index is collation-less regardless of the clause's inputcollid: an operator
with both noncollatable and collatable inputs could perhaps depend on the
collation of the collatable input, but it could hardly expect the index for
the noncollatable input to have that same collation.

Per bug #6232 from Pierre Ducroquet.  His example is specifically about
"hstore ? text" but the problem seems quite generic.
2011-09-29 00:43:42 -04:00
Robert Haas f70648d5a1 Update comments related to the crash-safety of the visibility map.
In hio.c, document how we avoid deadlock with respect to visibility map
buffer locks.  In visibilitymap.c, update the LOCKING section of the
file header comment.

Both oversights noted by Heikki Linnakangas.
2011-09-27 09:30:23 -04:00
Robert Haas 624f155ffa heap_update() must recheck tuple after unlocking and relocking buffer.
Bug found by Alvaro Herrera, fix suggested by Heikki Linnakangas
and reviewed by Tom Lane.
2011-09-27 08:24:18 -04:00
Tom Lane 269c5dd2f4 Fix window functions that sort by expressions involving aggregates.
In commit c1d9579dd8, I changed things so
that the output of the Agg node that feeds the window functions would not
list any ungrouped Vars directly.  Formerly, for example, the Agg tlist
might have included both "x" and "sum(x)", which is not really valid if
"x" isn't a grouping column.  If we then had a window function ordering on
something like "sum(x) + 1", prepare_sort_from_pathkeys would find no exact
match for this in the Agg tlist, and would conclude that it must recompute
the expression.  But it would break the expression down to just the Var
"x", which it would find in the tlist, and then rebuild the ORDER BY
expression using a reference to the subplan's "x" output.  Now, after the
above-referenced changes, "x" isn't in the Agg tlist if it's not a grouping
column, so that prepare_sort_from_pathkeys fails with "could not find
pathkey item to sort", as reported by Bricklen Anderson.

The fix is to not break down Aggrefs into their component parts, but just
treat them as irreducible expressions to be sought in the subplan tlist.
This is definitely OK for the use with respect to window functions in
grouping_planner, since it just built the tlist being used on the same
basis.  AFAICT it is safe for other uses too; most of the other call sites
couldn't encounter Aggrefs anyway.
2011-09-26 23:48:39 -04:00
Tom Lane 57eb009092 Allow snapshot references to still work during transaction abort.
In REPEATABLE READ (nee SERIALIZABLE) mode, an attempt to do
GetTransactionSnapshot() between AbortTransaction and CleanupTransaction
failed, because GetTransactionSnapshot would recompute the transaction
snapshot (which is already wrong, given the isolation mode) and then
re-register it in the TopTransactionResourceOwner, leading to an Assert
because the TopTransactionResourceOwner should be empty of resources after
AbortTransaction.  This is the root cause of bug #6218 from Yamamoto
Takashi.  While changing plancache.c to avoid requesting a snapshot when
handling a ROLLBACK masks the problem, I think this is really a snapmgr.c
bug: it's lower-level than the resource manager mechanism and should not be
shutting itself down before we unwind resource manager resources.  However,
just postponing the release of the transaction snapshot until cleanup time
didn't work because of the circular dependency with
TopTransactionResourceOwner.  Fix by managing the internal reference to
that snapshot manually instead of depending on TopTransactionResourceOwner.
This saves a few cycles as well as making the module layering more
straightforward.  predicate.c's dependencies on TopTransactionResourceOwner
go away too.

I think this is a longstanding bug, but there's no evidence that it's more
than a latent bug, so it doesn't seem worth any risk of back-patching.
2011-09-26 22:25:28 -04:00
Robert Haas 821fd903f9 Update obsolete comments.
This was partially fixed by 57fdb2b0d8,
back in 2005, but it missed a couple of spots.

YAMAMOTO Takashi
2011-09-26 13:12:22 -04:00
Tom Lane 21fb95da46 Use a fresh copy of query_list when making a second plan in GetCachedPlan.
The code path that tried a generic plan, didn't like it, and then made a
custom plan was mistakenly passing the same copy of the query_list to the
planner both times.  This doesn't work too well for nontrivial queries,
since the planner tends to scribble on its input.  Diagnosis and fix by
Yamamoto Takashi.
2011-09-26 12:44:17 -04:00
Tom Lane d5aa7a9fe6 Avoid unnecessary snapshot-acquisitions in BuildCachedPlan.
I had copied-and-pasted a claim that we couldn't reach this point when
dealing with utility statements, but that was a leftover from when the
caller was required to supply a plan to start with.  We now will go
through here at least once when handling a utility statement, so it
seems worth a check to see whether a snapshot is actually needed.
(Note that analyze_requires_snapshot is quite a cheap test.)

Per suggestion from Yamamoto Takashi.  I don't think I believe that this
resolves his reported assertion failure; but it's worth changing anyway,
just to save a cycle or two.
2011-09-25 17:34:20 -04:00
Tom Lane 7741dd6590 Recognize self-contradictory restriction clauses for non-table relations.
The constraint exclusion feature checks for contradictions among scan
restriction clauses, as well as contradictions between those clauses and a
table's CHECK constraints.  The first aspect of this testing can be useful
for non-table relations (such as subqueries or functions-in-FROM), but the
feature was coded with only the CHECK case in mind so we were applying it
only to plain-table RTEs.  Move the relation_excluded_by_constraints call
so that it is applied to all RTEs not just plain tables.  With the default
setting of constraint_exclusion this results in no extra work, but with
constraint_exclusion = ON we will detect optimizations that we missed
before (at the cost of more planner cycles than we expended before).

Per a gripe from Gunnlaugur Þór Briem.  Experimentation with
his example also showed we were not being very bright about the case where
constraint exclusion is proven within a subquery within UNION ALL, so tweak
the code to allow set_append_rel_pathlist to recognize such cases.
2011-09-24 19:33:16 -04:00
Robert Haas 0c8eda6258 Memory barrier support for PostgreSQL.
This is not actually used anywhere yet, but it gets the basic
infrastructure in place.  It is fairly likely that there are bugs, and
support for some important platforms may be missing, so we'll need to
refine this as we go along.
2011-09-23 17:52:43 -04:00
Tom Lane f197272365 Make EXPLAIN ANALYZE report the numbers of rows rejected by filter steps.
This provides information about the numbers of tuples that were visited
but not returned by table scans, as well as the numbers of join tuples
that were considered and discarded within a join plan node.

There is still some discussion going on about the best way to report counts
for outer-join situations, but I think most of what's in the patch would
not change if we revise that, so I'm going to go ahead and commit it as-is.

Documentation changes to follow (they weren't in the submitted patch
either).

Marko Tiikkaja, reviewed by Marc Cousin, somewhat revised by Tom
2011-09-22 11:30:11 -04:00
Robert Haas 4893552e21 Fix another bit of unlogged-table-induced breakage.
Per bug #6205, reported by Abel Abraham Camarillo Ojeda.  This isn't a
particularly elegant fix, but I'm trying to minimize the chances of
causing yet another round of breakage.

Adjust regression tests to exercise this case.
2011-09-21 10:48:31 -04:00
Tom Lane 2562dcea81 Suppress "unused function" warning when not HAVE_LOCALE_T.
Forgot to consider this case ...
2011-09-20 17:47:21 -04:00
Tom Lane 37d4fd2b9d Improve reporting of newlocale() failures in CREATE COLLATION.
The standardized errno code for "no such locale" failures is ENOENT, which
we were just reporting at face value, viz "No such file or directory".
Per gripe from Thom Brown, this might confuse users, so add an errdetail
message to clarify what it means.  Also, report newlocale() failures as
ERRCODE_INVALID_PARAMETER_VALUE rather than using
errcode_for_file_access(), since newlocale()'s errno values aren't
necessarily tied directly to file access failures.
2011-09-20 13:23:40 -04:00
Tom Lane c4ae968633 Fix Assert failure in new plancache code.
The regression tests were failing with CLOBBER_CACHE_ALWAYS enabled,
as reported by buildfarm member jaguar.  There was an Assert in
BuildCachedPlan that asserted that the CachedPlanSource hadn't been
invalidated since we called RevalidateCachedQuery, which in theory can't
happen because we are holding locks on all the relevant database objects.
However, CLOBBER_CACHE_ALWAYS generates a false positive by making an
invalidation happen anyway; and on reflection, that could also occur as a
result of a badly-timed sinval reset due to queue overflow.  We could just
remove the Assert and forge ahead with the not-really-stale querytree, but
it seems safer to do another RevalidateCachedQuery call just to make real
sure everything's OK.
2011-09-17 01:47:33 -04:00
Tom Lane 99b5454167 Remove debug logging for pgstat wait timeout.
This reverts commit 79b2ee20c8, which proved
to not be very informative; it looks like the "pgstat wait timeout"
warnings in the buildfarm are just a symptom of running on heavily loaded
machines, and there isn't any weird mechanism causing them to appear.

To try to reduce the frequency of buildfarm failures from this effect,
increase PGSTAT_MAX_WAIT_TIME from 5 seconds to 10.

Also, arrange to not send a fresh inquiry message every single time through
the loop, as that seems more likely to cause problems (by swamping the
collector) than fix them.  We'll now send an inquiry the first time through
the delay loop, and every 640 msec thereafter.
2011-09-16 18:25:27 -04:00
Tom Lane 9d306c66e6 Avoid unnecessary page-level SSI lock check in heap_insert().
As observed by Heikki, we need not conflict on heap page locks during an
insert; heap page locks are only aggregated tuple locks, they don't imply
locking "gaps" as index page locks do.  So we can avoid some unnecessary
conflicts, and also do the SSI check while not holding exclusive lock on
the target buffer.

Kevin Grittner, reviewed by Jeff Davis.  Back-patch to 9.1.
2011-09-16 14:47:20 -04:00
Tom Lane 0a6cc28500 gistendscan() forgot to free so->giststate.
This oversight led to a massive memory leak --- upwards of 10KB per tuple
--- during creation-time verification of an exclusion constraint based on a
GIST index.  In most other scenarios it'd just be a leak of 10KB that would
be recovered at end of query, so not too significant; though perhaps the
leak would be noticeable in a situation where a GIST index was being used
in a nestloop inner indexscan.  In any case, it's a real leak of long
standing, so patch all supported branches.  Per report from Harald Fuchs.
2011-09-16 04:27:49 -04:00
Tom Lane e6faf910d7 Redesign the plancache mechanism for more flexibility and efficiency.
Rewrite plancache.c so that a "cached plan" (which is rather a misnomer
at this point) can support generation of custom, parameter-value-dependent
plans, and can make an intelligent choice between using custom plans and
the traditional generic-plan approach.  The specific choice algorithm
implemented here can probably be improved in future, but this commit is
all about getting the mechanism in place, not the policy.

In addition, restructure the API to greatly reduce the amount of extraneous
data copying needed.  The main compromise needed to make that possible was
to split the initial creation of a CachedPlanSource into two steps.  It's
worth noting in particular that SPI_saveplan is now deprecated in favor of
SPI_keepplan, which accomplishes the same end result with zero data
copying, and no need to then spend even more cycles throwing away the
original SPIPlan.  The risk of long-term memory leaks while manipulating
SPIPlans has also been greatly reduced.  Most of this improvement is based
on use of the recently-added MemoryContextSetParent primitive.
2011-09-16 00:43:52 -04:00
Alvaro Herrera 86822df9b5 Split walsender.h in public/private headers
This dramatically cuts short the number of headers the public one brings
into whatever includes it.
2011-09-13 21:42:49 -03:00
Tom Lane 6693c9a5ed deflist_to_tuplestore dumped core on an option with no value.
Make it return NULL for the option_value, instead.

Per report from Frank van Vugt.  Back-patch to 8.4 where this code was
added.
2011-09-13 11:36:49 -04:00
Heikki Linnakangas 8caf6132c7 In the final emptying phase of the new GiST buffering build, set the
queuedForEmptying flag correctly on buffer when adding it to the queue.
Also, don't add buffer to the queue if it's there already. These were
harmless oversights; failing to set the flag just means that a buffer might
get added to the queue twice if more tuples are added to it (although that
can't actually happen at this point because all the upper buffers have
already been emptied), and having the same buffer twice in the emptying
queue is harmless. But better be tidy.
2011-09-12 13:06:06 +03:00
Tom Lane b0025bd957 Invent a new memory context primitive, MemoryContextSetParent.
This function will be useful for altering the lifespan of a context after
creation (for example, by creating it under a transient context and later
reparenting it to belong to a long-lived context).  It costs almost no new
code, since we can refactor what was there.  Per my proposal of yesterday.
2011-09-11 16:29:42 -04:00
Peter Eisentraut 1b81c2fe6e Remove many -Wcast-qual warnings
This addresses only those cases that are easy to fix by adding or
moving a const qualifier or removing an unnecessary cast.  There are
many more complicated cases remaining.
2011-09-11 21:54:32 +03:00
Tom Lane ca4af308c3 Simplify handling of the timezone GUC by making initdb choose the default.
We were doing some amazingly complicated things in order to avoid running
the very expensive identify_system_timezone() procedure during GUC
initialization.  But there is an obvious fix for that, which is to do it
once during initdb and have initdb install the system-specific default into
postgresql.conf, as it already does for most other GUC variables that need
system-environment-dependent defaults.  This means that the timezone (and
log_timezone) settings no longer have any magic behavior in the server.
Per discussion.
2011-09-09 17:59:11 -04:00
Tom Lane a7801b62f2 Move Timestamp/Interval typedefs and basic macros into datatype/timestamp.h.
As per my recent proposal, this refactors things so that these typedefs and
macros are available in a header that can be included in frontend-ish code.
I also changed various headers that were undesirably including
utils/timestamp.h to include datatype/timestamp.h instead.  Unsurprisingly,
this showed that half the system was getting utils/timestamp.h by way of
xlog.h.

No actual code changes here, just header refactoring.
2011-09-09 13:23:41 -04:00
Tom Lane d63de337f3 round() is not portable. Use rint(). 2011-09-08 16:38:24 -04:00
Alvaro Herrera 295e7dc929 Tweak string for uniformity 2011-09-08 16:39:58 -03:00
Heikki Linnakangas 5edb24a898 Buffering GiST index build algorithm.
When building a GiST index that doesn't fit in cache, buffers are attached
to some internal nodes in the index. This speeds up the build by avoiding
random I/O that would otherwise be needed to traverse all the way down the
tree to the find right leaf page for tuple.

Alexander Korotkov
2011-09-08 17:51:23 +03:00
Tom Lane f0bedf3e45 Fix corner case bug in numeric to_char().
Trailing-zero stripping applied by the FM specifier could strip zeroes
to the left of the decimal point, for a format with no digit positions
after the decimal point (such as "FM999.").

Reported and diagnosed by Marti Raudsepp, though I didn't use his patch.
2011-09-07 17:07:20 -04:00
Tom Lane 99155aaa33 Fix typo in error message.
Per Euler Taveira de Oliveira.
2011-09-07 13:29:26 -04:00
Tom Lane a7d9203cc4 Fix get_name_for_var_field() to deal with RECORD Params.
With 9.1's use of Params to pass down values from NestLoop join nodes
to their inner plans, it is possible for a Param to have type RECORD, in
which case the set of fields comprising the value isn't determinable by
inspection of the Param alone.  However, just as with a Var of type RECORD,
we can find out what we need to know if we can locate the expression that
the Param represents.  We already knew how to do this in get_parameter(),
but I'd overlooked the need to be able to cope in get_name_for_var_field(),
which led to EXPLAIN failing with "record type has not been registered".

To fix, refactor the search code in get_parameter() so it can be used by
both functions.

Per report from Marti Raudsepp.
2011-09-07 13:01:36 -04:00
Bruce Momjian 029dfdf115 Fix to_date() and to_timestamp() to handle year masks of length < 4 so
they wrap toward year 2020, rather than the inconsistent behavior we had
before.
2011-09-07 09:47:51 -04:00
Simon Riggs df383b03e6 Partially revoke attempt to improve performance with many savepoints.
Maintain difference between subtransaction release and commit introduced
by earlier patch.
2011-09-07 12:11:26 +01:00
Simon Riggs dde70cc313 Emit cascaded standby message on shutdown only when appropriate.
Adds additional test for active walsenders and closes a race
condition for when we failover when a new walsender was connecting.

Reported and fixed bu Fujii Masao. Review by Heikki Linnakangas
2011-09-07 09:09:47 +01:00
Tom Lane db10f01baa Improve comment about handling of temp tables in shared-inval code. 2011-09-06 17:06:54 -04:00
Peter Eisentraut e6d800981e Correct ancient logic mistake in assertion
Found by gcc -Wlogical-op
2011-09-06 23:05:02 +03:00
Tom Lane 623f77e9d1 Avoid possibly accessing off the end of memory in SJIS2004 conversion.
The code in shift_jis_20042euc_jis_2004() would fetch two bytes even when
only one remained in the string.  Since conversion functions aren't
supposed to assume null-terminated input, this poses a small risk of
fetching past the end of memory and incurring SIGSEGV.  No such crash has
been identified in the field, but we've certainly seen the equivalent
happen in other code paths, so patch this one all the way back.

Report and patch by Noah Misch.
2011-09-06 14:50:28 -04:00
Tom Lane 780a342c90 Avoid possibly accessing off the end of memory in examine_attribute().
Since the last couple of columns of pg_type are often NULL,
sizeof(FormData_pg_type) can be an overestimate of the actual size of the
tuple data part.  Therefore memcpy'ing that much out of the catalog cache,
as analyze.c was doing, poses a small risk of copying past the end of
memory and incurring SIGSEGV.  No such crash has been identified in the
field, but we've certainly seen the equivalent happen in other code paths,
so patch this one all the way back.

Per valgrind testing by Noah Misch, though this is not his proposed patch.
I chose to use SearchSysCacheCopy1 rather than inventing special-purpose
infrastructure for copying only the minimal part of a pg_type tuple.
2011-09-06 14:37:22 -04:00
Bruce Momjian f458c90bff Add C comment about why we send cache invalidation messages for
session-local objects.
2011-09-05 22:09:02 -04:00
Alvaro Herrera 56a9ed92b6 Adjust translator comment format to xgettext expectations 2011-09-05 19:04:30 -03:00
Alvaro Herrera b64f18c583 Mark some untranslatable messages with errmsg_internal 2011-09-05 17:48:07 -03:00
Peter Eisentraut a2a5ce6826 Improve "invalid byte sequence for encoding" message
It used to say

ERROR:  invalid byte sequence for encoding "UTF8": 0xdb24

Change this to

ERROR:  invalid byte sequence for encoding "UTF8": 0xdb 0x24

to make it clear that this is a byte sequence and not a code point.

Also fix the adjacent "character has no equivalent" message that has
the same issue.
2011-09-05 23:38:27 +03:00
Tom Lane 4c2777d0b7 Change get_variable_numdistinct's API to flag default estimates explicitly.
Formerly, callers tested for DEFAULT_NUM_DISTINCT, which had the problem
that a perfectly solid estimate might be mistaken for a content-free
default.
2011-09-04 15:41:49 -04:00
Tom Lane 1cb108efb0 Dig down into sub-selects to look for column statistics.
If a sub-select's output column is a simple Var, recursively look for
statistics applying to that Var, and use them if available.  The need for
this was foreseen ages ago, but we didn't have enough infrastructure to do
it with reasonable speed until just now.

We punt and stick with default estimates if the subquery uses set
operations, GROUP BY, or DISTINCT, since those operations would change the
underlying column statistics (particularly, the relative frequencies of
different values) beyond recognition.  This means that the types of
sub-selects for which this improvement applies are fairly limited, since
most subqueries satisfying those restrictions would have gotten flattened
into the parent query anyway.  But it does help for some cases, such as
subqueries with ORDER BY or LIMIT.
2011-09-04 15:13:46 -04:00
Tom Lane 698df3350d Can't print PlannerGlobal's subroots list in outfuncs.
Since the subroots will surely link back to the same glob struct, this
necessarily leads to infinite recursion.  Doh.  Found while trying to
debug some other code.
2011-09-04 14:43:52 -04:00
Tom Lane 1609797c25 Clean up the #include mess a little.
walsender.h should depend on xlog.h, not vice versa.  (Actually, the
inclusion was circular until a couple hours ago, which was even sillier;
but Bruce broke it in the expedient rather than logically correct
direction.)  Because of that poor decision, plus blind application of
pgrminclude, we had a situation where half the system was depending on
xlog.h to include such unrelated stuff as array.h and guc.h.  Clean up
the header inclusion, and manually revert a lot of what pgrminclude had
done so things build again.

This episode reinforces my feeling that pgrminclude should not be run
without adult supervision.  Inclusion changes in header files in particular
need to be reviewed with great care.  More generally, it'd be good if we
had a clearer notion of module layering to dictate which headers can sanely
include which others ... but that's a big task for another day.
2011-09-04 01:13:16 -04:00
Tom Lane b3aaf9081a Rearrange planner to save the whole PlannerInfo (subroot) for a subquery.
Formerly, set_subquery_pathlist and other creators of plans for subqueries
saved only the rangetable and rowMarks lists from the lower-level
PlannerInfo.  But there's no reason not to remember the whole PlannerInfo,
and indeed this turns out to simplify matters in a number of places.

The immediate reason for doing this was so that the subroot will still be
accessible when we're trying to extract column statistics out of an
already-planned subquery.  But now that I've done it, it seems like a good
code-beautification effort in its own right.

I also chose to get rid of the transient subrtable and subrowmark fields in
SubqueryScan nodes, in favor of having setrefs.c look up the subquery's
RelOptInfo.  That required changing all the APIs in setrefs.c to pass
PlannerInfo not PlannerGlobal, which was a large but quite mechanical
transformation.

One side-effect not foreseen at the beginning is that this finally broke
inheritance_planner's assumption that replanning the same subquery RTE N
times would necessarily give interchangeable results each time.  That
assumption was always pretty risky, but now we really have to make a
separate RTE for each instance so that there's a place to carry the
separate subroots.
2011-09-03 15:36:24 -04:00
Peter Eisentraut 42ad992fdc Add archive_command example 2011-09-03 01:29:09 +03:00
Peter Eisentraut f1e4f3d44f Whitespace adjustment for consistency in the file 2011-09-03 01:28:05 +03:00
Tom Lane 5b562644fe Teach ANALYZE to clear pg_class.relhassubclass when appropriate.
In the past, relhassubclass always remained true if a relation had ever had
child relations, even if the last subclass was long gone.  While this had
only marginal performance implications in most cases, it was annoying, and
I'm now considering some planner changes that would raise the cost of a
false positive.  It was previously impractical to fix this because of race
condition concerns.  However, given the recent change that made tablecmds.c
take ShareExclusiveLock on relations that are gaining a child (commit
fbcf4b92aa), we can now allow ANALYZE to
clear the flag when it's no longer relevant.  There is no additional
locking cost to do so, since ANALYZE takes ShareExclusiveLock anyway.
2011-09-02 14:29:31 -04:00
Bruce Momjian 10af3ab2b2 Add C comment about needed include. 2011-09-01 12:53:45 -04:00
Tom Lane e5b012b788 Put back improperly removed #include. 2011-09-01 11:57:46 -04:00
Bruce Momjian 6416a82a62 Remove unnecessary #include references, per pgrminclude script. 2011-09-01 10:04:27 -04:00
Heikki Linnakangas a88b6e4cfb setlocale() on Windows doesn't work correctly if the locale name contains
dots. I previously worked around this in initdb, mapping the known
problematic locale names to aliases that work, but Hiroshi Inoue pointed
out that that's not enough because even if you use one of the aliases, like
"Chinese_HKG", setlocale(LC_CTYPE, NULL) returns back the long form, ie.
"Chinese_Hong Kong S.A.R.". When we try to restore an old locale value by
passing that value back to setlocale(), it fails. Note that you are affected
by this bug also if you use one of those short-form names manually, so just
reverting the hack in initdb won't fix it.

To work around that, move the locale name mapping from initdb to a wrapper
around setlocale(), so that the mapping is invoked on every setlocale() call.

Also, add a few checks for failed setlocale() calls in the backend. These
calls shouldn't fail, and if they do there isn't much we can do about it,
but at least you'll get a warning.

Backpatch to 9.1, where the initdb hack was introduced. The Windows bug
affects older versions too if you set locale manually to one of the aliases,
but given the lack of complaints from the field, I'm hesitent to backpatch.
2011-09-01 11:08:32 +03:00
Tom Lane 0d3b231eeb Further repair of eqjoinsel ndistinct-clamping logic.
Examination of examples provided by Mark Kirkwood and others has convinced
me that actually commit 7f3eba30c9 was quite
a few bricks shy of a load.  The useful part of that patch was clamping
ndistinct for the inner side of a semi or anti join, and the reason why
that's needed is that it's the only way that restriction clauses
eliminating rows from the inner relation can affect the estimated size of
the join result.  I had not clearly understood why the clamping was
appropriate, and so mis-extrapolated to conclude that we should clamp
ndistinct for the outer side too, as well as for both sides of regular
joins.  These latter actions were all wrong, and are reverted with this
patch.  In addition, the clamping logic is now made to affect the behavior
of both paths in eqjoinsel_semi, with or without MCV lists to compare.
When we have MCVs, we suppose that the most common values are the ones
that are most likely to survive the decimation resulting from a lower
restriction clause, so we think of the clamping as eliminating non-MCV
values, or potentially even the least-common MCVs for the inner relation.

Back-patch to 8.4, same as previous fixes in this area.
2011-09-01 00:19:38 -04:00
Tom Lane 97930cf578 Improve eqjoinsel's ndistinct clamping to work for multiple levels of join.
This patch fixes an oversight in my commit
7f3eba30c9 of 2008-10-23.  That patch
accounted for baserel restriction clauses that reduced the number of rows
coming out of a table (and hence the number of possibly-distinct values of
a join variable), but not for join restriction clauses that might have been
applied at a lower level of join.  To account for the latter, look up the
sizes of the min_lefthand and min_righthand inputs of the current join,
and clamp with those in the same way as for the base relations.

Noted while investigating a complaint from Ben Chobot, although this in
itself doesn't seem to explain his report.

Back-patch to 8.4; previous versions used different estimation methods
for which this heuristic isn't relevant.
2011-08-31 16:05:43 -04:00
Tom Lane 5bba65de94 Fix a missed case in code for "moving average" estimate of reltuples.
It is possible for VACUUM to scan no pages at all, if the visibility map
shows that all pages are all-visible.  In this situation VACUUM has no new
information to report about the relation's tuple density, so it wasn't
changing pg_class.reltuples ... but it updated pg_class.relpages anyway.
That's wrong in general, since there is no evidence to justify changing the
density ratio reltuples/relpages, but it's particularly bad if the previous
state was relpages=reltuples=0, which means "unknown tuple density".
We just replaced "unknown" with "zero".  ANALYZE would eventually recover
from this, but it could take a lot of repetitions of ANALYZE to do so if
the relation size is much larger than the maximum number of pages ANALYZE
will scan, because of the moving-average behavior introduced by commit
b4b6923e03.

The only known situation where we could have relpages=reltuples=0 and yet
the visibility map asserts everything's visible is immediately following
a pg_upgrade.  It might be advisable for pg_upgrade to try to preserve the
relpages/reltuples statistics; but in any case this code is wrong on its
own terms, so fix it.  Per report from Sergey Koposov.

Back-patch to 8.4, where the visibility map was introduced, same as the
previous change.
2011-08-30 14:51:38 -04:00
Robert Haas 8a3d33c8e6 Fix parsing of time string followed by yesterday/today/tomorrow.
Previously, 'yesterday 04:00:00'::timestamp didn't do the same thing as
'04:00:00 yesterday'::timestamp, and the return value from the latter
was midnight rather than the specified time.

Dean Rasheed, with some stylistic changes
2011-08-30 11:38:42 -04:00
Robert Haas eab2ef6164 Remove some tabs from README file.
Some of the ASCII art expected 8-space tab stops, and some of it
expected 4-space tab stops.

Per report from YAMAMOTO Takashi.
2011-08-29 22:26:29 -04:00
Tom Lane a5b7640ba0 Fix concat_ws() to not insert a separator after leading NULL argument(s).
Per bug #6181 from Itagaki Takahiro.  Also do some marginal code cleanup
and improve error handling.
2011-08-29 15:20:57 -04:00
Robert Haas c01c25fbe5 Improve spinlock performance for HP-UX, ia64, non-gcc.
At least on this architecture, it's very important to spin on a
non-atomic instruction and only retry the atomic once it appears
that it will succeed.  To fix this, split TAS() into two macros:
TAS(), for trying to grab the lock the first time, and TAS_SPIN(),
for spinning until we get it.  TAS_SPIN() defaults to same as TAS(),
but we can override it when we know there's a better way.

It's likely that some of the other cases in s_lock.h require
similar treatment, but this is the only one we've got conclusive
evidence for at present.
2011-08-29 10:05:48 -04:00
Bruce Momjian 4bd7333b14 Allow more include files to be compiled in their own by adding missing
include dependencies.

Modify pgcompinclude to skip a common fcinfo error.
2011-08-27 11:05:33 -04:00
Peter Eisentraut fd5b397ca4 Implement the information schema with_hierarchy column
In PostgreSQL, this is included in the SELECT privilege, so show YES
or NO depending on whether SELECT is granted.
2011-08-27 15:03:02 +03:00
Bruce Momjian f261deb4b4 Add missing includes after pgrminclude run. 2011-08-26 18:15:14 -04:00
Bruce Momjian f8fc37b337 Add markers for skips. 2011-08-26 18:15:13 -04:00
Tom Lane 00eb036c11 Fix potential memory clobber in tsvector_concat().
tsvector_concat() allocated its result workspace using the "conservative"
estimate of the sum of the two input tsvectors' sizes.  Unfortunately that
wasn't so conservative as all that, because it supposed that the number of
pad bytes required could not grow.  Which it can, as per test case from
Jesper Krogh, if there's a mix of lexemes with positions and lexemes
without them in the input data.  The fix is to assume that we might add
a not-previously-present pad byte for each and every lexeme in the two
inputs; which really is conservative, but it doesn't seem worthwhile to
try to be more precise.

This is an aboriginal bug in tsvector_concat, so back-patch to all
versions containing it.
2011-08-26 16:51:34 -04:00
Tom Lane ecf248737a Add makefile rules to check for backtracking in backend and psql lexers.
Per discussion, we should enforce the policy of "no backtracking" in these
performance-sensitive scanners.
2011-08-25 14:44:17 -04:00
Tom Lane 2e95f1f002 Add "%option warn" to all flex input files that lacked it.
This is recommended in the flex manual, and there seems no good reason
not to use it everywhere.
2011-08-25 13:55:57 -04:00
Robert Haas 48bc57657d Tweak postgresql.conf.sample's comments on listen_addresess.
This makes it slightly more clear that '*' is not part of the default
value, in case that wasn't obvious.

As requested by Dougal Sutherland.
2011-08-25 09:41:24 -04:00
Tom Lane cb5c2ba2d8 Fix multiple bugs in extension dropping.
When we implemented extensions, we made findDependentObjects() treat
EXTENSION dependency links similarly to INTERNAL links.  However, that
logic contained an implicit assumption that an object could have at most
one INTERNAL dependency, so it did not work correctly for objects having
both INTERNAL and DEPENDENCY links.  This led to failure to drop some
extension member objects when dropping the extension.  Furthermore, we'd
never actually exercised the case of recursing to an internally-referenced
(owning) object from anything other than a NORMAL dependency, and it turns
out that passing the incoming dependency's flags to the owning object is
the Wrong Thing.  This led to sometimes dropping a whole extension silently
when we should have rejected the drop command for lack of CASCADE.

Since we obviously were under-testing extension drop scenarios, add some
regression test cases.  Unfortunately, such test cases require some
extensions (duh), so we can't test for problems in the core regression
tests.  I chose to add them to the earthdistance contrib module, which is
a good test case because it has a dependency on the cube contrib module.

Back-patch to 9.1.  Arguably these are pre-existing bugs in INTERNAL
dependency handling, but since it appears that the cases can never arise
pre-9.1, I'll refrain from back-patching the logic changes further than
that.
2011-08-24 13:09:06 -04:00
Tom Lane d4aa491493 Make CREATE EXTENSION check schema creation permissions.
When creating a new schema for a non-relocatable extension, we neglected
to check whether the calling user has permission to create schemas.
That didn't matter in the original coding, since we had already checked
superuserness, but in the new dispensation where users need not be
superusers, we should check it.  Use CreateSchemaCommand() rather than
calling NamespaceCreate() directly, so that we also enforce the rules
about reserved schema names.

Per complaint from KaiGai Kohei, though this isn't the same as his patch.
2011-08-23 21:49:07 -04:00
Tom Lane 43f0c20839 Fix overoptimistic assumptions in column width estimation for subqueries.
set_append_rel_pathlist supposed that, while computing per-column width
estimates for the appendrel, it could ignore child rels for which the
translated reltargetlist entry wasn't a Var.  This gave rise to completely
silly estimates in some common cases, such as constant outputs from some or
all of the arms of a UNION ALL.  Instead, fall back on get_typavgwidth to
estimate from the value's datatype; which might be a poor estimate but at
least it's not completely wacko.

That problem was exposed by an Assert in set_subquery_size_estimates, which
unfortunately was still overoptimistic even with that fix, since we don't
compute attr_widths estimates for appendrels that are entirely excluded by
constraints.  So remove the Assert; we'll just fall back on get_typavgwidth
in such cases.

Also, since set_subquery_size_estimates calls set_baserel_size_estimates
which calls set_rel_width, there's no need for set_subquery_size_estimates
to call get_typavgwidth; set_rel_width will handle it for us if we just
leave the estimate set to zero.  Remove the unnecessary code.

Per report from Erik Rijkers and subsequent investigation.
2011-08-23 17:13:12 -04:00
Peter Eisentraut 1af55e2751 Use consistent format for reporting GetLastError()
Use something like "error code %lu" for reporting GetLastError()
values on Windows.  Previously, a mix of different wordings and
formats were in use.
2011-08-23 22:00:52 +03:00
Robert Haas 7488936478 Typo fix. 2011-08-22 12:16:27 -04:00
Tom Lane 660a081c3f Fix handling of extension membership when filling in a shell operator.
The previous coding would result in deleting and not re-creating the
extension membership pg_depend rows, since there was no
CommandCounterIncrement that would allow recordDependencyOnCurrentExtension
to see that the deletion had happened.  Make it work like the shell type
case, ie, keep the existing entries (and then throw an error if they're for
the wrong extension).

Per bug #6172 from Hitoshi Harada.  Investigation and fix by Dimitri
Fontaine.
2011-08-22 10:55:47 -04:00
Tom Lane b33f78df17 Fix trigger WHEN conditions when both BEFORE and AFTER triggers exist.
Due to tuple-slot mismanagement, evaluation of WHEN conditions for AFTER
ROW UPDATE triggers could crash if there had been a BEFORE ROW trigger
fired for the same update.  Fix by not trying to overload the use of
estate->es_trig_tuple_slot.  Per report from Yoran Heling.

Back-patch to 9.0, when trigger WHEN conditions were introduced.
2011-08-21 18:15:55 -04:00
Tom Lane 08e1eedf24 Fix performance problem when building a lossy tidbitmap.
As pointed out by Sergey Koposov, repeated invocations of tbm_lossify can
make building a large tidbitmap into an O(N^2) operation.  To fix, make
sure we remove more than the minimum amount of information per call, and
add a fallback path to behave sanely if we're unable to fit the bitmap
within the requested amount of memory.

This has been wrong since the tidbitmap code was written, so back-patch
to all supported branches.
2011-08-20 14:51:02 -04:00
Robert Haas 0f7acbeddf Make lazy_vacuum_rel call pg_rusage_init only if needed.
do_analyze_rel already does it this way.

Euler Taveira de Oliveira
2011-08-18 09:55:04 -04:00
Robert Haas 24bf1552f6 Remove obsolete README file.
Perhaps we ought to add some other kind of documentation here instead,
but for now let's get rid of this woefully obsolete description of the
sinval machinery.
2011-08-18 09:49:41 -04:00
Peter Eisentraut 1bf80041e3 Translation updates 2011-08-17 14:07:46 +03:00
Heikki Linnakangas 1d0392b245 Fix comment about which version had BACKUP METHOD line in backup_lable, again.
It was invalidated again by Fujii's patch to 9.1.
2011-08-17 12:31:23 +03:00
Tom Lane b5282aa893 Revise sinval code to remove no-longer-used tuple TID from inval messages.
This requires adjusting the API for syscache callback functions: they now
get a hash value, not a TID, to identify the target tuple.  Most of them
weren't paying any attention to that argument anyway, but plancache did
require a small amount of fixing.

Also, improve performance a trifle by avoiding sending duplicate inval
messages when a heap_update isn't changing the catcache lookup columns.
2011-08-16 19:27:46 -04:00
Tom Lane 632ae6829f Forget about targeting catalog cache invalidations by tuple TID.
The TID isn't stable enough: we might queue an sinval event before a VACUUM
FULL, and then process it afterwards, when the target tuple no longer has
the same TID.  So we must invalidate entries on the basis of hash value
only.  The old coding can be shown to result in various bizarre,
hard-to-reproduce errors in the presence of concurrent VACUUM FULLs on
system catalogs, and could easily result in permanent catalog corruption,
up to and including complete loss of tables.

This commit is just a minimal fix that removes the unsafe comparison.
We should remove transmission of the tuple TID from sinval messages
altogether, and then arrange to suppress the extra message in the common
case of a heap_update that doesn't change the key hashvalue.  But that's
going to be much more invasive, and will only produce a probably-marginal
performance gain, so it doesn't seem like material for a back-patch.

Back-patch to 9.0.  Before that, VACUUM FULL refused to do any tuple moving
if it found any INSERT_IN_PROGRESS or DELETE_IN_PROGRESS tuples (and
CLUSTER would give up altogether), so there was no risk of moving a tuple
that might be the subject of an unsent sinval message.
2011-08-16 15:26:22 -04:00
Tom Lane f4d7f1adba Fix incorrect order of operations during sinval reset processing.
We have to be sure that we have revalidated each nailed-in-cache relcache
entry before we try to use it to load data for some other relcache entry.
The introduction of "mapped relations" in 9.0 broke this, because although
we updated the state kept in relmapper.c early enough, we failed to
propagate that information into relcache entries soon enough; in
particular, we could try to fetch pg_class rows out of pg_class before
we'd updated its relcache entry's rd_node.relNode value from the map.

This bug accounts for Dave Gould's report of failures after "vacuum full
pg_class", and I believe that there is risk for other system catalogs
as well.

The core part of the fix is to copy relmapper data into the relcache
entries during "phase 1" in RelationCacheInvalidate(), before they'll be
used in "phase 2".  To try to future-proof the code against other similar
bugs, I also rearranged the order in which nailed relations are visited
during phase 2: now it's pg_class first, then pg_class_oid_index, then
other nailed relations.  This should ensure that RelationClearRelation can
apply RelationReloadIndexInfo to all nailed indexes without risking use
of not-yet-revalidated relcache entries.

Back-patch to 9.0 where the relation mapper was introduced.
2011-08-16 14:38:20 -04:00
Tom Lane 7b0d0e9356 Preserve toast value OIDs in toast-swap-by-content for CLUSTER/VACUUM FULL.
This works around the problem that a catalog cache entry might contain a
toast pointer that we try to dereference just as a VACUUM FULL completes
on that catalog.  We will see the sinval message on the cache entry when
we acquire lock on the toast table, but by that point we've already told
tuptoaster.c "here's the pointer to fetch", so it's difficult from a code
structural standpoint to update the pointer before we use it.  Much less
painful to ensure that toast pointers are not invalidated in the first
place.  We have to add a bit of code to deal with the case that a value
that previously wasn't toasted becomes so; but that should be a
seldom-exercised corner case, so the inefficiency shouldn't be significant.

Back-patch to 9.0.  In prior versions, we didn't allow CLUSTER on system
catalogs, and VACUUM FULL didn't result in reassignment of toast OIDs, so
there was no problem.
2011-08-16 13:48:04 -04:00
Tom Lane 2ada6779c5 Fix race condition in relcache init file invalidation.
The previous code tried to synchronize by unlinking the init file twice,
but that doesn't actually work: it leaves a window wherein a third process
could read the already-stale init file but miss the SI messages that would
tell it the data is stale.  The result would be bizarre failures in catalog
accesses, typically "could not read block 0 in file ..." later during
startup.

Instead, hold RelCacheInitLock across both the unlink and the sending of
the SI messages.  This is more straightforward, and might even be a bit
faster since only one unlink call is needed.

This has been wrong since it was put in (in 2002!), so back-patch to all
supported releases.
2011-08-16 13:11:54 -04:00
Heikki Linnakangas 2877c67bc2 Fix bogus comment that claimed that the new BACKUP METHOD line in
backup_label was new in 9.0. Spotted by Fujii Masao.
2011-08-16 12:23:51 +03:00
Peter Eisentraut e5475a80d2 Add "Reason code" prefix to internal SSI error messages
This makes it clearer that the error message is perhaps not supposed
to be understood by users, and it also makes it somewhat clearer that
it was not accidentally omitted from translation.

Idea from Heikki Linnakangas, except that we don't mark "Reason code"
for translation at this point, because that would make the
implementation too cumbersome.
2011-08-15 15:20:16 +03:00
Tom Lane 52994e9e56 Fix unsafe order of operations in foreign-table DDL commands.
When updating or deleting a system catalog tuple, it's necessary to acquire
RowExclusiveLock on the catalog before looking up the tuple; otherwise a
concurrent VACUUM FULL on the catalog might move the tuple to a different
TID before we can apply the update.  Coding patterns that find the tuple
via a table scan aren't at risk here, but when obtaining the tuple from a
catalog cache, correct ordering is important; and several routines in
foreigncmds.c got it wrong.  Noted while running the regression tests in
parallel with VACUUM FULL of assorted system catalogs.

For consistency I moved all the heap_open calls to the starts of their
functions, including a couple for which there was no actual bug.

Back-patch to 8.4 where foreigncmds.c was added.
2011-08-14 15:40:21 -04:00
Tom Lane 592b615d71 Fix incorrect timeout handling during initial authentication transaction.
The statement start timestamp was not set before initiating the transaction
that is used to look up client authentication information in pg_authid.
In consequence, enable_sig_alarm computed a wrong value (far in the past)
for statement_fin_time.  That didn't have any immediate effect, because the
timeout alarm was set without reference to statement_fin_time; but if we
subsequently blocked on a lock for a short time, CheckStatementTimeout
would consult the bogus value when we cancelled the lock timeout wait,
and then conclude we'd timed out, leading to immediate failure of the
connection attempt.  Thus an innocent "vacuum full pg_authid" would cause
failures of concurrent connection attempts.  Noted while testing other,
more serious consequences of vacuum full on system catalogs.

We should set the statement timestamp before StartTransactionCommand(),
so that the transaction start timestamp is also valid.  I'm not sure if
there are any non-cosmetic effects of it not being valid, but the xact
timestamp is at least sent to the statistics machinery.

Back-patch to 9.0.  Before that, the client authentication timeout was done
outside any transaction and did not depend on this state to be valid.
2011-08-13 17:52:24 -04:00
Tom Lane a180776f7a Teach unix_latch.c to use poll() where available.
poll() is preferred over select() on platforms where both are available,
because it tends to be a bit faster and it doesn't have an arbitrary limit
on the range of FD numbers that can be accessed.  The FD range limit does
not appear to be a risk factor for any 9.1 usages, so this doesn't need to
be back-patched, but we need to have it in place if we keep on expanding
the uses of WaitLatch.
2011-08-11 12:50:22 -04:00
Robert Haas 5057366eed Unbreak legacy syntax "COMMENT ON RULE x IS y", with no relation name.
check_object_ownership() isn't happy about the null relation pointer.
We could fix it there, but this seems more future-proof.
2011-08-11 11:23:51 -04:00
Tom Lane cff75130b5 Remove wal_sender_delay GUC, because it's no longer useful.
The latch infrastructure is now capable of detecting all cases where the
walsender loop needs to wake up, so there is no reason to have an arbitrary
timeout.

Also, modify the walsender loop logic to follow the standard pattern of
ResetLatch, test for work to do, WaitLatch.  The previous coding was both
hard to follow and buggy: it would sometimes busy-loop despite having
nothing available to do, eg between receipt of a signal and the next time
it was caught up with new WAL, and it also had interesting choices like
deciding to update to WALSNDSTATE_STREAMING on the strength of information
known to be obsolete.
2011-08-10 18:50:28 -04:00
Tom Lane 79b2ee20c8 Add a bit of debug logging to backend_read_statsfile().
This is in hopes of learning more about what causes "pgstat wait timeout"
warnings in the buildfarm.  This patch should probably be reverted once
we've learned what we can.  As coded, it will result in regression test
"failures" at half the delay that the existing code does, so I expect
to see a few more than before.
2011-08-10 16:45:43 -04:00
Tom Lane 4dab3d5ae1 Change the autovacuum launcher to use WaitLatch instead of a poll loop.
In pursuit of this (and with the expectation that WaitLatch will be needed
in more places), convert the latch field that was already added to PGPROC
for sync rep into a generic latch that is activated for all PGPROC-owning
processes, and change many of the standard backend signal handlers to set
that latch when a signal happens.  This will allow WaitLatch callers to be
wakened properly by these signals.

In passing, fix a whole bunch of signal handlers that had been hacked to do
things that might change errno, without adding the necessary save/restore
logic for errno.  Also make some minor fixes in unix_latch.c, and clean
up bizarre and unsafe scheme for disowning the process's latch.  Much of
this has to be back-patched into 9.1.

Peter Geoghegan, with additional work by Tom
2011-08-10 12:22:21 -04:00
Heikki Linnakangas 41f9ffd928 If backup-end record is not seen, and we reach end of recovery from a
streamed backup, throw an error and refuse to start up. The restore has not
finished correctly in that case and the data directory is possibly corrupt.
We already errored out in case of archive recovery, but could not during
crash recovery because we couldn't distinguish between the case that
pg_start_backup() was called and the database then crashed (must not error,
data is OK), and the case that we're restoring from a backup and not all
the needed WAL was replayed (data can be corrupt).

To distinguish those cases, add a line to backup_label to indicate
whether the backup was taken with pg_start/stop_backup(), or by streaming
(ie. pg_basebackup).

This requires re-initdb, because of a new field added to the control file.
2011-08-10 09:22:49 +03:00
Tom Lane 9f17ffd866 Measure WaitLatch's timeout parameter in milliseconds, not microseconds.
The original definition had the problem that timeouts exceeding about 2100
seconds couldn't be specified on 32-bit machines.  Milliseconds seem like
sufficient resolution, and finer grain than that would be fantasy anyway
on many platforms.

Back-patch to 9.1 so that this aspect of the latch API won't change between
9.1 and later releases.

Peter Geoghegan
2011-08-09 18:52:29 -04:00
Tom Lane 4e15a4db5e Documentation improvement and minor code cleanups for the latch facility.
Improve the documentation around weak-memory-ordering risks, and do a pass
of general editorialization on the comments in the latch code.  Make the
Windows latch code more like the Unix latch code where feasible; in
particular provide the same Assert checks in both implementations.
Fix poorly-placed WaitLatch call in syncrep.c.

This patch resolves, for the moment, concerns around weak-memory-ordering
bugs in latch-related code: we have documented the restrictions and checked
that existing calls meet them.  In 9.2 I hope that we will install suitable
memory barrier instructions in SetLatch/ResetLatch, so that their callers
don't need to be quite so careful.
2011-08-09 15:30:45 -04:00
Tom Lane cff60f2dfa Avoid creating PlaceHolderVars immediately within PlaceHolderVars.
Such a construction is useless since the lower PlaceHolderVar is already
nullable; no need to make it more so.  Noted while pursuing bug #6154.

This is just a minor planner efficiency improvement, since the final plan
will come out the same anyway after PHVs are flattened.  So not worth the
risk of back-patching.
2011-08-09 11:34:20 -04:00
Peter Eisentraut f4a9da0a15 Use clearer notation for getnameinfo() return handling
Writing

    if (getnameinfo(...))
        handle_error();

reads quite strangely, so use something like

    if (getnameinfo(...) != 0)
        handle_error();

instead.
2011-08-09 18:30:32 +03:00
Heikki Linnakangas 77949a2913 Change the way string relopts are allocated.
Don't try to allocate the default value for a string relopt in the same
palloc chunk as the relopt_string struct. That didn't work too well if you
added a built-in string relopt in the stringRelOpts array, as it's not
possible to have an initializer for a variable length struct in C. This
makes the code slightly simpler too.

While we're at it, move the call to validator function in
add_string_reloption to before the allocation, so that if someone does pass
a bogus default value, we don't leak memory.
2011-08-09 15:25:44 +03:00
Heikki Linnakangas 5b6c8436d7 Fix grammar and spelling in log message. 2011-08-09 11:45:25 +03:00
Tom Lane 77ba232564 Fix nested PlaceHolderVar expressions that appear only in targetlists.
A PlaceHolderVar's expression might contain another, lower-level
PlaceHolderVar.  If the outer PlaceHolderVar is used, the inner one
certainly will be also, and so we have to make sure that both of them get
into the placeholder_list with correct ph_may_need values during the
initial pre-scan of the query (before deconstruct_jointree starts).
We did this correctly for PlaceHolderVars appearing in the query quals,
but overlooked the issue for those appearing in the top-level targetlist;
with the result that nested placeholders referenced only in the targetlist
did not work correctly, as illustrated in bug #6154.

While at it, add some error checking to find_placeholder_info to ensure
that we don't try to create new placeholders after it's too late to do so;
they have to all be created before deconstruct_jointree starts.

Back-patch to 8.4 where the PlaceHolderVar mechanism was introduced.
2011-08-09 00:50:07 -04:00
Tom Lane 05e8396892 Clean up ill-advised attempt to invent a private set of Node tags.
Somebody thought it'd be cute to invent a set of Node tag numbers that were
defined independently of, and indeed conflicting with, the main tag-number
list.  While this accidentally failed to fail so far, it would certainly
lead to trouble as soon as anyone wanted to, say, apply copyObject to these
node types.  Clang was already complaining about the use of makeNode on
these tags, and I think quite rightly so.  Fix by pushing these node
definitions into the mainstream, including putting replnodes.h where it
belongs.
2011-08-06 14:53:49 -04:00
Tom Lane 375aa7b393 Reduce PG_SYSLOG_LIMIT to 900 bytes.
The previous limit of 1024 was set on the assumption that all modern syslog
implementations have line length limits of 2KB or so.  However, this is
false, as at least Solaris and sysklogd truncate at only 1KB.  900 seems
to leave enough room for the max likely length of the tacked-on prefixes,
so let's go with that.

As with the previous change, it doesn't seem wise to back-patch this into
already-released branches; but it should be OK to sneak it into 9.1.

Noah Misch
2011-08-05 21:02:31 -04:00
Robert Haas c4096c7639 Allow per-column foreign data wrapper options.
Shigeru Hanada, with fairly minor editing by me.
2011-08-05 13:24:03 -04:00
Robert Haas 84e3712677 Create VXID locks "lazily" in the main lock table.
Instead of entering them on transaction startup, we materialize them
only when someone wants to wait, which will occur only during CREATE
INDEX CONCURRENTLY.  In Hot Standby mode, the startup process must also
be able to probe for conflicting VXID locks, but the lock need never be
fully materialized, because the startup process does not use the normal
lock wait mechanism.  Since most VXID locks never need to touch the
lock manager partition locks, this can significantly reduce blocking
contention on read-heavy workloads.

Patch by me.  Review by Jeff Davis.
2011-08-04 12:38:33 -04:00
Robert Haas 4af43ee3f1 Make pgbench use erand48() rather than random().
glibc renders random() thread-safe by wrapping a futex lock around it;
testing reveals that this limits the performance of pgbench on machines
with many CPU cores.  Rather than switching to random_r(), which is
only available on GNU systems and crashes unless you use undocumented
alchemy to initialize the random state properly, switch to our built-in
implementation of erand48(), which is both thread-safe and concurrent.

Since the list of reasons not to use the operating system's erand48()
is getting rather long, rename ours to pg_erand48() (and similarly
for our implementations of lrand48() and srand48()) and just always
use those.  We were already doing this on Cygwin anyway, and the
glibc implementation is not quite thread-safe, so pgbench wouldn't
be able to use that either.

Per discussion with Tom Lane.
2011-08-03 16:26:40 -04:00
Tom Lane ac36e6f71f Move CheckRecoveryConflictDeadlock() call to a safer place.
This kluge was inserted in a spot apparently chosen at random: the lock
manager's state is not yet fully set up for the wait, and in particular
LockWaitCancel hasn't been armed by setting lockAwaited, so the ProcLock
will not get cleaned up if the ereport is thrown.  This seems to not cause
any observable problem in trivial test cases, because LockReleaseAll will
silently clean up the debris; but I was able to cause failures with tests
involving subtransactions.

Fixes breakage induced by commit c85c941470.
Back-patch to all affected branches.
2011-08-02 15:16:29 -04:00
Tom Lane 2e53bd5517 Fix incorrect initialization of ProcGlobal->startupBufferPinWaitBufId.
It was initialized in the wrong place and to the wrong value.  With bad
luck this could result in incorrect query-cancellation failures in hot
standby sessions, should a HS backend be holding pin on buffer number 1
while trying to acquire a lock.
2011-08-02 13:23:52 -04:00
Heikki Linnakangas 89df948ec2 Avoid integer overflow when LIMIT + OFFSET >= 2^63.
This fixes bug #6139 reported by Hitoshi Harada.
2011-08-02 10:47:17 +03:00
Robert Haas 85b436f7b1 Minor stylistic corrections. 2011-08-01 08:24:45 -04:00
Peter Eisentraut 8a0fa9cad9 Add host name resolution information to pg_hba.conf error messages
This is to be able to analyze issues with host names in pg_hba.conf.
2011-07-31 18:03:43 +03:00
Robert Haas b4fbe392f8 Reduce sinval synchronization overhead.
Testing shows that the overhead of acquiring and releasing
SInvalReadLock and msgNumLock on high-core count boxes can waste a lot
of CPU time and hurt performance.  This patch adds a per-backend flag
that allows us to skip all that locking in most cases.  Further
testing shows that this improves performance even when sinval traffic
is very high.

Patch by me.  Review and testing by Noah Misch.
2011-07-29 16:46:13 -04:00
Peter Eisentraut 0fe8150827 Minor message style adjustment 2011-07-27 23:54:46 +03:00
Tom Lane c1420fcf7d Check to see whether libxml2 handles error context the way we expect.
It turns out to be possible to link against a libxml2.so that does this
differently than the version we configured and built against, so we need
a runtime check to avoid bizarre behavior.  Per report from Bernd Helmle.
Patch by Florian Pflug.
2011-07-26 16:31:04 -04:00
Peter Eisentraut ce8d7bb644 Replace printf format %i by %d
They are identical, but the overwhelming majority of the code uses %d,
so standardize on that.
2011-07-26 22:54:29 +03:00
Andrew Dunstan 74e6d37276 Silence compiler warning about uninitialized variable.
It is set correctly on the only path that uses it, but the
compiler can't know that.
2011-07-25 19:37:17 -04:00
Tom Lane d0c23026b2 Use OpenSSL's SSL_MODE_ACCEPT_MOVING_WRITE_BUFFER flag.
This disables an entirely unnecessary "sanity check" that causes failures
in nonblocking mode, because OpenSSL complains if we move or compact the
write buffer.  The only actual requirement is that we not modify pending
data once we've attempted to send it, which we don't.  Per testing and
research by Martin Pihlak, though this fix is a lot simpler than his patch.

I put the same change into the backend, although it's less clear whether
it's necessary there.  We do use nonblock mode in some situations in
streaming replication, so seems best to keep the same behavior in the
backend as in libpq.

Back-patch to all supported releases.
2011-07-24 15:17:51 -04:00
Tom Lane 988cccc620 Rethink behavior of CREATE OR REPLACE during CREATE EXTENSION.
The original implementation simply did nothing when replacing an existing
object during CREATE EXTENSION.  The folly of this was exposed by a report
from Marc Munro: if the existing object belongs to another extension, we
are left in an inconsistent state.  We should insist that the object does
not belong to another extension, and then add it to the current extension
if not already a member.
2011-07-23 16:59:39 -04:00
Robert Haas 6f1be5a67a Unbreak unlogged tables.
I broke this in commit 5da79169d3, which
was obviously insufficiently well tested.  Add some regression tests
in the hope of making future slip-ups more likely to be noticed.
2011-07-22 16:15:43 -04:00
Tom Lane 0ce7676aa0 Make xpath() do something useful with XPath expressions that return scalars.
Previously, xpath() simply returned an empty array if the expression did
not yield a node set.  This is useless for expressions that return scalars,
such as one with name() at the top level.  Arrange to return the scalar
value as a single-element xml array, instead.  (String values will be
suitably escaped.)

This change will also cause xpath_exists() to return true, not false,
for such expressions.

Florian Pflug, reviewed by Radoslaw Smogura
2011-07-21 11:32:46 -04:00
Tom Lane aaf15e5c1c Ensure that xpath() escapes special characters in string values.
Without this it's possible for the output to not be legal XML, as
illustrated by the added regression test cases.

NB: this change will need to be called out as an incompatibility in the
9.2 release notes, since it's possible somebody was relying on the old
behavior, even though it's clearly wrong.

Florian Pflug, reviewed by Radoslaw Smogura
2011-07-20 18:44:35 -04:00
Robert Haas 463f2625a5 Support SECURITY LABEL on databases, tablespaces, and roles.
This requires a new shared catalog, pg_shseclabel.

Along the way, fix the security_label regression tests so that they
don't monkey with the labels of any pre-existing objects.  This is
unlikely to matter in practice, since only the label for the "dummy"
provider was being manipulated.  But this way still seems cleaner.

KaiGai Kohei, with fairly extensive hacking by me.
2011-07-20 13:18:24 -04:00
Tom Lane cacd42d62c Rewrite libxml error handling to be more robust.
libxml reports some errors (like invalid xmlns attributes) via the error
handler hook, but still returns a success indicator to the library caller.
This causes us to miss some errors that are important to report.  Since the
"generic" error handler hook doesn't know whether the message it's getting
is for an error, warning, or notice, stop using that and instead start
using the "structured" error handler hook, which gets enough information
to be useful.

While at it, arrange to save and restore the error handler hook setting in
each libxml-using function, rather than assuming we can set and forget the
hook.  This should improve the odds of working nicely with third-party
libraries that also use libxml.

In passing, volatile-ize some local variables that get modified within
PG_TRY blocks.  I noticed this while testing with an older gcc version
than I'd previously tried to compile xml.c with.

Florian Pflug and Tom Lane, with extensive review/testing by Noah Misch
2011-07-20 13:03:49 -04:00
Simon Riggs 7cb7122800 Remove O(N^2) performance issue with multiple SAVEPOINTs.
Subtransaction locks now released en masse at main commit, rather than
repeatedly re-scanning for locks as we ascend the nested transaction tree.
Split transaction state TBLOCK_SUBEND into two states, TBLOCK_SUBCOMMIT
and TBLOCK_SUBRELEASE to allow the commit path to be optimised using
the existing code in ResourceOwnerRelease() which appears to have been
intended for this usage, judging from comments therein.
2011-07-19 17:21:24 +01:00
Robert Haas 8e5ac74c12 Some refinement for the "fast path" lock patch.
1. In GetLockStatusData, avoid initializing instance before we've ensured
that the array is large enough.  Otherwise, if repalloc moves the block
around, we're hosed.

2. Add the word "Relation" to the name of some identifiers, to avoid
assuming that the fast-path mechanism will only ever apply to relations
(though these particular parts certainly will).  Some of the macros
could possibly use similar treatment, but the names are getting awfully
long already.

3. Add a missing word to comment in AtPrepare_Locks().
2011-07-19 12:10:15 -04:00
Robert Haas cdd61237d6 Remove superfluous variable.
Reported by Peter Eisentraut.
2011-07-19 10:30:26 -04:00
Simon Riggs 4bd8ed31b7 Introduce sending servers as new category for replication params
Fujii Masao
2011-07-19 08:59:55 +01:00
Peter Eisentraut 30f854537d Change debug message from ereport to elog 2011-07-19 07:50:10 +03:00
Simon Riggs 5286105800 Cascading replication feature for streaming log-based replication.
Standby servers can now have WALSender processes, which can work with
either WALReceiver or archive_commands to pass data. Fully updated
docs, including new conceptual terms of sending server, upstream and
downstream servers. WALSenders terminated when promote to master.

Fujii Masao, review, rework and doc rewrite by Simon Riggs
2011-07-19 03:40:03 +01:00
Tom Lane 3d4890c0c5 Add GET STACKED DIAGNOSTICS plpgsql command to retrieve exception info.
This is more SQL-spec-compliant, more easily extensible, and better
performing than the old method of inventing special variables.

Pavel Stehule, reviewed by Shigeru Hanada and David Wheeler
2011-07-18 14:47:18 -04:00
Robert Haas 367bc426a1 Avoid index rebuild for no-rewrite ALTER TABLE .. ALTER TYPE.
Noah Misch.  Review and minor cosmetic changes by me.
2011-07-18 11:04:43 -04:00
Robert Haas 3cba8999b3 Create a "fast path" for acquiring weak relation locks.
When an AccessShareLock, RowShareLock, or RowExclusiveLock is requested
on an unshared database relation, and we can verify that no conflicting
locks can possibly be present, record the lock in a per-backend queue,
stored within the PGPROC, rather than in the primary lock table.  This
eliminates a great deal of contention on the lock manager LWLocks.

This patch also refactors the interface between GetLockStatusData() and
pg_lock_status() to be a bit more abstract, so that we don't rely so
heavily on the lock manager's internal representation details.  The new
fast path lock structures don't have a LOCK or PROCLOCK structure to
return, so we mustn't depend on that for purposes of listing outstanding
locks.

Review by Jeff Davis.
2011-07-18 00:49:28 -04:00
Robert Haas b59d2fe497 Add pg_opfamily_is_visible.
We already have similar functions for many other object types, including
operator classes, so it seems like we should have this one, too.

Extracted from a larger patch by Josh Kupershmidt
2011-07-17 23:23:55 -04:00
Tom Lane 9473bb96d0 Further thoughts about temp_file_limit patch.
Move FileClose's decrement of temporary_files_size up, so that it will be
executed even if elog() throws an error.  This is reasonable since if the
unlink() fails, the fact the file is still there is not our fault, and we
are going to forget about it anyhow.  So we won't count it against
temp_file_limit anymore.

Update fileSize and temporary_files_size correctly in FileTruncate.
We probably don't have any places that truncate temp files, but fd.c
surely should not assume that.
2011-07-17 15:05:44 -04:00
Tom Lane 23e5b16c71 Add temp_file_limit GUC parameter to constrain temporary file space usage.
The limit is enforced against the total amount of temp file space used by
each session.

Mark Kirkwood, reviewed by Cédric Villemain and Tatsuo Ishii
2011-07-17 14:19:31 -04:00
Tom Lane 1bc16a9460 Improve make_subplanTargetList to avoid including Vars unnecessarily.
If a Var was used only in a GROUP BY expression, the previous
implementation would include the Var by itself (as well as the expression)
in the generated targetlist.  This wouldn't affect the efficiency of the
scan/join part of the plan at all, but it could result in passing
unnecessarily-wide rows through sorting and grouping steps.  It turns out
to take only a little more code, and not noticeably more time, to generate
a tlist without such redundancy, so let's do that.  Per a recent gripe from
HarmeekSingh Bedi.
2011-07-16 16:46:55 -04:00
Tom Lane 1af37ec96d Replace errdetail("%s", ...) with errdetail_internal("%s", ...).
There may be some other places where we should use errdetail_internal,
but they'll have to be evaluated case-by-case.  This commit just hits
a bunch of places where invoking gettext is obviously a waste of cycles.
2011-07-16 14:22:18 -04:00
Tom Lane 3ee7c8710d Use errdetail_internal() for SSI transaction cancellation details.
Per discussion, these seem too technical to be worth translating.

Kevin Grittner
2011-07-16 14:22:16 -04:00
Tom Lane ed7ed76712 Add an errdetail_internal() ereport auxiliary routine.
This function supports untranslated detail messages, in the same way that
errmsg_internal supports untranslated primary messages.  We've needed this
for some time IMO, but discussion of some cases in the SSI code provided
the impetus to actually add it.

Kevin Grittner, with minor adjustments by me
2011-07-16 14:22:15 -04:00
Magnus Hagander 0886dde5f8 Fix SSPI login when multiple roundtrips are required
This fixes SSPI login failures showing "The function
requested is not supported", often showing up when connecting
to localhost. The reason was not properly updating the SSPI
handle when multiple roundtrips were required to complete the
authentication sequence.

Report and analysis by Ahmed Shinwari, patch by Magnus Hagander
2011-07-16 19:58:53 +02:00
Peter Eisentraut bf3c585681 Set information_schema.tables.commit_action to null
The commit action of temporary tables is currently not cataloged, so
we can't easily show it.  The previous value was outdated from before
we had different commit actions.
2011-07-15 21:11:14 +03:00
Heikki Linnakangas 8d260911e8 Change the way the offset of downlink is stored in GISTInsertStack.
GISTInsertStack.childoffnum used to mean "offset of the downlink in this
node, pointing to the child node in the stack". It's now replaced with
downlinkoffnum, which means "offset of the downlink in the parent of this
node". gistFindPath() already used childoffnum with this new meaning, and
had an extra step at the end to pull all the childoffnum values down one
node in the stack, to adjust the stack for the meaning that childoffnum had
elsewhere. That's no longer required.

The reason to do this now is this new representation is more convenient for
the GiST fast build patch that Alexander Korotkov is working on.

While we're at it, replace the linked list used in gistFindPath with a
standard List, and make gistFindPath() static.

Alexander Korotkov, with some changes by me.
2011-07-15 12:18:30 +03:00
Heikki Linnakangas bc175eb805 Fix two ancient bugs in GiST code to re-find a parent after page split:
First, when following a right-link, we incorrectly marked the current page
as the parent of the right sibling. In reality, the parent of the right page
is the same as the parent of the current page (or some page to the right of
it, gistFindCorrectParent() will sort that out).

Secondly, when we follow a right-link, we must prepend, not append, the right
page to our list of pages to visit. That's because we assume that once we
hit a leaf page in the list, all the rest are leaf pages too, and give up.

To hit these bugs, you need concurrent actions and several unlucky accidents.
Another backend must split the root page, while you're in process of
splitting a lower-level page. Furthermore, while you scan the internal nodes
to re-find the parent, another backend needs to again split some more internal
pages. Even then, the bugs don't necessarily manifest as user-visible errors
or index corruption.

While we're at it, make the error reporting a bit better if gistFindPath()
fails to re-find the parent. It used to be an assertion, but an elog() seems
more appropriate.

Backpatch to all supported branches.
2011-07-15 11:05:12 +03:00
Tom Lane f3ff0433ab In planner, don't assume that empty parent tables aren't really empty.
There's a heuristic in estimate_rel_size() to clamp the minimum size
estimate for a table to 10 pages, unless we can see that vacuum or analyze
has been run (and set relpages to something nonzero, so this will always
happen for a table that's actually empty).  However, it would be better
not to do this for inheritance parent tables, which very commonly are
really empty and can be expected to stay that way.  Per discussion of a
recent pgsql-performance report from Anish Kejariwal.  Also prevent it
from happening for indexes (although this is more in the nature of
documentation, since CREATE INDEX normally initializes relpages to
something nonzero anyway).

Back-patch to 9.0, because the ability to collect statistics across a
whole inheritance tree has improved the planner's estimates to the point
where this relatively small error makes a significant difference.  In the
referenced report, merge or hash joins were incorrectly estimated as
cheaper than a nestloop with inner indexscan on the inherited table.
That was less likely before 9.0 because the lack of inherited stats would
have resulted in a default (and rather pessimistic) estimate of the cost
of a merge or hash join.
2011-07-14 17:30:57 -04:00
Peter Eisentraut f4678c205a Set information_schema.routines.is_udt_dependent to NO
It previously said YES, but that is incorrect.
2011-07-14 19:18:17 +03:00
Tom Lane 96f990e23b Update some comments to clarify who does what in targetlist creation.
No code changes; just avoid blaming query_planner for things it doesn't
really do.
2011-07-13 20:23:09 -04:00
Peter Eisentraut 0527a454ec Implement information schema interval_type columns
Also correct reporting of interval precision when field restrictions
are specified in the typmod.
2011-07-13 20:32:08 +03:00
Tom Lane c1d9579dd8 Avoid listing ungrouped Vars in the targetlist of Agg-underneath-Window.
Regular aggregate functions in combination with, or within the arguments
of, window functions are OK per spec; they have the semantics that the
aggregate output rows are computed and then we run the window functions
over that row set.  (Thus, this combination is not really useful unless
there's a GROUP BY so that more than one aggregate output row is possible.)
The case without GROUP BY could fail, as recently reported by Jeff Davis,
because sloppy construction of the Agg node's targetlist resulted in extra
references to possibly-ungrouped Vars appearing outside the aggregate
function calls themselves.  See the added regression test case for an
example.

Fixing this requires modifying the API of flatten_tlist and its underlying
function pull_var_clause.  I chose to make pull_var_clause's API for
aggregates identical to what it was already doing for placeholders, since
the useful behaviors turn out to be the same (error, report node as-is, or
recurse into it).  I also tightened the error checking in this area a bit:
if it was ever valid to see an uplevel Var, Aggref, or PlaceHolderVar here,
that was a long time ago, so complain instead of ignoring them.

Backpatch into 9.1.  The failure exists in 8.4 and 9.0 as well, but seeing
that it only occurs in a basically-useless corner case, it doesn't seem
worth the risks of changing a function API in a minor release.  There might
be third-party code using pull_var_clause.
2011-07-12 18:24:39 -04:00
Bruce Momjian afc9635c60 Add C comment that txid_current() assigns an XID if one is not already
assigned.
2011-07-11 20:33:07 -04:00
Peter Eisentraut 3315020a09 Fix and clarify information schema interval_precision fields
The fields were previously wrongly typed as character_data; change to
cardinal_number.  Update the documentation and the implementation to
show more clearly that this applies to a feature not available in
PostgreSQL, rather than just not yet being implemented in the
information schema.
2011-07-11 18:49:44 +03:00
Robert Haas 4240e429d0 Try to acquire relation locks in RangeVarGetRelid.
In the previous coding, we would look up a relation in RangeVarGetRelid,
lock the resulting OID, and then AcceptInvalidationMessages().  While
this was sufficient to ensure that we noticed any changes to the
relation definition before building the relcache entry, it didn't
handle the possibility that the name we looked up no longer referenced
the same OID.  This was particularly problematic in the case where a
table had been dropped and recreated: we'd latch on to the entry for
the old relation and fail later on.  Now, we acquire the relation lock
inside RangeVarGetRelid, and retry the name lookup if we notice that
invalidation messages have been processed meanwhile.  Many operations
that would previously have failed with an error in the presence of
concurrent DDL will now succeed.

There is a good deal of work remaining to be done here: many callers
of RangeVarGetRelid still pass NoLock for one reason or another.  In
addition, nothing in this patch guards against the possibility that
the meaning of an unqualified name might change due to the creation
of a relation in a schema earlier in the user's search path than the
one where it was previously found.  Furthermore, there's nothing at
all here to guard against similar race conditions for non-relations.
For all that, it's a start.

Noah Misch and Robert Haas
2011-07-08 22:19:30 -04:00
Tom Lane 9d522cb35d Fix another oversight in logging of changes in postgresql.conf settings.
We were using GetConfigOption to collect the old value of each setting,
overlooking the possibility that it didn't exist yet.  This does happen
in the case of adding a new entry within a custom variable class, as
exhibited in bug #6097 from Maxim Boguk.

To fix, add a missing_ok parameter to GetConfigOption, but only in 9.1
and HEAD --- it seems possible that some third-party code is using that
function, so changing its API in a minor release would cause problems.
In 9.0, create a near-duplicate function instead.
2011-07-08 17:02:58 -04:00
Heikki Linnakangas 89fd72cbf2 Introduce a pipe between postmaster and each backend, which can be used to
detect postmaster death. Postmaster keeps the write-end of the pipe open,
so when it dies, children get EOF in the read-end. That can conveniently
be waited for in select(), which allows eliminating some of the polling
loops that check for postmaster death. This patch doesn't yet change all
the loops to use the new mechanism, expect a follow-on patch to do that.

This changes the interface to WaitLatch, so that it takes as argument a
bitmask of events that it waits for. Possible events are latch set, timeout,
postmaster death, and socket becoming readable or writeable.

The pipe method behaves slightly differently from the kill() method
previously used in PostmasterIsAlive() in the case that postmaster has died,
but its parent has not yet read its exit code with waitpid(). The pipe
returns EOF as soon as the process dies, but kill() continues to return
true until waitpid() has been called (IOW while the process is a zombie).
Because of that, change PostmasterIsAlive() to use the pipe too, otherwise
WaitLatch() would return immediately with WL_POSTMASTER_DEATH, while
PostmasterIsAlive() would claim it's still alive. That could easily lead to
busy-waiting while postmaster is in zombie state.

Peter Geoghegan with further changes by me, reviewed by Fujii Masao and
Florian Pflug.
2011-07-08 18:44:07 +03:00
Heikki Linnakangas 9598afa3b0 Fix one overflow and one signedness error, caused by the patch to calculate
OLDSERXID_MAX_PAGE based on BLCKSZ. MSVC compiler warned about these.
2011-07-08 17:29:53 +03:00
Peter Eisentraut f05c65090a Message style improvements 2011-07-08 07:37:04 +03:00
Heikki Linnakangas bdaabb9b22 There's a small window wherein a transaction is committed but not yet
on the finished list, and we shouldn't flag it as a potential conflict
if so. We can also skip adding a doomed transaction to the list of
possible conflicts because we know it won't commit.

Dan Ports and Kevin Grittner.
2011-07-08 00:36:30 +03:00
Heikki Linnakangas 406d61835b SSI has a race condition, where the order of commit sequence numbers of
transactions might not match the order the work done in those transactions
become visible to others. The logic in SSI, however, assumed that it does.
Fix that by having two sequence numbers for each serializable transaction,
one taken before a transaction becomes visible to others, and one after it.
This is easier than trying to make the the transition totally atomic, which
would require holding ProcArrayLock and SerializableXactHashLock at the same
time. By using prepareSeqNo instead of commitSeqNo in a few places where
commit sequence numbers are compared, we can make those comparisons err on
the safe side when we don't know for sure which committed first.

Per analysis by Kevin Grittner and Dan Ports, but this approach to fix it
is different from the original patch.
2011-07-07 23:26:34 +03:00
Tom Lane 60a81ad133 Reclassify replication-related GUC variables as "master" and "standby".
Per discussion, this structure seems more understandable than what was
there before.  Make config.sgml and postgresql.conf.sample agree.

In passing do a bit of editorial work on the variable descriptions.
2011-07-07 15:11:41 -04:00
Robert Haas 5b2b444f66 Adjust OLDSERXID_MAX_PAGE based on BLCKSZ.
The value when BLCKSZ = 8192 is unchanged, but with larger-than-normal
block sizes we might need to crank things back a bit, as we'll have
more entries per page than normal in that case.

Kevin Grittner
2011-07-07 15:05:21 -04:00
Tom Lane a195e3c34f Finish disabling reduced-lock-levels-for-DDL feature.
Previous patch only covered the ALTER TABLE changes, not changes in other
commands; and it neglected to revert the documentation changes.
2011-07-07 13:15:15 -04:00
Heikki Linnakangas 928408d9e5 Fix a bug with SSI and prepared transactions:
If there's a dangerous structure T0 ---> T1 ---> T2, and T2 commits first,
we need to abort something. If T2 commits before both conflicts appear,
then it should be caught by OnConflict_CheckForSerializationFailure. If
both conflicts appear before T2 commits, it should be caught by
PreCommit_CheckForSerializationFailure. But that is actually run when
T2 *prepares*. Fix that in OnConflict_CheckForSerializationFailure, by
treating a prepared T2 as if it committed already.

This is mostly a problem for prepared transactions, which are in prepared
state for some time, but also for regular transactions because they also go
through the prepared state in the SSI code for a short moment when they're
committed.

Kevin Grittner and Dan Ports
2011-07-07 18:12:15 +03:00
Tom Lane 14f67192c2 Remove assumptions that not-equals operators cannot be in any opclass.
get_op_btree_interpretation assumed this in order to save some duplication
of code, but it's not true in general anymore because we added <> support
to btree_gist.  (We still assume it for btree opclasses, though.)

Also, essentially the same logic was baked into predtest.c.  Get rid of
that duplication by generalizing get_op_btree_interpretation so that it
can be used by predtest.c.

Per bug report from Denis de Bernardy and investigation by Jeff Davis,
though I didn't use Jeff's patch exactly as-is.

Back-patch to 9.1; we do not support this usage before that.
2011-07-06 14:53:16 -04:00
Tom Lane 2e56fa8632 Call FDW validator functions even when the options list is empty.
This is useful since a validator might want to require certain options
to be provided.  The passed array is an empty text array in this case.

Per suggestion by Laurenz Albe, though this is not quite his patch.
2011-07-05 18:21:12 -04:00
Peter Eisentraut 9a0bdc8db5 Message style improvements of errmsg_internal() calls 2011-07-05 23:01:35 +03:00
Peter Eisentraut 27af66162b Message style tweaks 2011-07-05 00:01:35 +03:00
Peter Eisentraut 6fbc80349f Set user_defined_types.data_type to null
On re-reading the standard, this field is only used for distinct or
reference types.
2011-07-04 23:09:42 +03:00
Alvaro Herrera b93f5a5673 Move Trigger and TriggerDesc structs out of rel.h into a new reltrigger.h
This lets us stop including rel.h into execnodes.h, which is a widely
used header.
2011-07-04 14:35:58 -04:00
Alvaro Herrera d665162077 Don't try to use a constraint name as domain name
The bug that caused this to be discovered is that the code was trying to
dereference a NULL or ill-defined pointer, as reported by Michael Mueller;
but what it was doing was wrong anyway, per Heikki.

This patch is Heikki's suggested fix.
2011-07-04 14:33:44 -04:00
Peter Eisentraut 9f084527a4 Remove unused variable to silence compiler warning 2011-07-04 18:03:17 +03:00
Heikki Linnakangas f7ea6beaf4 Remove silent_mode. You get the same functionality with "pg_ctl -l
postmaster.log", or nohup.

There was a small issue with LINUX_OOM_ADJ and silent_mode, namely that with
silent_mode the postmaster process incorrectly used the OOM settings meant
for backend processes. We certainly could've fixed that directly, but since
silent_mode was redundant anyway, we might as well just remove it.
2011-07-04 14:35:44 +03:00
Simon Riggs 2c3d9db56d Reset ALTER TABLE lock levels to AccessExclusiveLock in all cases.
Locks on inheritance parent remain at lower level, as they were before.
Remove entry from 9.1 release notes.
2011-07-04 09:31:40 +01:00
Robert Haas 5da79169d3 Fix bugs in relpersistence handling during table creation.
Unlike the relistemp field which it replaced, relpersistence must be
set correctly quite early during the table creation process, as we
rely on it quite early on for a number of purposes, including security
checks.  Normally, this is set based on whether the user enters CREATE
TABLE, CREATE UNLOGGED TABLE, or CREATE TEMPORARY TABLE, but a
relation may also be made implicitly temporary by creating it in
pg_temp.  This patch fixes the handling of that case, and also
disables creation of unlogged tables in temporary tablespace (such
table indeed skip WAL-logging, but we reject an explicit
specification) and creation of relations in the temporary schemas of
other sessions (which is not very sensible, and didn't work right
anyway).

Report by Amit Khandekar.
2011-07-03 17:34:47 -04:00
Magnus Hagander 24e2d4b6ba Mark pg_stat_reset_shared as strict
This is the proper fix for bug #6082 about
pg_stat_reset_shared(NULL) causing a crash, and it reverts
commit 79aa44536f on head.

The workaround of throwing an error from inside the function is
left on backbranches (including 9.1) since this change requires
a new initdb.
2011-07-03 13:15:58 +02:00
Tom Lane 426cafc46c Suppress compiler warning about potentially uninitialized variable.
Maybe some compilers are smart enough to not complain about the previous
coding ... but mine isn't.
2011-07-01 20:57:34 -04:00
Alvaro Herrera 897795240c Enable CHECK constraints to be declared NOT VALID
This means that they can initially be added to a large existing table
without checking its initial contents, but new tuples must comply to
them; a separate pass invoked by ALTER TABLE / VALIDATE can verify
existing data and ensure it complies with the constraint, at which point
it is marked validated and becomes a normal part of the table ecosystem.

An non-validated CHECK constraint is ignored in the planner for
constraint_exclusion purposes; when validated, cached plans are
recomputed so that partitioning starts working right away.

This patch also enables domains to have unvalidated CHECK constraints
attached to them as well by way of ALTER DOMAIN / ADD CONSTRAINT / NOT
VALID, which can later be validated with ALTER DOMAIN / VALIDATE
CONSTRAINT.

Thanks to Thom Brown, Dean Rasheed and Jaime Casanova for the various
reviews, and Robert Hass for documentation wording improvement
suggestions.

This patch was sponsored by Enova Financial.
2011-06-30 11:24:31 -04:00
Alvaro Herrera b36927fbe9 Fix outdated comment
Extracted from a patch by Bernd Helmle
2011-06-29 19:49:47 -04:00
Tom Lane a5652d3e05 Restore correct btree preprocessing of "indexedcol IS NULL" conditions.
Such a condition is unsatisfiable in combination with any other type of
btree-indexable condition (since we assume btree operators are always
strict).  8.3 and 8.4 had an explicit test for this, which I removed in
commit 29c4ad9829, mistakenly thinking that
the case would be subsumed by the more general handling of IS (NOT) NULL
added in that patch.  Put it back, and improve the comments about it, and
add a regression test case.

Per bug #6079 from Renat Nasyrov, and analysis by Dean Rasheed.
2011-06-29 19:46:47 -04:00
Heikki Linnakangas cd70dd6bef Move the PredicateLockRelation() call from nodeSeqscan.c to heapam.c. It's
more consistent that way, since all the other PredicateLock* calls are
made in various heapam.c and index AM functions. The call in nodeSeqscan.c
was unnecessarily aggressive anyway, there's no need to try to lock the
relation every time a tuple is fetched, it's enough to do it once.

This has the user-visible effect that if a seq scan is initialized in the
executor, but never executed, we now acquire the predicate lock on the heap
relation anyway. We could avoid that by taking the lock on the first
heap_getnext() call instead, but it doesn't seem worth the trouble given
that it feels more natural to do it in heap_beginscan().

Also, remove the retail PredicateLockTuple() calls from heap_getnext(). In
a seqscan, started with heap_begin(), we're holding a whole-relation
predicate lock on the heap so there's no need to lock the tuples
individually.

Kevin Grittner and me
2011-06-29 21:57:43 +03:00
Heikki Linnakangas d9fe63acb0 Grab predicate locks on matching tuples in a lossy bitmap heap scan.
Non-lossy case was already handled correctly.

Kevin Grittner
2011-06-29 21:50:42 +03:00
Magnus Hagander 79aa44536f Protect pg_stat_reset_shared() against NULL input
Per bug #6082, reported by Steve Haslam
2011-06-29 19:36:51 +02:00
Peter Eisentraut 21f1e15aaf Unify spelling of "canceled", "canceling", "cancellation"
We had previously (af26857a27)
established the U.S. spellings as standard.
2011-06-29 09:28:46 +03:00
Simon Riggs 465883b0a2 Introduce compact WAL record for the common case of commit (non-DDL).
XLOG_XACT_COMMIT_COMPACT leaves out invalidation messages and relfilenodes,
saving considerable space for the vast majority of transaction commits.
XLOG_XACT_COMMIT keeps same definition as XLOG_PAGE_MAGIC 0xD067 and earlier.

Leonardo Francalanci and Simon Riggs
2011-06-28 22:58:17 +01:00