Commit Graph

13050 Commits

Author SHA1 Message Date
Alvaro Herrera 2f1692d213 Fix erroneous choice of timeline variable, too 2012-10-31 17:05:55 -03:00
Alvaro Herrera 9b8dd7e8aa Fix erroneous choices of segNo variables
Commit dfda6eba (which changed segment numbers to use a single 64 bit
variable instead of log/seg) introduced a couple of bogus choices of
exactly which log segment number variable to use in each case.

This is currently pretty harmless; in one place, the bogus number was
only being used in an error message for a pretty unlikely condition
(failure to fsync a WAL segment file).  In the other, it was using a
global variable instead of the local variable; but all callsites were
passing the value of the global variable anyway.

No need to backpatch because that commit is not on earlier branches.
2012-10-31 11:05:28 -03:00
Alvaro Herrera 04f28bdb84 Fix ALTER EXTENSION / SET SCHEMA
In its original conception, it was leaving some objects into the old
schema, but without their proper pg_depend entries; this meant that the
old schema could be dropped, causing future pg_dump calls to fail on the
affected database.  This was originally reported by Jeff Frost as #6704;
there have been other complaints elsewhere that can probably be traced
to this bug.

To fix, be more consistent about altering a table's subsidiary objects
along the table itself; this requires some restructuring in how tables
are relocated when altering an extension -- hence the new
AlterTableNamespaceInternal routine which encapsulates it for both the
ALTER TABLE and the ALTER EXTENSION cases.

There was another bug lurking here, which was unmasked after fixing the
previous one: certain objects would be reached twice via the dependency
graph, and the second attempt to move them would cause the entire
operation to fail.  Per discussion, it seems the best fix for this is to
do more careful tracking of objects already moved: we now maintain a
list of moved objects, to avoid attempting to do it twice for the same
object.

Authors: Alvaro Herrera, Dimitri Fontaine
Reviewed by Tom Lane
2012-10-31 10:52:55 -03:00
Kevin Grittner 6868ed7491 Throw error if expiring tuple is again updated or deleted.
This prevents surprising behavior when a FOR EACH ROW trigger
BEFORE UPDATE or BEFORE DELETE directly or indirectly updates or
deletes the the old row.  Prior to this patch the requested action
on the row could be silently ignored while all triggered actions
based on the occurence of the requested action could be committed.
One example of how this could happen is if the BEFORE DELETE
trigger for a "parent" row deleted "children" which had trigger
functions to update summary or status data on the parent.

This also prevents similar surprising problems if the query has a
volatile function which updates a target row while it is already
being updated.

There are related issues present in FOR UPDATE cursors and READ
COMMITTED queries which are not handled by this patch.  These
issues need further evalution to determine what change, if any, is
needed.

Where the new error messages are generated, in most cases the best
fix will be to move code from the BEFORE trigger to an AFTER
trigger.  Where this is not feasible, the trigger can avoid the
error by re-issuing the triggering statement and returning NULL.

Documentation changes will be submitted in a separate patch.

Kevin Grittner and Tom Lane with input from Florian Pflug and
Robert Haas, based on problems encountered during conversion of
Wisconsin Circuit Court trigger logic to plpgsql triggers.
2012-10-26 14:55:36 -05:00
Tom Lane 17804fa71b Prefer actual constants to pseudo-constants in equivalence class machinery.
generate_base_implied_equalities_const() should prefer plain Consts over
other em_is_const eclass members when choosing the "pivot" value that
all the other members will be equated to.  This makes it more likely that
the generated equalities will be useful in constraint-exclusion proofs.
Per report from Rushabh Lathia.
2012-10-26 14:19:34 -04:00
Tom Lane bf01e34b55 Tweak genericcostestimate's fudge factor for index size.
To provide some bias against using a large index when a small one would do
as well, genericcostestimate adds a "fudge factor", which for a long time
was random_page_cost * index_pages/10000.  However, this can grow to be the
dominant term in indexscan cost estimates when the index involved is large
enough, a behavior that was never intended.  Change to a ln(1 + n/10000)
formulation, which has nearly the same behavior up to a few hundred pages
but tails off significantly thereafter.  (A log curve seems correct on
first principles, since what we're trying to account for here is index
descent costs, which are typically logarithmic.)  Per bug #7619 from Niko
Kiirala.

Possibly this change should get back-patched, but I'm hesitant to mess with
cost estimates in stable branches.
2012-10-24 16:25:40 -04:00
Tom Lane a4e8680a6c When converting a table to a view, remove its system columns.
Views should not have any pg_attribute entries for system columns.
However, we forgot to remove such entries when converting a table to a
view.  This could lead to crashes later on, if someone attempted to
reference such a column, as reported by Kohei KaiGai.

Patch in HEAD only.  This bug has been there forever, but in the back
branches we will have to defend against existing mis-converted views,
so it doesn't seem worthwhile to change the conversion code too.
2012-10-24 13:39:37 -04:00
Alvaro Herrera f4c4335a4a Add context info to OAT_POST_CREATE security hook
... and have sepgsql use it to determine whether to check permissions
during certain operations.  Indexes that are being created as a result
of REINDEX, for instance, do not need to have their permissions checked;
they were already checked when the index was created.

Author: KaiGai Kohei, slightly revised by me
2012-10-23 18:24:24 -03:00
Kevin Grittner 4c9d0901f1 Correct predicate locking for DROP INDEX CONCURRENTLY.
For the non-concurrent case there is an AccessExclusiveLock lock
on both the index and the heap at a time during which no other
process is using either, before which the index is maintained and
used for scans, and after which the index is no longer used or
maintained.  Predicate locks can safely be moved from the index to
the related heap relation under the protection of these locks.
This was done prior to the introductin of DROP INDEX CONCURRENTLY
and continues to be done for non-concurrent index drops.

For concurrent index drops, the predicate locks must be moved when
there are no index scans in progress on that index and no more can
subsequently start, and before heap inserts stop maintaining the
index.  As long as these conditions are guaranteed when the
TransferPredicateLocksToHeapRelation() function is called,
stronger locks are not needed for correctness.

Kevin Grittner based on questions by Tom Lane in reviewing the
DROP INDEX CONCURRENTLY patch and in cooperation with Andres
Freund and Simon Riggs.
2012-10-21 16:35:42 -05:00
Tom Lane 5d1abe64e6 Fix UtilityContainsQuery() to handle CREATE TABLE AS EXECUTE correctly.
The code seems to have been written to handle the pre-parse-analysis
representation, where an ExecuteStmt would appear directly under
CreateTableAsStmt.  But in reality the function is only run on
already-parse-analyzed statements, so there will be a Query node in
between.  We'd not noticed the bug because the function is generally
not used at all except in extended query protocol.

Per report from Robert Haas and Rushabh Lathia.
2012-10-19 18:33:45 -04:00
Tom Lane 4e32f8cd14 Fix hash_search to avoid corruption of the hash table on out-of-memory.
An out-of-memory error during expand_table() on a palloc-based hash table
would leave a partially-initialized entry in the table.  This would not be
harmful for transient hash tables, since they'd get thrown away anyway at
transaction abort.  But for long-lived hash tables, such as the relcache
hash, this would effectively corrupt the table, leading to crash or other
misbehavior later.

To fix, rearrange the order of operations so that table enlargement is
attempted before we insert a new entry, rather than after adding it
to the hash table.

Problem discovered by Hitoshi Harada, though this is a bit different
from his proposed patch.
2012-10-19 15:24:03 -04:00
Tom Lane 0d6895051a Fix ruleutils to print "INSERT INTO foo DEFAULT VALUES" correctly.
Per bug #7615 from Marko Tiikkaja.  Apparently nobody ever tried this
case before ...
2012-10-19 13:39:51 -04:00
Simon Riggs da85727565 Fix orphan on cancel of drop index concurrently.
Canceling DROP INDEX CONCURRENTLY during
wait could allow an orphaned index to be
left behind which could not be dropped.

Backpatch to 9.2

Andres Freund, tested by Abhijit Menon-Sen
2012-10-19 09:56:29 +01:00
Tom Lane 002191a1a3 Further cleanup of catcache.c ilist changes.
Remove useless duplicate initialization of bucket headers, don't use a
dlist_mutable_iter in a performance-critical path that doesn't need it,
make some other cosmetic changes for consistency's sake.
2012-10-18 19:30:43 -04:00
Tom Lane dc5aeca168 Remove unnecessary "head" arguments from some dlist/slist functions.
dlist_delete, dlist_insert_after, dlist_insert_before, slist_insert_after
do not need access to the list header, and indeed insisting on that negates
one of the main advantages of a doubly-linked list.

In consequence, revert addition of "cache_bucket" field to CatCTup.
2012-10-18 19:04:20 -04:00
Tom Lane 8f8d746478 Code review for inline-list patch.
Make foreach macros less syntactically dangerous, and fix some typos in
evidently-never-tested ones.  Add missing slist_next_node and
slist_head_node functions.  Fix broken dlist_check code.  Assorted comment
improvements.
2012-10-18 16:47:07 -04:00
Simon Riggs 2f0e480d02 Re-think guts of DROP INDEX CONCURRENTLY.
Concurrent behaviour was flawed when using
a two-step process, so add an additional
phase of processing to ensure concurrency
for both SELECTs and INSERT/UPDATE/DELETEs.

Backpatch to 9.2

Andres Freund, tweaked by me
2012-10-18 18:58:30 +01:00
Tom Lane 72a4231f0c Fix planning of non-strict equivalence clauses above outer joins.
If a potential equivalence clause references a variable from the nullable
side of an outer join, the planner needs to take care that derived clauses
are not pushed to below the outer join; else they may use the wrong value
for the variable.  (The problem arises only with non-strict clauses, since
if an upper clause can be proven strict then the outer join will get
simplified to a plain join.)  The planner attempted to prevent this type
of error by checking that potential equivalence clauses aren't
outerjoin-delayed as a whole, but actually we have to check each side
separately, since the two sides of the clause will get moved around
separately if it's treated as an equivalence.  Bugs of this type can be
demonstrated as far back as 7.4, even though releases before 8.3 had only
a very ad-hoc notion of equivalence clauses.

In addition, we neglected to account for the possibility that such clauses
might have nonempty nullable_relids even when not outerjoin-delayed; so the
equivalence-class machinery lacked logic to compute correct nullable_relids
values for clauses it constructs.  This oversight was harmless before 9.2
because we were only using RestrictInfo.nullable_relids for OR clauses;
but as of 9.2 it could result in pushing constructed equivalence clauses
to incorrect places.  (This accounts for bug #7604 from Bill MacArthur.)

Fix the first problem by adding a new test check_equivalence_delay() in
distribute_qual_to_rels, and fix the second one by adding code in
equivclass.c and called functions to set correct nullable_relids for
generated clauses.  Although I believe the second part of this is not
currently necessary before 9.2, I chose to back-patch it anyway, partly to
keep the logic similar across branches and partly because it seems possible
we might find other reasons why we need valid values of nullable_relids in
the older branches.

Add regression tests illustrating these problems.  In 9.0 and up, also
add test cases checking that we can push constants through outer joins,
since we've broken that optimization before and I nearly broke it again
with an overly simplistic patch for this problem.
2012-10-18 12:30:10 -04:00
Tom Lane ff3f9c8de5 Close un-owned SMgrRelations at transaction end.
If an SMgrRelation is not "owned" by a relcache entry, don't allow it to
live past transaction end.  This design allows the same SMgrRelation to be
used for blind writes of multiple blocks during a transaction, but ensures
that we don't hold onto such an SMgrRelation indefinitely.  Because an
SMgrRelation typically corresponds to open file descriptors at the fd.c
level, leaving it open when there's no corresponding relcache entry can
mean that we prevent the kernel from reclaiming deleted disk space.
(While CacheInvalidateSmgr messages usually fix that, there are cases
where they're not issued, such as DROP DATABASE.  We might want to add
some more sinval messaging for that, but I'd be inclined to keep this
type of logic anyway, since allowing VFDs to accumulate indefinitely
for blind-written relations doesn't seem like a good idea.)

This code replaces a previous attempt towards the same goal that proved
to be unreliable.  Back-patch to 9.1 where the previous patch was added.
2012-10-17 12:38:21 -04:00
Tom Lane 9bacf0e373 Revert "Use "transient" files for blind writes, take 2".
This reverts commit fba105b109.
That approach had problems with the smgr-level state not tracking what
we really want to happen, and with the VFD-level state not tracking the
smgr-level state very well either.  In consequence, it was still possible
to hold kernel file descriptors open for long-gone tables (as in recent
report from Tore Halset), and yet there were also cases of FDs being closed
undesirably soon.  A replacement implementation will follow.
2012-10-17 12:37:08 -04:00
Alvaro Herrera a66ee69add Embedded list interface
Provide a common implementation of embedded singly-linked and
doubly-linked lists.  "Embedded" in the sense that the nodes'
next/previous pointers exist within some larger struct; this design
choice reduces memory allocation overhead.

Most of the implementation uses inlineable functions (where supported),
for performance.

Some existing uses of both types of lists have been converted to the new
code, for demonstration purposes.  Other uses can (and probably will) be
converted in the future.  Since dllist.c is unused after this conversion,
it has been removed.

Author: Andres Freund
Some tweaks by me
Reviewed by Tom Lane, Peter Geoghegan
2012-10-17 11:31:20 -03:00
Bruce Momjian 22cc3b35f4 When outputting the session id in log_line_prefix (%c) or in CSV log
output mode, cause the hex digits after the period to always be at least
four hex digits, with zero-padding.
2012-10-16 12:37:59 -04:00
Heikki Linnakangas 7d3ed5ae78 Fix typo in comment.
Fujii Masao
2012-10-15 13:01:31 +03:00
Tom Lane e81e8f9342 Split up process latch initialization for more-fail-soft behavior.
In the previous coding, new backend processes would attempt to create their
self-pipe during the OwnLatch call in InitProcess.  However, pipe creation
could fail if the kernel is short of resources; and the system does not
recover gracefully from a FATAL error right there, since we have armed the
dead-man switch for this process and not yet set up the on_shmem_exit
callback that would disarm it.  The postmaster then forces an unnecessary
database-wide crash and restart, as reported by Sean Chittenden.

There are various ways we could rearrange the code to fix this, but the
simplest and sanest seems to be to split out creation of the self-pipe into
a new function InitializeLatchSupport, which must be called from a place
where failure is allowed.  For most processes that gets called in
InitProcess or InitAuxiliaryProcess, but processes that don't call either
but still use latches need their own calls.

Back-patch to 9.1, which has only a part of the latch logic that 9.2 and
HEAD have, but nonetheless includes this bug.
2012-10-14 22:59:56 -04:00
Tom Lane 8b728e5c6e Fix oversight in new code for printing rangetable aliases.
In commit 11e131854f, I missed the case of
a CTE RTE that doesn't have a user-defined alias, but does have an
alias assigned by set_rtable_names().  Per report from Peter Eisentraut.

While at it, refactor slightly to reduce code duplication.
2012-10-12 16:14:43 -04:00
Bruce Momjian 49ec613201 In our source code, make a copy of getopt's 'optarg' string arguments,
rather than just storing a pointer.
2012-10-12 13:35:43 -04:00
Tom Lane a29f7ed554 Get rid of COERCE_DONTCARE.
We don't need this hack any more.
2012-10-12 13:35:00 -04:00
Tom Lane 71e58dcfb9 Make equal() ignore CoercionForm fields for better planning with casts.
This change ensures that the planner will see implicit and explicit casts
as equivalent for all purposes, except in the minority of cases where
there's actually a semantic difference (as reflected by having a 3-argument
cast function).  In particular, this fixes cases where the EquivalenceClass
machinery failed to consider two references to a varchar column as
equivalent if one was implicitly cast to text but the other was explicitly
cast to text, as seen in bug #7598 from Vaclav Juza.  We have had similar
bugs before in other parts of the planner, so I think it's time to fix this
problem at the core instead of continuing to band-aid around it.

Remove set_coercionform_dontcare(), which represents the band-aid
previously in use for allowing matching of index and constraint expressions
with inconsistent cast labeling.  (We can probably get rid of
COERCE_DONTCARE altogether, but I don't think removing that enum value in
back branches would be wise; it's possible there's third party code
referring to it.)

Back-patch to 9.2.  We could go back further, and might want to once this
has been tested more; but for the moment I won't risk destabilizing plan
choices in long-since-stable branches.
2012-10-12 12:11:22 -04:00
Tom Lane 4816d2ea32 Fix cross-type case in partial row matching for hashed subplans.
When hashing a subplan like "WHERE (a, b) NOT IN (SELECT x, y FROM ...)",
findPartialMatch() attempted to match rows using the hashtable's internal
equality operators, which of course are for x and y's datatypes.  What we
need to use are the potentially cross-type operators for a=x, b=y, etc.
Failure to do that leads to wrong answers or even crashes.  The scope for
problems is limited to cases where we have different types with compatible
hash functions (else we'd not be using a hashed subplan), but for example
int4 vs int8 can cause the problem.

Per bug #7597 from Bo Jensen.  This has been wrong since the hashed-subplan
code was written, so patch all the way back.
2012-10-11 12:22:13 -04:00
Heikki Linnakangas 6f60fdd701 Improve replication connection timeouts.
Rename replication_timeout to wal_sender_timeout, and add a new setting
called wal_receiver_timeout that does the same at the walreceiver side.
There was previously no timeout in walreceiver, so if the network went down,
for example, the walreceiver could take a long time to notice that the
connection was lost. Now with the two settings, both sides of a replication
connection will detect a broken connection similarly.

It is no longer necessary to manually set wal_receiver_status_interval to
a value smaller than the timeout. Both wal sender and receiver now
automatically send a "ping" message if more than 1/2 of the configured
timeout has elapsed, and it hasn't received any messages from the other end.

Amit Kapila, heavily edited by me.
2012-10-11 17:48:08 +03:00
Peter Eisentraut 8521d13194 Refactor flex and bison make rules
Numerous flex and bison make rules have appeared in the source tree
over time, and they are all virtually identical, so we can replace
them by pattern rules with some variables for customization.

Users of pgxs will also be able to benefit from this.
2012-10-11 06:57:04 -04:00
Tom Lane 864db11683 Update obsolete comment.
We no longer use GetNewOidWithIndex on pg_largeobject; rather,
pg_largeobject_metadata's regular OID column is considered the repository
of OIDs for large objects.  The special functionality is still needed for
TOAST tables however.
2012-10-10 17:04:37 -04:00
Tom Lane 3f88fa971a Fix PGXS support for building loadable modules on AIX.
Building a shlib on AIX requires use of the mkldexport.sh script, but we
failed to install that, preventing its use from non-source-tree contexts.
Also, Makefile.aix had the wrong idea about where to find the installed
copy of the postgres.imp symbol file used by AIX.

Per report from John Pierce.  Patch all the way back, since this has been
broken since the beginning of PGXS.
2012-10-09 21:04:06 -04:00
Tom Lane 7e0cce0265 Remove unnecessary overhead in backend's large-object operations.
Do read/write permissions checks at most once per large object descriptor,
not once per lo_read or lo_write call as before.  The repeated tests were
quite useless in the read case since the snapshot-based tests were
guaranteed to produce the same answer every time.  In the write case,
the extra tests could in principle detect revocation of write privileges
after a series of writes has started --- but there's a race condition there
anyway, since we'd check privileges before performing and certainly before
committing the write.  So there's no real advantage to checking every
single time, and we might as well redefine it as "only check the first
time".

On the same reasoning, remove the LargeObjectExists checks in inv_write
and inv_truncate.  We already checked existence when the descriptor was
opened, and checking again doesn't provide any real increment of safety
that would justify the cost.
2012-10-09 16:38:00 -04:00
Heikki Linnakangas 2d8c81ac86 Fix silly bug in previous refactoring.
I extracted the refactoring patch from a larger patch that contained other
changes too, but missed one unintentional change and didn't test enough...
2012-10-09 19:33:12 +03:00
Heikki Linnakangas ff8f160bf4 Put the logic to wait for WAL in standby mode to a separate function.
This is just refactoring with no user-visible effect, to make the code more
readable.
2012-10-09 19:20:17 +03:00
Heikki Linnakangas 0b77aebabf Remove stray newline in comment. 2012-10-09 13:06:48 +03:00
Peter Eisentraut b6d4522296 Remove generation of repl_gram.h
It was apparently never necessary.
2012-10-08 20:36:46 -04:00
Tom Lane 26fe56481c Code review for 64-bit-large-object patch.
Fix broken-on-bigendian-machines byte-swapping functions, add missed update
of alternate regression expected file, improve error reporting, remove some
unnecessary code, sync testlo64.c with current testlo.c (it seems to have
been cloned from a very old copy of that), assorted cosmetic improvements.
2012-10-08 18:24:32 -04:00
Alvaro Herrera 878daf2e72 Fix thinko in previous commit
Since postgres.h includes palloc.h, definitions that affect the latter
must be present before the former is included.

Per buildfarm results
2012-10-08 18:33:08 -03:00
Alvaro Herrera 976fa10d20 Add support for easily declaring static inline functions
We already had those, but they forced modules to spell out the function
bodies twice.  Eliminate some duplicates we had already grown.

Extracted from a somewhat larger patch from Andres Freund.
2012-10-08 16:28:01 -03:00
Heikki Linnakangas b28cc92d7d Say ANALYZE, not VACUUM, in error message on analyze in hot standby.
Tomonaru Katsumata
2012-10-08 14:17:27 +03:00
Heikki Linnakangas 9c0e2b9182 Fix walsender handling of postmaster shutdown, to not go into endless loop.
This bug was introduced by my patch to use the regular die/quickdie signal
handlers in walsender processes. I tried to make walsender exit at next
CHECK_FOR_INTERRUPTS() by setting ProcDiePending, but that's not enough, you
need to set InterruptPending too. On second thoght, it was not a very good
way to make walsender exit anyway, so use proc_exit(0) instead.

Also, send a CommandComplete message before exiting; that's what we did
before, and you get a nicer error message in the standby that way.

Reported by Thom Brown.
2012-10-08 13:32:14 +03:00
Andrew Dunstan ea72bb8ae5 Fix typo in previous MSC commit. 2012-10-07 19:56:26 -04:00
Andrew Dunstan 33a7101281 Quiet a few MSC compiler warnings. 2012-10-07 17:31:10 -04:00
Tatsuo Ishii 461ef73f09 Add API for 64-bit large object access. Now users can access up to
4TB large objects (standard 8KB BLCKSZ case).  For this purpose new
libpq API lo_lseek64, lo_tell64 and lo_truncate64 are added.  Also
corresponding new backend functions lo_lseek64, lo_tell64 and
lo_truncate64 are added. inv_api.c is changed to handle 64-bit
offsets.

Patch contributed by Nozomi Anzai (backend side) and Yugo Nagata
(frontend side, docs, regression tests and example program). Reviewed
by Kohei Kaigai. Committed by Tatsuo Ishii with minor editings.
2012-10-07 08:36:48 +09:00
Heikki Linnakangas fd5942c18f Use the regular main processing loop also in walsenders.
The regular backend's main loop handles signal handling and error recovery
better than the current WAL sender command loop does. For example, if the
client hangs and a SIGTERM is received before starting streaming, the
walsender will now terminate immediately, rather than hang until the
connection times out.
2012-10-05 17:21:12 +03:00
Tom Lane 1997f34db4 getnameinfo_unix has to be taught not to insist on NI_NUMERIC flags, too.
Per testing of previous patch.
2012-10-04 22:54:18 -04:00
Peter Eisentraut c424d0d105 Remove redundant code for getnameinfo() replacement
Our getnameinfo() replacement implementation in getaddrinfo.c failed
unless NI_NUMERICHOST and NI_NUMERICSERV were given as flags, because
it doesn't resolve host names, only numeric IPs.  But per standard,
when those flags are not given, an implementation can still degrade to
not returning host names, so this restriction is unnecessary.  When we
remove it, we can eliminate some code in postmaster.c that apparently
tried to work around that.
2012-10-04 21:45:14 -04:00
Tom Lane e1e60694b4 Make CREATE AGGREGATE complain if the initcond is invalid for the datatype.
The initial transition value is stored as a text string and not fed to the
transition type's input function until runtime (so that values such as
"now" don't get frozen at creation time).  Previously, CREATE AGGREGATE
didn't do anything with it but that, which meant that even erroneous values
would be accepted and not complained of until the aggregate is used.  This
seems unhelpful, and it's confused at least one user, as in Rhys Stewart's
recent report.  It seems worth taking a few more cycles to invoke the input
function and verify that the value is acceptable.  We can't do this if the
transition type is polymorphic, but in normal aggregates we know the actual
transition type so we can call the right input function.
2012-10-04 17:54:53 -04:00
Tom Lane 707263542e Fix parse location tracking for lists that can be empty.
The previous coding of the YYLLOC_DEFAULT macro behaved strangely for empty
productions, assigning the previous nonterminal's location as the parse
location of the result.  The usefulness of that was (at best) debatable
already, but the real problem is that in list-generating nonterminals like
	OptFooList: /* EMPTY */ { ... } | OptFooList Foo { ... } ;
the initially-identified location would get copied up, so that even a
nonempty list would be given a bogus parse location.  Document how to work
around that, and do so for OptSchemaEltList, so that the error condition
just added for CREATE SCHEMA IF NOT EXISTS produces a sane error cursor.
So far as I can tell, there are currently no other cases where the
situation arises, so we don't need other instances of this coding yet.
2012-10-04 17:15:29 -04:00
Heikki Linnakangas 1a956481ba Fix typo in comment, and reword it slightly while we're at it. 2012-10-04 10:35:48 +03:00
Tom Lane fb34e94d21 Support CREATE SCHEMA IF NOT EXISTS.
Per discussion, schema-element subcommands are not allowed together with
this option, since it's not very obvious what should happen to the element
objects.

Fabrízio de Royes Mello
2012-10-03 19:47:11 -04:00
Alvaro Herrera 994c36e01d refactor ALTER some-obj SET OWNER implementation
Remove duplicate implementation of catalog munging and miscellaneous
privilege and consistency checks.  Instead rely on already existing data
in objectaddress.c to do the work.

Author: KaiGai Kohei
Tweaked by me
Reviewed by Robert Haas
2012-10-03 18:07:46 -03:00
Tom Lane 1f91c8ca1d Avoid planner crash/Assert failure with joins to unflattened subqueries.
examine_simple_variable supposed that any RTE_SUBQUERY rel it gets pointed
at must have been planned already.  However, this isn't a safe assumption
because we must do selectivity estimation while generating indexscan paths,
and that code might look at join clauses involving a rel that the loop in
set_base_rel_sizes() hasn't reached yet.  The simplest fix is to play dumb
in such a situation, that is give up trying to extract any stats for the
Var.  This could possibly be improved by making a separate pass over the
RTE list to plan each unflattened subquery before we start the main
planning work --- but that would be pretty invasive and it doesn't seem
worth it, for now at least.  (We couldn't just break set_base_rel_sizes()
into two loops: the prescan would need to handle all subquery rels in the
query, not only those in the current join subproblem.)

This bug was introduced in commit 1cb108efb0,
although I think that subsequent changes may have exposed it more than it
was originally.  Per bug #7580 from Maxim Boguk.
2012-10-03 13:37:53 -04:00
Alvaro Herrera fe3b5eb08a REASSIGN OWNED: consider grants on tablespaces, too
Apparently this was considered in the original code (see commit
cec3b0a9) but I failed to notice that such entries would always be
skipped by the database check at the start of the loop.

Per bugs #7578 by Nikolay, #6116 by tushar.qa@gmail.com.
2012-10-03 12:30:00 -03:00
Heikki Linnakangas 7ae1815961 Return the number of rows processed when COPY is executed through SPI.
You can now get the number of rows processed by a COPY statement in a
PL/pgSQL function with "GET DIAGNOSTICS x = ROW_COUNT".

Pavel Stehule, reviewed by Amit Kapila, with some editing by me.
2012-10-03 14:38:22 +03:00
Heikki Linnakangas bc1229c832 Fix two bugs introduced in the xlog.c split.
The comment explaining the naming of timeline history files was wrong, and
the history file was not being arhived.

Pointed out by Fujii Masao.
2012-10-03 09:15:38 +03:00
Peter Eisentraut 6bd176095b Improve some LDAP authentication error messages 2012-10-02 23:25:05 -04:00
Tom Lane 09ac603c36 Work around unportable behavior of malloc(0) and realloc(NULL, 0).
On some platforms these functions return NULL, rather than the more common
practice of returning a pointer to a zero-sized block of memory.  Hack our
various wrapper functions to hide the difference by substituting a size
request of 1.  This is probably not so important for the callers, who
should never touch the block anyway if they asked for size 0 --- but it's
important for the wrapper functions themselves, which mistakenly treated
the NULL result as an out-of-memory failure.  This broke at least pg_dump
for the case of no user-defined aggregates, as per report from
Matthew Carrington.

Back-patch to 9.2 to fix the pg_dump issue.  Given the lack of previous
complaints, it seems likely that there is no live bug in previous releases,
even though some of these functions were in place before that.
2012-10-02 17:32:42 -04:00
Alvaro Herrera 2164f9a125 Refactor "ALTER some-obj SET SCHEMA" implementation
Instead of having each object type implement the catalog munging
independently, centralize knowledge about how to do it and expand the
existing table in objectaddress.c with enough data about each object
type to support this operation.

Author: KaiGai Kohei
Tweaks by me
Reviewed by Robert Haas
2012-10-02 18:13:54 -03:00
Heikki Linnakangas 93b6d78cf0 Add #includes needed on some platforms in the new files.
Hopefully this makes the *BSD buildfarm animals happy.
2012-10-02 17:19:52 +03:00
Heikki Linnakangas d5497b95f3 Split off functions related to timeline history files and XLOG archiving.
This is just refactoring, to make the functions accessible outside xlog.c.
A followup patch will make use of that, to allow fetching timeline history
files over streaming replication.
2012-10-02 13:37:19 +03:00
Heikki Linnakangas 0899556e92 Fix access past end of string in date parsing.
This affects date_in(), and a couple of other funcions that use DecodeDate().

Hitoshi Harada
2012-10-02 10:43:48 +03:00
Bruce Momjian dbdb2172a0 Add C comment that IsBackendPid() is called by external modules, so we
don't accidentally remove it.
2012-10-01 10:14:35 -04:00
Tom Lane 05b555d12b Fix tar files emitted by pg_dump and pg_basebackup to be POSIX conformant.
Both programs got the "magic" string wrong, causing standard-conforming tar
implementations to believe the output was just legacy tar format without
any POSIX extensions.  This doesn't actually matter that much, especially
since pg_dump failed to fill the POSIX fields anyway, but still there is
little point in emitting tar format if we can't be compliant with the
standard.  In addition, pg_dump failed to write the EOF marker correctly
(there should be 2 blocks of zeroes not just one), pg_basebackup put the
numeric group ID in the wrong place, and both programs had a pretty
brain-dead idea of how to compute the checksum.  Fix all that and improve
the comments a bit.

pg_restore is modified to accept either the correct POSIX-compliant "magic"
string or the previous value.  This part of the change will need to be
back-patched to avoid an unnecessary compatibility break when a previous
version tries to read tar-format output from 9.3 pg_dump.

Brian Weaver and Tom Lane
2012-09-28 15:19:15 -04:00
Peter Eisentraut edc9109c42 Produce textual error messages for LDAP issues instead of numeric codes 2012-09-27 20:22:50 -04:00
Tom Lane 70bc583319 Fix btmarkpos/btrestrpos to handle array keys.
This fixes another error in commit 9e8da0f757.
I neglected to make the mark/restore functionality save and restore the
current set of array key values, which led to strange behavior if an
IndexScan with ScalarArrayOpExpr quals was used as the inner side of a
mergejoin.  Per bug #7570 from Melese Tesfaye.
2012-09-27 17:01:02 -04:00
Alvaro Herrera ae90ffada4 Have pg_terminate/cancel_backend not ERROR on non-existent processes
This worked fine for superusers, but not for ordinary users trying to
cancel their own processes.  Tweak the order the checks are done in so
that we correctly return SIGNAL_BACKEND_ERROR (which current callers
know to ignore without erroring out) so that an ordinary user can loop
through a resultset without fearing that a process might exit in the
middle of said looping -- causing the remaining processes to go
unsignalled.

Incidentally, the last in-core caller of IsBackendPid() is now gone.
However, the function is exported and must remain in place, because
there are plenty of callers in external modules.

Author: Josh Kupershmidt

Reviewed by Noah Misch
2012-09-27 12:29:51 -03:00
Tom Lane 55c1687a97 Run check_keywords.pl anytime gram.c is rebuilt.
This script is a bit slow, but still it only takes a fraction of the time
the bison run does, so the overhead doesn't seem intolerable.  And we
definitely need some mechanical aid here, because people keep missing
the need to add new keywords to the appropriate keyword-list production.

While at it, I moved check_keywords.pl from src/tools into
src/backend/parser where it's actually used, and did some very minor
cleanup on the script.
2012-09-26 23:12:39 -04:00
Tom Lane fc68ac86b1 Add new EVENT keyword to unreserved_keyword production.
Once again, somebody who ought to know better forgot this.  We really
need some automated cross-check on the keyword-list productions, I think.
Per report from Brian Weaver.
2012-09-26 20:07:36 -04:00
Heikki Linnakangas 2a0c81a12c Add support for include_dir in config file.
This allows easily splitting configuration into many files, deployed in a
directory.

Magnus Hagander, Greg Smith, Selena Deckelmann, reviewed by Noah Misch.
2012-09-24 18:07:53 +03:00
Tom Lane 31510194cc Minor corrections for ALTER TYPE ADD VALUE IF NOT EXISTS patch.
Produce a NOTICE when the label already exists, for consistency with other
CREATE IF NOT EXISTS commands.  Also, fix the code so it produces something
more user-friendly than an index violation when the label already exists.
This not incidentally enables making a regression test that the previous
patch didn't make for fear of exposing an unpredictable OID in the results.
Also some wordsmithing on the documentation.
2012-09-22 18:35:22 -04:00
Andrew Dunstan 6d12b68cd7 Allow IF NOT EXISTS when add a new enum label.
If the label is already in the enum the statement becomes a no-op.
This will reduce the pain that comes from our not allowing this
operation inside a transaction block.

Andrew Dunstan, reviewed by Tom Lane and Magnus Hagander.
2012-09-22 12:53:31 -04:00
Tom Lane 11e131854f Improve ruleutils.c's heuristics for dealing with rangetable aliases.
The previous scheme had bugs in some corner cases involving tables that had
been renamed since a view was made.  This could result in dumped views that
failed to reload or reloaded incorrectly, as seen in bug #7553 from Lloyd
Albin, as well as in some pgsql-hackers discussion back in January.  Also,
its behavior for printing EXPLAIN plans was sometimes confusing because of
willingness to use the same alias for multiple RTEs (it was Ashutosh
Bapat's complaint about that aspect that started the January thread).

To fix, ensure that each RTE in the query has a unique unqualified alias,
by modifying the alias if necessary (we add "_" and digits as needed to
create a non-conflicting name).  Then we can just print its variables with
that alias, avoiding the confusing and bug-prone scheme of sometimes
schema-qualifying variable names.  In EXPLAIN, it proves to be expedient to
take the further step of only assigning such aliases to RTEs that are
actually referenced in the query, since the planner has a habit of
generating extra RTEs with the same alias in situations such as
inheritance-tree expansion.

Although this fixes a bug of very long standing, I'm hesitant to back-patch
such a noticeable behavioral change.  My experiments while creating a
regression test convinced me that actually incorrect output (as opposed to
confusing output) occurs only in very narrow cases, which is backed up by
the lack of previous complaints from the field.  So we may be better off
living with it in released branches; and in any case it'd be smart to let
this ripen awhile in HEAD before we consider back-patching it.
2012-09-21 19:03:10 -04:00
Heikki Linnakangas 7c45e3a3c6 Parse pg_ident.conf when it's loaded, keeping it in memory in parsed format.
Similar changes were done to pg_hba.conf earlier already, this commit makes
pg_ident.conf to behave the same as pg_hba.conf.

This has two user-visible effects. First, if pg_ident.conf contains multiple
errors, the whole file is parsed at postmaster startup time and all the
errors are immediately reported. Before this patch, the file was parsed and
the errors were reported only when someone tries to connect using an
authentication method that uses the file, and the parsing stopped on first
error. Second, if you SIGHUP to reload the config files, and the new
pg_ident.conf file contains an error, the error is logged but the old file
stays in effect.

Also, regular expressions in pg_ident.conf are now compiled only once when
the file is loaded, rather than every time the a user is authenticated. That
should speed up authentication if you have a lot of regexps in the file.

Amit Kapila
2012-09-21 17:54:39 +03:00
Heikki Linnakangas 9d5e9730e5 Fix obsolete comment.
load_hba and load_ident load stuff in a separate memory context nowadays,
not in the current memory context.
2012-09-21 15:22:56 +03:00
Alvaro Herrera 22c734fcdb Remove execdesc.h inclusion from tcopprot.h 2012-09-20 11:07:59 -03:00
Tom Lane 96cc18eef6 Put back AcceptInvalidationMessages calls in heap_openrv(_extended).
These calls were removed in commit 4240e429d0
as part of a general refactoring and improvement of DDL locking.  However,
there's a problem not solved by the rewrite, which is that GRANT/REVOKE
update pg_class.relacl without taking any particular lock on the target
table as such.  If another backend fails to do AcceptInvalidationMessages,
it won't notice a recently-committed change in ACLs.  Bug #7557 from Piotr
Czachur demonstrates that there's at least one code path in 9.2.0 in which
a command fails to do any AcceptInvalidationMessages calls at all, if the
current transaction already holds all the locks it will need.

Since we're hard up against the release deadline for 9.2.1, fix this by
putting back the AcceptInvalidationMessages calls in heap_openrv and
heap_openrv_extended, thereby restoring the historical behavior in this
area.  We ought to look for a more elegant and perhaps more bulletproof
solution, but there's no time for that right now.
2012-09-19 17:10:37 -04:00
Tom Lane 807a40c551 Fix planning of btree index scans using ScalarArrayOpExpr quals.
In commit 9e8da0f757, I improved btree
to handle ScalarArrayOpExpr quals natively, so that constructs like
"indexedcol IN (list)" could be supported by index-only scans.  Using
such a qual results in multiple scans of the index, under-the-hood.
I went to some lengths to ensure that this still produces rows in index
order ... but I failed to recognize that if a higher-order index column
is lacking an equality constraint, rescans can produce out-of-order
data from that column.  Tweak the planner to not expect sorted output
in that case.  Per trouble report from Robert McGehee.
2012-09-18 12:20:34 -04:00
Tom Lane 3f828fae62 Fix array_typanalyze to work for domains over arrays.
Not sure how we missed this case, but we did.  Per bug #7551 from
Diego de Lima.
2012-09-18 00:31:40 -04:00
Tom Lane 3b8968f252 Rethink heuristics for choosing index quals for parameterized paths.
Some experimentation with examples similar to bug #7539 has convinced me
that indxpath.c's original implementation of parameterized-path generation
was several bricks shy of a load.  In general, if we are relying on a
particular outer rel or set of outer rels for a parameterized path, the
path should use every indexable join clause that's available from that rel
or rels.  Any join clauses that get left out of the indexqual will end up
getting applied as plain filter quals (qpquals), and that's generally a
significant loser compared to having the index AM enforce them.  (This is
particularly true with btree, which can skip the index scan entirely if
it can see that the given indexquals are mutually contradictory.)  The
original heuristics failed to ensure this, though, and were overly
complicated anyway.  Rewrite to make the code explicitly identify each
useful set of outer rels and then select all applicable join clauses for
each one.  The one plan that changes in the regression tests is in fact
for the better according to the planner's cost estimates.

(Note: this is not a correctness issue but just a matter of plan quality.
I don't yet know what is going on in bug #7539, but I don't expect this
change to fix that.)
2012-09-16 17:58:09 -04:00
Simon Riggs 64e196b6ef Fix bufmgr so CHECKPOINT_END_OF_RECOVERY behaves as a shutdown checkpoint.
Recovery code documents clearly that a shutdown checkpoint is executed at
end of recovery - a shutdown checkpoint WAL record is written but the buffer
manager had been altered to treat end of recovery as a normal checkpoint.
This bug exacerbates the bufmgr relpersistence bug.

Bug spotted by Andres Freund, patch by me.
2012-09-16 19:53:34 +01:00
Robert Haas beb850e1d8 Properly set relpersistence for fake relcache entries.
This can result in buffers failing to be properly flushed at
checkpoint time, leading to data loss.

Report, diagnosis, and patch by Jeff Davis.
2012-09-14 09:35:07 -04:00
Tom Lane a20993608a Fix case of window function + aggregate + GROUP BY expression.
In commit 1bc16a9460 I added a minor
optimization to drop the component variables of a GROUP BY expression from
the target list computed at the aggregation level of a query, if those Vars
weren't referenced elsewhere in the tlist.  However, I overlooked that the
window-function planning code would deconstruct such expressions and thus
need to have access to their component variables.  Fix it to not do that.

While at it, I removed the distinction between volatile and nonvolatile
window partition/order expressions: the code now computes all of them
at the aggregation level.  This saves a relatively expensive check for
volatility, and it's unclear that the resulting plan isn't better anyway.

Per bug #7535 from Louis-David Mitterrand.  Back-patch to 9.2.
2012-09-13 11:32:25 -04:00
Tom Lane 9a93e71008 Fix a couple other leftover uses of 'conisonly' terminology. 2012-09-12 15:12:24 -04:00
Tom Lane 1faf866ace Fix logical errors in tsquery selectivity estimation for prefix queries.
I made multiple errors in commit 97532f7c29,
stemming mostly from failure to think about the available frequency data
as being element frequencies not value frequencies (so that occurrences of
different elements are not mutually exclusive).  This led to sillinesses
such as estimating that "word" would match more rows than "word:*".

The choice to clamp to a minimum estimate of DEFAULT_TS_MATCH_SEL also
seems pretty ill-considered in hindsight, as it would frequently result in
an estimate much larger than the available data suggests.  We do need some
sort of clamp, since a pattern not matching any of the MCELEMs probably
still needs a selectivity estimate of more than zero.  I chose instead to
clamp to at least what a non-MCELEM word would be estimated as, preserving
the property that "word:*" doesn't get an estimate less than plain "word",
whether or not the word appears in MCELEM.

Per investigation of a gripe from Bill Martin, though I suspect that his
example case actually isn't even reaching the erroneous code.

Back-patch to 9.1 where this code was introduced.
2012-09-11 21:23:20 -04:00
Tom Lane d2286a98ef Allow embedded spaces without quoting in unix_socket_directories entries.
This fix removes an unnecessary incompatibility with the old behavior of
the unix_socket_directory parameter.  Since pathnames with embedded spaces
are fairly popular on some platforms, the incompatibility could be
significant in practice.  We'll still strip unquoted leading/trailing
spaces, however.

No docs update since the documentation already implied that it worked
like this.

Per bug #7514 from Murray Cumming.
2012-09-06 11:43:51 -04:00
Heikki Linnakangas ab9a14e903 Fix WAL file replacement during cascading replication on Windows.
When the startup process restores a WAL file from the archive, it deletes
any old file with the same name and renames the new file in its place. On
Windows, however, when a file is deleted, it still lingers as long as a
process holds a file handle open on it. With cascading replication, a
walsender process can hold the old file open, so the rename() in the startup
process would fail. To fix that, rename the old file to a temporary name, to
make the original file name available for reuse, before deleting the old
file.
2012-09-05 18:52:12 -07:00
Tom Lane 2e0cc1f031 Fix inappropriate error messages for Hot Standby misconfiguration errors.
Give the correct name of the GUC parameter being complained of.
Also, emit a more suitable SQLSTATE (INVALID_PARAMETER_VALUE,
not the default INTERNAL_ERROR).

Gurjeet Singh, errcode adjustment by me
2012-09-05 21:49:08 -04:00
Tom Lane 46c508fbcf Fix PARAM_EXEC assignment mechanism to be safe in the presence of WITH.
The planner previously assumed that parameter Vars having the same absolute
query level, varno, and varattno could safely be assigned the same runtime
PARAM_EXEC slot, even though they might be different Vars appearing in
different subqueries.  This was (probably) safe before the introduction of
CTEs, but the lazy-evalution mechanism used for CTEs means that a CTE can
be executed during execution of some other subquery, causing the lifespan
of Params at the same syntactic nesting level as the CTE to overlap with
use of the same slots inside the CTE.  In 9.1 we created additional hazards
by using the same parameter-assignment technology for nestloop inner scan
parameters, but it was broken before that, as illustrated by the added
regression test.

To fix, restructure the planner's management of PlannerParamItems so that
items having different semantic lifespans are kept rigorously separated.
This will probably result in complex queries using more runtime PARAM_EXEC
slots than before, but the slots are cheap enough that this hardly matters.
Also, stop generating PlannerParamItems containing Params for subquery
outputs: all we really need to do is reserve the PARAM_EXEC slot number,
and that now only takes incrementing a counter.  The planning code is
simpler and probably faster than before, as well as being more correct.

Per report from Vik Reykja.

These changes will mostly also need to be made in the back branches, but
I'm going to hold off on that until after 9.2.0 wraps.
2012-09-05 12:55:01 -04:00
Alvaro Herrera e20a90e188 Trim spgist_private.h inclusion
It doesn't really need rel.h; relcache.h is enough.
2012-09-05 11:06:51 -03:00
Heikki Linnakangas 358ff99d70 Fix compiler warnings about unused variables, caused by my previous commit.
Reported by Peter Eisentraut.
2012-09-04 22:07:35 -07:00
Heikki Linnakangas c4c227477b Fix bugs in cascading replication with recovery_target_timeline='latest'
The cascading replication code assumed that the current RecoveryTargetTLI
never changes, but that's not true with recovery_target_timeline='latest'.
The obvious upshot of that is that RecoveryTargetTLI in shared memory needs
to be protected by a lock. A less obvious consequence is that when a
cascading standby is connected, and the standby switches to a new target
timeline after scanning the archive, it will continue to stream WAL to the
cascading standby, but from a wrong file, ie. the file of the previous
timeline. For example, if the standby is currently streaming from the middle
of file 000000010000000000000005, and the timeline changes, the standby
will continue to stream from that file. However, the WAL on the new
timeline is in file 000000020000000000000005, so the standby sends garbage
from 000000010000000000000005 to the cascading standby, instead of the
correct WAL from file 000000020000000000000005.

This also fixes a related bug where a partial WAL segment is restored from
the archive and streamed to a cascading standby. The code assumed that when
a WAL segment is copied from the archive, it can immediately be fully
streamed to a cascading standby. However, if the segment is only partially
filled, ie. has the right size, but only N first bytes contain valid WAL,
that's not safe. That can happen if a partial WAL segment is manually copied
to the archive, or if a partial WAL segment is archived because a server is
started up on a new timeline within that segment. The cascading standby will
get confused if the WAL it received is not valid, and will get stuck until
it's restarted. This patch fixes that problem by not allowing WAL restored
from the archive to be streamed to a cascading standby until it's been
replayed, and thus validated.
2012-09-04 19:33:21 -07:00
Kevin Grittner cdf91edba9 Fix serializable mode with index-only scans.
Serializable Snapshot Isolation used for serializable transactions
depends on acquiring SIRead locks on all heap relation tuples which
are used to generate the query result, so that a later delete or
update of any of the tuples can flag a read-write conflict between
transactions.  This is normally handled in heapam.c, with tuple level
locking.  Since an index-only scan avoids heap access in many cases,
building the result from the index tuple, the necessary predicate
locks were not being acquired for all tuples in an index-only scan.

To prevent problems with tuple IDs which are vacuumed and re-used
while the transaction still matters, the xmin of the tuple is part of
the tag for the tuple lock.  Since xmin is not available to the
index-only scan for result rows generated from the index tuples, it
is not possible to acquire a tuple-level predicate lock in such
cases, in spite of having the tid.  If we went to the heap to get the
xmin value, it would no longer be an index-only scan.  Rather than
prohibit index-only scans under serializable transaction isolation,
we acquire an SIRead lock on the page containing the tuple, when it
was not necessary to visit the heap for other reasons.

Backpatch to 9.2.

Kevin Grittner and Tom Lane
2012-09-04 21:13:11 -05:00
Magnus Hagander bd46b52199 Remove some useless trailing whitespace
Michael Paquier
2012-09-04 09:17:14 +02:00
Bruce Momjian 015722fb36 Fix to_date() and to_timestamp() to allow specification of the day of
the week via ISO or Gregorian designations.  The fix is to store the
day-of-week consistently as 1-7, Sunday = 1.

Fixes bug reported by Marc Munro
2012-09-03 22:52:44 -04:00
Tom Lane 2a2352e07d Replace memcpy() calls in xlog.c critical sections with struct assignments.
This gets rid of a dangerous-looking use of the not-volatile XLogCtl
pointer in a couple of spinlock-protected sections, where the normal
coding rule is that you should only access shared memory through a
pointer-to-volatile.  I think the risk is only hypothetical not actual,
since for there to be a bug the compiler would have to move the spinlock
acquire or release across the memcpy() call, which one sincerely hopes
it will not.  Still, it looks cleaner this way.

Per comment from Daniel Farina and subsequent discussion.
2012-09-03 15:39:15 -04:00
Tom Lane 6d2c8c0e2a Drop cheap-startup-cost paths during add_path() if we don't need them.
We can detect whether the planner top level is going to care at all about
cheap startup cost (it will only do so if query_planner's tuple_fraction
argument is greater than zero).  If it isn't, we might as well discard
paths immediately whose only advantage over others is cheap startup cost.
This turns out to get rid of quite a lot of paths in complex queries ---
I saw planner runtime reduction of more than a third on one large query.

Since add_path isn't currently passed the PlannerInfo "root", the easiest
way to tell it whether to do this was to add a bool flag to RelOptInfo.
That's a bit redundant, since all relations in a given query level will
have the same setting.  But in the future it's possible that we'd refine
the control decision to work on a per-relation basis, so this seems like
a good arrangement anyway.

Per my suggestion of a few months ago.
2012-09-01 18:16:24 -04:00
Tom Lane 4da6439bd8 Fix mark_placeholder_maybe_needed to handle LATERAL references.
If a PlaceHolderVar contains a pulled-up LATERAL reference, its minimum
possible evaluation level might be higher in the join tree than its
original syntactic location.  That in turn affects the ph_needed level for
any contained PlaceHolderVars (that is, those PHVs had better propagate up
the join tree at least to the evaluation level of the outer PHV).  We got
this mostly right, but mark_placeholder_maybe_needed() failed to account
for the effect, and in consequence could leave the inner PHVs with
ph_may_need less than what their ultimate ph_needed value will be.  That's
bad because it could lead to failure to select a join order that will allow
evaluation of the inner PHV at a valid location.  Fix that, and add an
Assert that checks that we don't ever set ph_needed to more than
ph_may_need.
2012-09-01 13:56:46 -04:00