Commit Graph

4783 Commits

Author SHA1 Message Date
Tom Lane 7de81124d5 Create a function quote_nullable(), which works the same as quote_literal()
except that it returns the string 'NULL', rather than a SQL null, when called
with a null argument.  This is often a much more useful behavior for
constructing dynamic queries.  Add more discussion to the documentation
about how to use these functions.

Brendan Jurd
2008-03-23 00:24:20 +00:00
Tatsuo Ishii 325c0a39e4 Add server side lo_import(filename, oid) function. 2008-03-22 01:55:14 +00:00
Tom Lane 58a8285542 Remove TypeName struct's timezone flag, which has been write-only storage
for a very long time --- in current usage it's entirely redundant with the
name field.
2008-03-21 22:41:48 +00:00
Tom Lane 4b7ae4afae Report the current queries of all backends involved in a deadlock
(if they'd be visible to the current user in pg_stat_activity).

This might look like it's subject to race conditions, but it's actually
pretty safe because at the time DeadLockReport() is constructing the
report, we haven't yet aborted our transaction and so we can expect that
everyone else involved in the deadlock is still blocked on some lock.
(There are corner cases where that might not be true, such as a statement
timeout triggering in another backend before we finish reporting; but at
worst we'd report a misleading activity string, so it seems acceptable
considering the usefulness of reporting the queries.)

Original patch by Itagaki Takahiro, heavily modified by me.
2008-03-21 21:08:31 +00:00
Tom Lane 2d0583a166 Get rid of a bunch of #ifdef HAVE_INT64_TIMESTAMP conditionals by inventing
a new typedef TimeOffset to represent an intermediate time value.  It's
either int64 or double as appropriate, and in most usages will be measured
in microseconds or seconds the same as Timestamp.  We don't call it
Timestamp, though, since the value doesn't necessarily represent an absolute
time instant.

Warren Turkal
2008-03-21 01:31:43 +00:00
Tom Lane 6b0706ac33 Arrange for an explicit cast applied to an ARRAY[] constructor to be applied
directly to all the member expressions, instead of the previous implementation
where the ARRAY[] constructor would infer a common element type and then we'd
coerce the finished array after the fact.  This has a number of benefits,
one being that we can allow an empty ARRAY[] construct so long as its
element type is specified by such a cast.

Brendan Jurd, minor fixes by me.
2008-03-20 21:42:48 +00:00
Tom Lane 5507b22dfc Support ALTER TYPE RENAME. Petr Jelinek 2008-03-19 18:38:30 +00:00
Tom Lane 0d49838df6 Arrange to "inline" SQL functions that appear in a query's FROM clause,
are declared to return set, and consist of just a single SELECT.  We
can replace the FROM-item with a sub-SELECT and then optimize much as
if we were dealing with a view.  Patch from Richard Rowell, cleaned up
by me.
2008-03-18 22:04:14 +00:00
Peter Eisentraut a7b7b07af3 Enable probes to work with Mac OS X Leopard and other OSes that will
support DTrace in the future.

Switch from using DTRACE_PROBEn macros to the dynamically generated macros.
Use "dtrace -h" to create a header file that contains the dynamically
generated macros to be used in the source code instead of the DTRACE_PROBEn
macros.  A dummy header file is generated for builds without DTrace support.

Author: Robert Lor <Robert.Lor@sun.com>
2008-03-17 19:44:41 +00:00
Magnus Hagander 7cbfa7565e Fix postgres --describe-config for guc enums, breakage noted by Alvaro.
While at it, rename option lookup functions to make names clearer, per
discussion with Tom.
2008-03-17 17:45:09 +00:00
Alvaro Herrera 23057f51f5 Move ProcState definition into sinvaladt.c from sinvaladt.h, since it's not
needed anywhere after my previous patch.  Noticed by Tom Lane.

Also, remove #include <signal.h> from sinval.c.
2008-03-17 11:50:27 +00:00
Tom Lane 32846f8152 Fix TransactionIdIsCurrentTransactionId() to use binary search instead of
linear search when checking child-transaction XIDs.  This makes for an
important speedup in transactions that have large numbers of children,
as in a recent example from Craig Ringer.  We can also get rid of an
ugly kluge that represented lists of TransactionIds as lists of OIDs.

Heikki Linnakangas
2008-03-17 02:18:55 +00:00
Tom Lane 787eba734b When creating a large hash index, pre-sort the index entries by estimated
bucket number, so as to ensure locality of access to the index during the
insertion step.  Without this, building an index significantly larger than
available RAM takes a very long time because of thrashing.  On the other
hand, sorting is just useless overhead when the index does fit in RAM.
We choose to sort when the initial index size exceeds effective_cache_size.

This is a revised version of work by Tom Raney and Shreya Bhargava.
2008-03-16 23:15:08 +00:00
Alvaro Herrera ec6550c6c0 Modify interactions between sinval.c and sinvaladt.c. The code that actually
deals with the queue, including locking etc, is all in sinvaladt.c.  This means
that the struct definition of the queue, and the queue pointer, are now
internal "implementation details" inside sinvaladt.c.

Per my proposal dated 25-Jun-2007 and followup discussion.
2008-03-16 19:47:34 +00:00
Magnus Hagander a3f66eac01 Some cleanups of enum-guc code, per comments from Tom. 2008-03-16 16:42:44 +00:00
Tom Lane c9a1cc694a Change hash index creation so that rather than always establishing exactly
two buckets at the start, we create a number of buckets appropriate for the
estimated size of the table.  This avoids a lot of expensive bucket-split
actions during initial index build on an already-populated table.

This is one of the two core ideas of Tom Raney and Shreya Bhargava's patch
to reduce hash index build time.  I'm committing it separately to make it
easier for people to test the effects of this separately from the effects
of their other core idea (pre-sorting the index entries by bucket number).
2008-03-15 20:46:31 +00:00
Alvaro Herrera adc4e1e635 Fix vacuum so that autovacuum is really not cancelled when doing an emergency
job (i.e. to prevent Xid wraparound problems.)  Bug reported by ITAGAKI
Takahiro in 20080314103837.63D3.52131E4D@oss.ntt.co.jp, though I didn't use his
patch.
2008-03-14 17:25:59 +00:00
Tom Lane 3e701a04fe Fix heap_page_prune's problem with failing to send cache invalidation
messages if the calling transaction aborts later on.  Collapsing out line
pointer redirects is a done deal as soon as we complete the page update,
so syscache *must* be notified even if the VACUUM FULL as a whole doesn't
complete.  To fix, add some functionality to inval.c to allow the pending
inval messages to be sent immediately while heap_page_prune is still
running.  The implementation is a bit chintzy: it will only work in the
context of VACUUM FULL.  But that's all we need now, and it can always be
extended later if needed.  Per my trouble report of a week ago.
2008-03-13 18:00:32 +00:00
Tom Lane 611b4393f2 Make TransactionIdIsInProgress check transam.c's single-item XID status cache
before it goes groveling through the ProcArray.  In situations where the same
recently-committed transaction ID is checked repeatedly by tqual.c, this saves
a lot of shared-memory searches.  And it's cheap enough that it shouldn't
hurt noticeably when it doesn't help.
Concept and patch by Simon, some minor tweaking and comment-cleanup by Tom.
2008-03-11 20:20:35 +00:00
Tom Lane f0828b2fc3 Provide a build-time option to store large relations as single files, rather
than dividing them into 1GB segments as has been our longtime practice.  This
requires working support for large files in the operating system; at least for
the time being, it won't be the default.

Zdenek Kotala
2008-03-10 20:06:27 +00:00
Magnus Hagander 0a66303781 Bump catversion from guc enum patch, which I forgot. Sorry! 2008-03-10 13:53:35 +00:00
Magnus Hagander 52a8d4f8f7 Implement enum type for guc parameters, and convert a couple of existing
variables to it. More need to be converted, but I wanted to get this in
before it conflicts with too much...

Other than just centralising the text-to-int conversion for parameters,
this allows the pg_settings view to contain a list of available options
and allows an error hint to show what values are allowed.
2008-03-10 12:55:13 +00:00
Tom Lane 3fcc7e8e18 Reduce memory consumption during VACUUM of large relations, by using
FSMPageData (6 bytes) instead of PageFreeSpaceInfo (8 or 16 bytes)
for the temporary array of page-free-space information.

Itagaki Takahiro
2008-03-10 02:04:10 +00:00
Tom Lane f4230d2937 Change patternsel() so that instead of switching from a pure
pattern-examination heuristic method to purely histogram-driven selectivity at
histogram size 100, we compute both estimates and use a weighted average.
The weight put on the heuristic estimate decreases linearly with histogram
size, dropping to zero for 100 or more histogram entries.
Likewise in ltreeparentsel().  After a patch by Greg Stark, though I
reorganized the logic a bit to give the caller of histogram_selectivity()
more control.
2008-03-09 00:32:09 +00:00
Tom Lane 6f10eb2111 Refactor heap_page_prune so that instead of changing item states on-the-fly,
it accumulates the set of changes to be made and then applies them.  It had
to accumulate the set of changes anyway to prepare a WAL record for the
pruning action, so this isn't an enormous change; the only new complexity is
to not doubly mark tuples that are visited twice in the scan.  The main
advantage is that we can substantially reduce the scope of the critical
section in which the changes are applied, thus avoiding PANIC in foreseeable
cases like running out of memory in inval.c.  A nice secondary advantage is
that it is now far clearer that WAL replay will actually do the same thing
that the original pruning did.

This commit doesn't do anything about the open problem that
CacheInvalidateHeapTuple doesn't have the right semantics for a CTID change
caused by collapsing out a redirect pointer.  But whatever we do about that,
it'll be a good idea to not do it inside a critical section.
2008-03-08 21:57:59 +00:00
Tom Lane ad434473eb This patch addresses some issues in TOAST compression strategy that
were discussed last year, but we felt it was too late in the 8.3 cycle to
change the code immediately.  Specifically, the patch:

* Reduces the minimum datum size to be considered for compression from
256 to 32 bytes, as suggested by Greg Stark.

* Increases the required compression rate for compressed storage from
20% to 25%, again per Greg's suggestion.

* Replaces force_input_size (size above which compression is forced)
with a maximum size to be considered for compression.  It was agreed
that allowing large inputs to escape the minimum-compression-rate
requirement was not bright, and that indeed we'd rather have a knob
that acted in the other direction.  I set this value to 1MB for the
moment, but it could use some performance studies to tune it.

* Adds an early-failure path to the compressor as suggested by Jan:
if it's been unable to find even one compressible substring in the
first 1KB (parameterizable), assume we're looking at incompressible
input and give up.  (Possibly this logic can be improved, but I'll
commit it as-is for now.)

* Improves the toasting heuristics so that when we have very large
fields with attstorage 'x' or 'e', we will push those out to toast
storage before considering inline compression of shorter fields.
This also responds to a suggestion of Greg's, though my original
proposal for a solution was a bit off base because it didn't fix
the problem for large 'e' fields.

There was some discussion in the earlier threads of exposing some
of the compression knobs to users, perhaps even on a per-column
basis.  I have not done anything about that here.  It seems to me
that if we are changing around the parameters, we'd better get some
experience and be sure we are happy with the design before we set
things in stone by providing user-visible knobs.
2008-03-07 23:20:21 +00:00
Tom Lane 7d6e6e2e97 Fix PREPARE TRANSACTION to reject the case where the transaction has dropped a
temporary table; we can't support that because there's no way to clean up the
source backend's internal state if the eventual COMMIT PREPARED is done by
another backend.  This was checked correctly in 8.1 but I broke it in 8.2 :-(.
Patch by Heikki Linnakangas, original trouble report by John Smith.
2008-03-04 19:54:06 +00:00
Alvaro Herrera 7157114d54 Remove long-unused and broken TCL_ARRAYS. 2008-02-29 20:58:33 +00:00
Magnus Hagander 2d2b022267 Fix handling of restricted processes for Windows Vista (mainly),
by explicitly adding back the user to the DACL of the new process.
This fixes the failure case when executing as the Administrator
user, which had no permissions left at all after we dropped the
Administrators group.

Dave Page with some modifications from me
2008-02-29 15:31:33 +00:00
Tom Lane 9713c06319 Change the declaration of struct varlena so that the length word is
represented as "char ...[4]" not "int32".  Since the length word is never
supposed to be accessed via this struct member anyway, this won't break
any existing code that is following the rules.  The advantage is that C
compilers will no longer assume that a pointer to struct varlena is
word-aligned, which prevents incorrect optimizations in TOAST-pointer
access and perhaps other places.  gcc doesn't seem to do this (at least
not at -O2), but the problem is demonstrable on some other compilers.

I changed struct inet as well, but didn't bother to touch a lot of other
struct definitions in which it wouldn't make any difference because there
were other fields forcing int alignment anyway.  Hopefully none of those
struct definitions are used for accessing unaligned Datums.
2008-02-23 19:11:45 +00:00
Tom Lane 870993e871 Rename miscadmin.h's PG_VERSIONSTR macro to PG_BACKEND_VERSIONSTR to
make it a bit clearer what it is, and get rid of duplicate definitions
in initdb and pg_ctl.
2008-02-20 22:46:24 +00:00
Peter Eisentraut 84ce707ba8 Added --htmldir option to pg_config, equivalent to the new configure option. 2008-02-18 14:51:48 +00:00
Peter Eisentraut b120382353 Upgrade to Autoconf 2.61:
- Change configure.in to use Autoconf 2.61 and update generated files.
- Update build system and documentation to support now directory variables
  offered by Autoconf 2.61.
- Replace usages of PGAC_CHECK_ALIGNOF by AC_CHECK_ALIGNOF, now available
  in Autoconf 2.61.
- Drop our patched version of AC_C_INLINE, as Autoconf now has the change.
2008-02-17 16:36:43 +00:00
Tom Lane cd00406774 Replace time_t with pg_time_t (same values, but always int64) in on-disk
data structures and backend internal APIs.  This solves problems we've seen
recently with inconsistent layout of pg_control between machines that have
32-bit time_t and those that have already migrated to 64-bit time_t.  Also,
we can get out from under the problem that Windows' Unix-API emulation is not
consistent about the width of time_t.

There are a few remaining places where local time_t variables are used to hold
the current or recent result of time(NULL).  I didn't bother changing these
since they do not affect any cross-module APIs and surely all platforms will
have 64-bit time_t before overflow becomes an actual risk.  time_t should
be avoided for anything visible to extension modules, however.
2008-02-17 02:09:32 +00:00
Tom Lane df1e965e12 Sync our regex code with upstream changes since last time we did this, which
was Tcl 8.4.8.  The main changes are to remove the never-fully-implemented
code for multi-character collating elements, and to const-ify some stuff a
bit more fully.  In combination with the recent security patch, this commit
brings us into line with Tcl 8.5.0.

Note that I didn't make any effort to duplicate a lot of cosmetic changes
that they made to bring their copy into line with their own style
guidelines, such as adding braces around single-line IF bodies.  Most of
those we either had done already (such as ANSI-fication of function headers)
or there is no point because pgindent would undo the change anyway.
2008-02-14 17:33:37 +00:00
Tom Lane cf9e156156 Stamp HEAD as 8.4devel. 2008-02-13 03:40:38 +00:00
Tom Lane b7fe5f70d3 Fix CREATE TABLE ... LIKE ... INCLUDING INDEXES to not cause unwanted
tablespace permissions failures when copying an index that is in the
database's default tablespace.  A side-effect of the change is that explicitly
specifying the default tablespace no longer triggers a permissions check;
this is not how it was done in pre-8.3 releases but is argued to be more
consistent.  Per bug #3921 from Andrew Gilligan.  (Note: I argued in the
subsequent discussion that maybe LIKE shouldn't copy index tablespaces
at all, but since no one indicated agreement with that idea, I've refrained
from doing it.)
2008-02-07 17:09:51 +00:00
Bruce Momjian aad140b7ff Stamp 8.3 in CVS. _No_ update of configure/configure.in. 2008-02-01 02:59:02 +00:00
Tom Lane 0688d84041 Add checks to TRUNCATE, CLUSTER, and REINDEX to prevent performing these
operations when the current transaction has any open references to the
target relation or index (implying it has an active query using the relation).
The need for this was previously recognized in connection with ALTER TABLE,
but anything that summarily eliminates tuples or moves them around would
confuse an active scan.

While this patch does not in itself fix bug #3883 (the deadlock would happen
before the new check fires), it will discourage people from attempting the
sequence of operations that creates a deadlock risk, so it's at least a
partial response to that problem.

In passing, add a previously-missing check to REINDEX to prevent trying to
reindex another backend's temp table.  This isn't a security problem since
only a superuser would get past the schema permission checks, but if we are
testing for this in other utility commands then surely REINDEX should too.
2008-01-30 19:46:48 +00:00
Tom Lane 6322e84430 Change StatementCancelHandler() to check the DoingCommandRead flag to decide
whether to execute an immediate interrupt, rather than testing whether
LockWaitCancel() cancelled a lock wait.  The old way misclassified the case
where we were blocked in ProcWaitForSignal(), and arguably would misclassify
any other future additions of new ImmediateInterruptOK states too.  This
allows reverting the old kluge that gave LockWaitCancel() a return value,
since no callers care anymore.  Improve comments in the various
implementations of PGSemaphoreLock() to explain that on some platforms, the
assumption that semop() exits after a signal is wrong, and so we must ensure
that the signal handler itself throws elog if we want cancel or die interrupts
to be effective.  Per testing related to bug #3883, though this patch doesn't
solve those problems fully.

Perhaps this change should be back-patched, but since pre-8.3 branches aren't
really relying on autovacuum to respond to SIGINT, it doesn't seem critical
for them.
2008-01-26 19:55:08 +00:00
Peter Eisentraut 79a323ab49 Change /contrib to contrib for consistency. 2008-01-24 06:23:33 +00:00
Tom Lane b9ff7443e6 Prevent integer overflow within the integer-datetimes version of
TimestampTzPlusMilliseconds.  An integer argument of more than INT_MAX/1000
milliseconds (ie, about 35 minutes) would provoke a wrong result, resulting
in incorrect enforcement of statement_timestamp values larger than that.
Bug was introduced in my rewrite of 2006-06-20, which fixed some other
overflow risks, but missed this one :-(  Per report from Elein.
2008-01-23 21:26:13 +00:00
Tom Lane 716e8b8374 Fix RS_isRegis() to agree exactly with RS_compile()'s idea of what's a valid
regis.  Correct the latter's oversight that a bracket-expression needs to be
terminated.  Reduce the ereports to elogs, since they are now not expected to
ever be hit (thus addressing Alvaro's original complaint).
In passing, const-ify the string argument to RS_compile.
2008-01-21 02:46:11 +00:00
Bruce Momjian 7b4be2ba2f Stamp release for 8.3RC2; configure will be stamped by packager. 2008-01-18 00:13:50 +00:00
Tom Lane 0df7717faa Fix ALTER INDEX RENAME so that if the index belongs to a unique or primary key
constraint, the constraint is renamed as well.  This avoids inconsistent
situations that could confuse pg_dump (not to mention humans).  We might at
some point provide ALTER TABLE RENAME CONSTRAINT as a more general solution,
but there seems no reason not to allow doing it this way too.  Per bug #3854
and related discussions.
2008-01-17 18:56:54 +00:00
Tom Lane ac12412ede Revise memory management for libxml calls. Instead of keeping libxml's data
in whichever context happens to be current during a call of an xml.c function,
use a dedicated context that will not go away until we explicitly delete it
(which we do at transaction end or subtransaction abort).  This makes recovery
after an error much simpler --- we don't have to individually delete the data
structures created by libxml.  Also, we need to initialize and cleanup libxml
only once per transaction (if there's no error) instead of once per function
call, so it should be a bit faster.  We'll need to keep an eye out for
intra-transaction memory leaks, though.  Alvaro and Tom.
2008-01-15 18:57:00 +00:00
Tom Lane d3b1b1f9d8 Fix CREATE INDEX CONCURRENTLY so that it won't use synchronized scan for
its second pass over the table.  It has to start at block zero, else the
"merge join" logic for detecting which TIDs are already in the index
doesn't work.  Hence, extend heapam.c's API so that callers can enable or
disable syncscan.  (I put in an option to disable buffer access strategy,
too, just in case somebody needs it.)  Per report from Hannes Dorbath.
2008-01-14 01:39:09 +00:00
Tom Lane 89c0a87fda The original implementation of polymorphic aggregates didn't really get the
checking of argument compatibility right; although the problem is only exposed
with multiple-input aggregates in which some arguments are polymorphic and
some are not.  Per bug #3852 from Sokolov Yura.
2008-01-11 18:39:41 +00:00
Tom Lane 59fc64acee Fix a conceptual error in my patch of 2007-10-26 that avoided considering
clauseless joins of relations that have unexploited join clauses.  Rather
than looking at every other base relation in the query, the correct thing is
to examine the other relations in the "initial_rels" list of the current
make_rel_from_joinlist() invocation, because those are what we actually have
the ability to join against.  This might be a subset of the whole query in
cases where join_collapse_limit or from_collapse_limit or full joins have
prevented merging the whole query into a single join problem.  This is a bit
untidy because we have to pass those rels down through a new PlannerInfo
field, but it's necessary.  Per bug #3865 from Oleg Kharin.
2008-01-11 04:02:18 +00:00
Tom Lane ceb9360067 Fix CREATE INDEX CONCURRENTLY to not deadlock against an automatic or manual
VACUUM that is blocked waiting to get lock on the table being indexed.
Per report and fix suggestion from Greg Stark.
2008-01-09 21:52:36 +00:00
Tom Lane 6a6522529f Fix some planner issues found while investigating Kevin Grittner's report
of poorer planning in 8.3 than 8.2:

1. After pushing a constant across an outer join --- ie, given
"a LEFT JOIN b ON (a.x = b.y) WHERE a.x = 42", we can deduce that b.y is
sort of equal to 42, in the sense that we needn't fetch any b rows where
it isn't 42 --- loop to see if any additional deductions can be made.
Previous releases did that by recursing, but I had mistakenly thought that
this was no longer necessary given the EquivalenceClass machinery.

2. Allow pushing constants across outer join conditions even if the
condition is outerjoin_delayed due to a lower outer join.  This is safe
as long as the condition is strict and we re-test it at the upper join.

3. Keep the outer-join clause even if we successfully push a constant
across it.  This is *necessary* in the outerjoin_delayed case, but
even in the simple case, it seems better to do this to ensure that the
join search order heuristics will consider the join as reasonable to
make.  Mark such a clause as having selectivity 1.0, though, since it's
not going to eliminate very many rows after application of the constant
condition.

4. Tweak have_relevant_eclass_joinclause to report that two relations
are joinable when they have vars that are equated to the same constant.
We won't actually generate any joinclause from such an EquivalenceClass,
but again it seems that in such a case it's a good idea to consider
the join as worth costing out.

5. Fix a bug in select_mergejoin_clauses that was exposed by these
changes: we have to reject candidate mergejoin clauses if either side was
equated to a constant, because we can't construct a canonical pathkey list
for such a clause.  This is an implementation restriction that might be
worth fixing someday, but it doesn't seem critical to get it done for 8.3.
2008-01-09 20:42:29 +00:00
Magnus Hagander 8d546c7170 Don't enforce 32-bit time_t for FRONTEND apps. Fixes standalone
builds of libpq in both 32 and 64-bit. Per gripe from Hiroshi Saito.
2008-01-09 09:16:43 +00:00
Tom Lane da3df47c84 lmgr.c:DescribeLockTag was never taught about virtual xids, per Greg Stark.
Also a couple of minor tweaks to try to future-proof the code a bit better
against future locktag additions.
2008-01-08 23:18:51 +00:00
Tom Lane 2bf121e40b Stamp release 8.3RC1.
Security: CVE-2007-4769, CVE-2007-4772, CVE-2007-6067, CVE-2007-6600, CVE-2007-6601
2008-01-03 21:40:12 +00:00
Tom Lane eedb068c0a Make standard maintenance operations (including VACUUM, ANALYZE, REINDEX,
and CLUSTER) execute as the table owner rather than the calling user, using
the same privilege-switching mechanism already used for SECURITY DEFINER
functions.  The purpose of this change is to ensure that user-defined
functions used in index definitions cannot acquire the privileges of a
superuser account that is performing routine maintenance.  While a function
used in an index is supposed to be IMMUTABLE and thus not able to do anything
very interesting, there are several easy ways around that restriction; and
even if we could plug them all, there would remain a risk of reading sensitive
information and broadcasting it through a covert channel such as CPU usage.

To prevent bypassing this security measure, execution of SET SESSION
AUTHORIZATION and SET ROLE is now forbidden within a SECURITY DEFINER context.

Thanks to Itagaki Takahiro for reporting this vulnerability.

Security: CVE-2007-6600
2008-01-03 21:23:15 +00:00
Tom Lane 98f27aaef3 Fix assorted security-grade bugs in the regex engine. All of these problems
are shared with Tcl, since it's their code to begin with, and the patches
have been copied from Tcl 8.5.0.  Problems:

CVE-2007-4769: Inadequate check on the range of backref numbers allows
crash due to out-of-bounds read.
CVE-2007-4772: Infinite loop in regex optimizer for pattern '($|^)*'.
CVE-2007-6067: Very slow optimizer cleanup for regex with a large NFA
representation, as well as crash if we encounter an out-of-memory condition
during NFA construction.

Part of the response to CVE-2007-6067 is to put a limit on the number of
states in the NFA representation of a regex.  This seems needed even though
the within-the-code problems have been corrected, since otherwise the code
could try to use very large amounts of memory for a suitably-crafted regex,
leading to potential DOS by driving the system into swap, activating a kernel
OOM killer, etc.

Although there are certainly plenty of ways to drive the system into effective
DOS with poorly-written SQL queries, these problems seem worth treating as
security issues because many applications might accept regex search patterns
from untrustworthy sources.

Thanks to Will Drewry of Google for reporting these problems.  Patches by Will
Drewry and Tom Lane.

Security: CVE-2007-4769, CVE-2007-4772, CVE-2007-6067
2008-01-03 20:47:55 +00:00
Tom Lane 20e862155f Forbid ALTER TABLE and CLUSTER when there are pending AFTER-trigger events
in the current backend for the target table.  These operations move tuples
around and would thus invalidate the TIDs stored in the trigger event records.
(We need not worry about events in other backends, since acquiring exclusive
lock should be enough to ensure there aren't any.)  It might be sufficient
to forbid only the table-rewriting variants of ALTER TABLE, but in the absence
of any compelling use-case, let's just be safe and simple.  Per follow-on
investigation of bug #3847, though this is not actually the same problem
reported therein.

Possibly this should be back-patched, but since the case has never been
reported from the field, I didn't bother.
2008-01-02 23:34:42 +00:00
Bruce Momjian 14b5eaa236 Correct two more copyrights found by updated script. 2008-01-02 02:42:06 +00:00
Tom Lane ce9baa06f0 Fix some missed copyright updates. 2008-01-01 20:31:21 +00:00
Bruce Momjian 9098ab9e32 Update copyrights in source tree to 2008. 2008-01-01 19:46:01 +00:00
Tom Lane 5233dc15cf Improve consistency of error reporting in GUC assign_hook routines. Some
were reporting ERROR for interactive assignments and LOG for other cases,
some were saying nothing for non-interactive cases, and a few did yet other
things.  Make them use a new function GUC_complaint_elevel() to establish
a reasonably uniform policy about how to report.  There are still a few
edge cases such as assign_search_path(), but it's much better than before.
Per gripe from Devrim Gunduz and subsequent discussion.

As noted by Alvaro, it'd be better to fold these custom messages into the
standard "invalid parameter value" complaint from guc.c, perhaps as the DETAIL
field.  However that will require more redesign than seems prudent for 8.3.
This is a relatively safe, low-impact change that we can afford to risk now.
2007-12-28 00:23:23 +00:00
Magnus Hagander 22867ab986 Use _USE_32BIT_TIME_T when building with MSVC. Also, enforce that it's
used when building addons.

Dave Page
2007-12-11 14:34:43 +00:00
Tom Lane 9fd8843647 Fix mergejoin cost estimation so that we consider the statistical ranges of
the two join variables at both ends: not only trailing rows that need not be
scanned because there cannot be a match on the other side, but initial rows
that will be scanned without possibly having a match.  This allows a more
realistic estimate of startup cost to be made, per recent pgsql-performance
discussion.  In passing, fix a couple of bugs that had crept into
mergejoinscansel: it was not quite up to speed for the task of estimating
descending-order scans, which is a new requirement in 8.3.
2007-12-08 21:05:11 +00:00
Tom Lane 01434d41d4 Stamp 8.3beta4. 2007-12-03 00:11:01 +00:00
Tom Lane 265f904d8f Code review for LIKE ... INCLUDING INDEXES patch. Fix failure to propagate
constraint status of copied indexes (bug #3774), as well as various other
small bugs such as failure to pstrdup when needed.  Allow INCLUDING INDEXES
indexes to be merged with identical declared indexes (perhaps not real useful,
but the code is there and having it not apply to LIKE indexes seems pretty
unorthogonal).  Avoid useless work in generateClonedIndexStmt().  Undo some
poorly chosen API changes, and put a couple of routines in modules that seem
to be better places for them.
2007-12-01 23:44:44 +00:00
Tom Lane 895a94de6d Avoid incrementing the CommandCounter when CommandCounterIncrement is called
but no database changes have been made since the last CommandCounterIncrement.
This should result in a significant improvement in the number of "commands"
that can typically be performed within a transaction before hitting the 2^32
CommandId size limit.  In particular this buys back (and more) the possible
adverse consequences of my previous patch to fix plan caching behavior.

The implementation requires tracking whether the current CommandCounter
value has been "used" to mark any tuples.  CommandCounter values stored into
snapshots are presumed not to be used for this purpose.  This requires some
small executor changes, since the executor used to conflate the curcid of
the snapshot it was using with the command ID to mark output tuples with.
Separating these concepts allows some small simplifications in executor APIs.

Something for the TODO list: look into having CommandCounterIncrement not do
AcceptInvalidationMessages.  It seems fairly bogus to be doing it there,
but exactly where to do it instead isn't clear, and I'm disinclined to mess
with asynchronous behavior during late beta.
2007-11-30 21:22:54 +00:00
Tom Lane 11fccbeaeb Adjust the names of a couple of tsearch index support functions that had
inappropriately generic-sounding names.  This is more or less free since
we already forced initdb for the next beta, and it may prevent confusion or
name conflicts (particularly at the C-global-symbol level) down the road.
Per my proposal yesterday.
2007-11-28 19:33:05 +00:00
Tom Lane d54ca56743 Install a lookaside cache to speed up repeated lookups of the same operator
by short-circuiting schema search path and ambiguous-operator resolution
computations.  Remarkably, this buys as much as 45% speedup of repetitive
simple queries that involve operators that are not an exact match to the
input datatypes.  It should be marginally faster even for exact-match
cases, though I've not had success in proving an improvement in benchmark
tests.  Per report from Guillame Smet and subsequent discussion.
2007-11-28 18:47:56 +00:00
Tom Lane 3f89964234 Add quote_literal(anyelement) to preserve (and, in fact, extend) a
useful consequence of the former liberal implicit casting to text;
namely that you can feed non-string values to quote_literal() and get
unsurprising results.  Per discussion.
2007-11-27 18:29:11 +00:00
Peter Eisentraut 7888b52076 Make casts from xml to text independent of the XML option setting, thus
immutable and indexable.  Also fix the volatility settings of some other
XML-related functions.
2007-11-27 12:21:05 +00:00
Bruce Momjian 0d072761d3 Borland BCC does not support SSPI, per cnliou9@fastmail.fm. 2007-11-24 01:55:26 +00:00
Bruce Momjian 7a1b72f6ef Borland CC 5.5.1 needs ssize_t, per cnliou9@fastmail.fm. 2007-11-24 01:32:48 +00:00
Tom Lane ef48ed4c86 Actually ... it's pretty silly that parse_oper.c doesn't set up the
opfuncid of an OpExpr initially, considering that it has the information
at hand already.  We'll still treat opfuncid as a cache rather than a
guaranteed-valid value, but this change saves one more syscache lookup
in the normal code path.
2007-11-22 19:40:25 +00:00
Bruce Momjian 64e0d32133 WSATYPE_NOT_FOUND was already defined for BCC so don't redefine it
(conflicting values).
2007-11-21 23:13:36 +00:00
Tom Lane 16dcd5e5ce GIN index build's allocatedMemory counter needs to be long, not uint32.
Else, in a 64-bit machine with maintenance_work_mem set to above 4Gb,
the counter overflows and we never recognize having reached the
maintenance_work_mem limit.  I believe this explains out-of-memory
failure recently reported by Sean Davis.

This is a bug, so backpatch to 8.2.
2007-11-16 21:50:06 +00:00
Tom Lane 93190c3098 Repair still another bug in the btree page split WAL reduction patch:
it failed for splits of non-leaf pages because in such pages the first
data key on a page is suppressed, and so we can't just copy the first
key from the right page to reconstitute the left page's high key.
Problem found by Koichi Suzuki, patch by Heikki.
2007-11-16 19:53:50 +00:00
Marc G. Fournier 2a174e45dd update files for beta3 2007-11-16 04:29:45 +00:00
Bruce Momjian f6e8730d11 Re-run pgindent with updated list of typedefs. (Updated README should
avoid this problem in the future.)
2007-11-15 22:25:18 +00:00
Bruce Momjian fdf5a5efb7 pgindent run for 8.3. 2007-11-15 21:14:46 +00:00
Tom Lane 6cc4451b5c Prevent re-use of a deleted relation's relfilenode until after the next
checkpoint.  This guards against an unlikely data-loss scenario in which
we re-use the relfilenode, then crash, then replay the deletion and
recreation of the file.  Even then we'd be OK if all insertions into the
new relation had been WAL-logged ... but that's not guaranteed given all
the no-WAL-logging optimizations that have recently been added.

Patch by Heikki Linnakangas, per a discussion last month.
2007-11-15 20:36:40 +00:00
Tom Lane 4394c1b09c Resurrect the code for the rewrite(ARRAY[...]) aggregate function,
and put it into contrib/tsearch2 compatibility module.
2007-11-13 22:14:50 +00:00
Tom Lane 0bd4da23a4 Ensure that typmod decoration on a datatype name is validated in all cases,
even in code paths where we don't pay any subsequent attention to the typmod
value.  This seems needed in view of the fact that 8.3's generalized typmod
support will accept a lot of bogus syntax, such as "timestamp(foo)" or
"record(int, 42)" --- if we allow such things to pass without comment,
users will get confused.  Per a recent example from Greg Stark.

To implement this in a way that's not very vulnerable to future
bugs-of-omission, refactor the API of parse_type.c's TypeName lookup routines
so that typmod validation is folded into the base lookup operation.  Callers
can still choose not to receive the encoded typmod, but we'll check the
decoration anyway if it's present.
2007-11-11 19:22:49 +00:00
Tom Lane 654dcfb9e4 Clean up ts_locale.h/.c. Fix broken and not-consistent-across-platforms
behavior of wchar2char/char2wchar; this should resolve bug #3730.  Avoid
excess computations of pg_mblen in t_isalpha and friends.  Const-ify
APIs where possible.
2007-11-09 22:37:35 +00:00
Magnus Hagander 4b606ee444 Add parameter krb_realm used by GSSAPI, SSPI and Kerberos
to validate the realm of the connecting user. By default
it's empty meaning no verification, which is the way
Kerberos authentication has traditionally worked in
PostgreSQL.
2007-11-09 17:31:07 +00:00
Tom Lane c291203ca3 Fix EquivalenceClass code to handle volatile sort expressions in a more
predictable manner; in particular that if you say ORDER BY output-column-ref,
it will in fact sort by that specific column even if there are multiple
syntactic matches.  An example is
	SELECT random() AS a, random() AS b FROM ... ORDER BY b, a;
While the use-case for this might be a bit debatable, it worked as expected
in earlier releases, so we should preserve the behavior for 8.3.  Per my
recent proposal.

While at it, fix convert_subquery_pathkeys() to handle RelabelType stripping
in both directions; it needs this for the same reasons make_sort_from_pathkeys
does.
2007-11-08 21:49:48 +00:00
Tom Lane 1be0601681 Last week's patch for make_sort_from_pathkeys wasn't good enough: it has
to be able to discard top-level RelabelType nodes on *both* sides of the
equivalence-class-to-target-list comparison, since make_pathkey_from_sortinfo
might either add or remove a RelabelType.  Also fix the latter to do the
removal case cleanly.  Per example from Peter.
2007-11-08 19:25:37 +00:00
Tom Lane 2de946be6a Improve the performance of LIKE/regex estimation in non-C locales, by making
make_greater_string() try harder to generate a string that's actually greater
than its input string.  Before we just assumed that making a string that was
memcmp-greater was enough, but it is easy to generate examples where this is
not so when the locale is not C.  Instead, loop until the relevant comparison
function agrees that the generated string is greater than the input.

Unfortunately this is probably not enough to guarantee that the generated
string is greater than all extensions of the input, so we cannot relax the
restriction to C locale for the LIKE/regex index optimization.  But it should
at least improve the odds of getting a useful selectivity estimate in
prefix_selectivity().  Per example from Guillaume Smet.

Backpatch to 8.1, mainly because that's what the complainant is using...
2007-11-07 22:37:24 +00:00
Peter Eisentraut 5f9869d0ee Use "alternative" instead of "alternate" where it is clearer. 2007-11-07 12:24:24 +00:00
Bruce Momjian c1a03bee08 Document that configure option only affects contrib:
--with-ossp-uuid        use OSSP UUID library when building /contrib/uuid-ossp
2007-11-05 17:43:20 +00:00
Tom Lane a06ce21c72 Add a note about another issue that needs to be considered before
changing the TOAST size thresholds.
2007-11-05 14:11:17 +00:00
Tom Lane b17b7fae8c Remove the hack in the grammar that "optimized away" DEFAULT NULL clauses.
Instead put in a test to drop a NULL default at the last moment before
storing the catalog entry.  This changes the behavior in a couple of ways:
* Specifying DEFAULT NULL when creating an inheritance child table will
  successfully suppress inheritance of any default expression from the
  parent's column, where formerly it failed to do so.
* Specifying DEFAULT NULL for a column of a domain type will correctly
  override any default belonging to the domain; likewise for a sub-domain.
The latter change happens because by the time the clause is checked,
it won't be a simple null Const but a CoerceToDomain expression.

Personally I think this should be back-patched, but there doesn't seem to
be consensus for that on pgsql-hackers, so refraining.
2007-10-29 19:40:40 +00:00
Magnus Hagander 6f14effbf9 New versions of mingw have gettimeofday(), so add an autoconf test
for this.
2007-10-29 11:25:42 +00:00
Tom Lane 5b5a70aedf Stamp 8.3beta2. 2007-10-27 00:22:42 +00:00
Magnus Hagander bb98b2e27e Change win32 child-death tracking code to use a threadpool to wait for
childprocess deaths instead of using one thread per child. This drastastically
reduces the address space usage and should allow for more backends running.

Also change the win32_waitpid functionality to use an IO Completion Port for
queueing child death notices instead of using a fixed-size array.
2007-10-26 21:50:10 +00:00
Alvaro Herrera acac68b2bc Allow an autovacuum worker to be interrupted automatically when it is found
to be locking another process (except when it's working to prevent Xid
wraparound problems).
2007-10-26 20:45:10 +00:00
Tom Lane 048efc25e4 Disallow scrolling of FOR UPDATE/FOR SHARE cursors, so as to avoid problems
in corner cases such as re-fetching a just-deleted row.  We may be able to
relax this someday, but let's find out how many people really care before
we invest a lot of work in it.  Per report from Heikki and subsequent
discussion.

While in the neighborhood, make the combination of INSENSITIVE and FOR UPDATE
throw an error, since they are semantically incompatible.  (Up to now we've
accepted but just ignored the INSENSITIVE option of DECLARE CURSOR.)
2007-10-24 23:27:08 +00:00
Alvaro Herrera 745c1b2c2a Rearrange vacuum-related bits in PGPROC as a bitmask, to better support
having several of them.  Add two more flags: whether the process is
executing an ANALYZE, and whether a vacuum is for Xid wraparound (which
is obviously only set by autovacuum).

Sneakily move the worker's recently-acquired PostAuthDelay to a more useful
place.
2007-10-24 20:55:36 +00:00
Tom Lane c29a9c37bf Fix UPDATE/DELETE WHERE CURRENT OF to support repeated update and update-
then-delete on the current cursor row.  The basic fix is that nodeTidscan.c
has to apply heap_get_latest_tid() to the current-scan-TID obtained from the
cursor query; this ensures we get the latest row version to work with.
However, since that only works if the query plan is a TID scan, we also have
to hack the planner to make sure only that type of plan will be selected.
(Formerly, the planner might decide to apply a seqscan if the table is very
small.  This change is probably a Good Thing anyway, since it's hard to see
how a seqscan could really win.)  That means the execQual.c code to support
CurrentOfExpr as a regular expression type is dead code, so replace it with
just an elog().  Also, add regression tests covering these cases.  Note
that the added tests expose the fact that re-fetching an updated row
misbehaves if the cursor used FOR UPDATE.  That's an independent bug that
should be fixed later.  Per report from Dharmendra Goyal.
2007-10-24 18:37:09 +00:00
Tom Lane 592c88a0d2 Remove the aggregate form of ts_rewrite(), since it doesn't work as desired
if there are zero rows to aggregate over, and the API seems both conceptually
and notationally ugly anyway.  We should look for something that improves
on the tsquery-and-text-SELECT version (which is also pretty ugly but at
least it works...), but it seems that will take query infrastructure that
doesn't exist today.  (Hm, I wonder if there's anything in or near SQL2003
window functions that would help?)  Per discussion.
2007-10-24 02:24:49 +00:00
Tom Lane 07d0a370c1 Make configure probe for the location of the <uuid.h> header file.
Needed to accommodate different layout on some platforms (Debian for
one).  Heikki Linnakangas
2007-10-23 21:38:16 +00:00
Tom Lane dbaec70c15 Rename and slightly redefine the default text search parser's "word"
categories, as per discussion.  asciiword (formerly lword) is still
ASCII-letters-only, and numword (formerly word) is still the most general
mixed-alpha-and-digits case.  But word (formerly nlword) is now
any-group-of-letters-with-at-least-one-non-ASCII, rather than all-non-ASCII as
before.  This is no worse than before for parsing mixed Russian/English text,
which seems to have been the design center for the original coding; and it
should simplify matters for parsing most European languages.  In particular
it will not be necessary for any language to accept strings containing digits
as being regular "words".  The hyphenated-word categories are adjusted
similarly.
2007-10-23 20:46:12 +00:00
Tom Lane 12f25e70a6 Fix two-argument form of ts_rewrite() so it actually works for cases where
a later rewrite rule should change a subtree modified by an earlier one.
Per my gripe of a few days ago.
2007-10-23 01:44:40 +00:00
Tom Lane 3e17ef1cfa Adjust ts_debug's output as per my proposal of yesterday: show the
active dictionary and its output lexemes as separate columns, instead
of smashing them into one text column, and lowercase the column names.
Also, define the output rowtype using OUT parameters instead of a
composite type, to be consistent with the other built-in functions.
2007-10-22 20:13:37 +00:00
Tom Lane 1ea47dd8cb Fix shared tsvector/tsquery input code so that we don't say "syntax error in
tsvector" when we are really parsing a tsquery.  Report the bogus input,
too.  Make styles of some related error messages more consistent.
2007-10-21 22:29:56 +00:00
Tom Lane 638bd34f89 Found another small glitch in tsearch API: the two versions of ts_lexize()
are really redundant, since we invented a regdictionary alias type.
We can have just one function, declared as taking regdictionary, and
it will handle both behaviors.  Noted while working on documentation.
2007-10-19 22:01:45 +00:00
Tom Lane ba6b0bfd63 ts_rewrite() does not return a set, only one row; fix mislabeling in
pg_proc.h.
2007-10-19 19:48:34 +00:00
Tom Lane febd60bf5d Fix pg_wchar_table[] to match revised ordering of the encoding ID enum.
Add some comments so hopefully the next poor sod doesn't fall into the
same trap.  (Wrong comments are worse than none at all...)
2007-10-15 22:46:27 +00:00
Tom Lane 7cf3ff109d make install is supposed to install everything under src/include/,
but it was missing a bunch of recently-added subdirectories.
2007-10-14 17:07:51 +00:00
Tom Lane 18e3fcc31e Migrate the former contrib/txid module into core. This will make it easier
for Slony and Skytools to depend on it.  Per discussion.
2007-10-13 23:06:28 +00:00
Tom Lane 8468146b03 Fix the inadvertent libpq ABI breakage discovered by Martin Pitt: the
renumbering of encoding IDs done between 8.2 and 8.3 turns out to break 8.2
initdb and psql if they are run with an 8.3beta1 libpq.so.  For the moment
we can rearrange the order of enum pg_enc to keep the same number for
everything except PG_JOHAB, which isn't a problem since there are no direct
references to it in the 8.2 programs anyway.  (This does force initdb
unfortunately.)

Going forward, we want to fix things so that encoding IDs can be changed
without an ABI break, and this commit includes the changes needed to allow
libpq's encoding IDs to be treated as fully independent of the backend's.
The main issue is that libpq clients should not include pg_wchar.h or
otherwise assume they know the specific values of libpq's encoding IDs,
since they might encounter version skew between pg_wchar.h and the libpq.so
they are using.  To fix, have libpq officially export functions needed for
encoding name<=>ID conversion and validity checking; it was doing this
anyway unofficially.

It's still the case that we can't renumber backend encoding IDs until the
next bump in libpq's major version number, since doing so will break the
8.2-era client programs.  However the code is now prepared to avoid this
type of problem in future.

Note that initdb is no longer a libpq client: we just pull in the two
source files we need directly.  The patch also fixes a few places that
were being sloppy about checking for an unrecognized encoding name.
2007-10-13 20:18:42 +00:00
Tom Lane 537e92e41f Fix ALTER COLUMN TYPE to preserve the tablespace and reloptions of indexes
it affects.  The original coding neglected tablespace entirely (causing
the indexes to move to the database's default tablespace) and for an index
belonging to a UNIQUE or PRIMARY KEY constraint, it would actually try to
assign the parent table's reloptions to the index :-(.  Per bug #3672 and
subsequent investigation.

8.0 and 8.1 did not have reloptions, but the tablespace bug is present.
2007-10-13 15:55:40 +00:00
Tom Lane 82d8ab6fc4 Fix the plan-invalidation mechanism to treat regclass constants that refer to
a relation as a reason to invalidate a plan when the relation changes.  This
handles scenarios such as dropping/recreating a sequence that is referenced by
nextval('seq') in a cached plan.  Rather than teach plancache.c all about
digging through plan trees to find regclass Consts, we charge the planner's
setrefs.c with making a list of the relation OIDs on which each plan depends.
That way the list can be built cheaply during a plan tree traversal that has
to happen anyway.  Per bug #3662 and subsequent discussion.
2007-10-11 18:05:27 +00:00
Tom Lane c4db0d9ae1 Adjust regcustom.h so that all those assert() calls in the regex package
are converted to Postgres Assert() macros, instead of using <assert.h>
as formerly.  No difference in production builds, but --enable-cassert
debug builds will get better coverage for regex testing.
2007-10-06 16:01:51 +00:00
Tom Lane 89db887b1e Keep the planner from failing on "WHERE false AND something IN (SELECT ...)".
eval_const_expressions simplifies this to just "WHERE false", but we have
already done pull_up_IN_clauses so the IN join will be done, or at least
planned, anyway.  The trouble case comes when the sub-SELECT is itself a join
and we decide to implement the IN by unique-ifying the sub-SELECT outputs:
with no remaining reference to the output Vars in WHERE, we won't have
propagated the Vars up to the upper join point, leading to "variable not found
in subplan target lists" error.  Fix by adding an extra scan of in_info_list
and forcing all Vars mentioned therein to be propagated up to the IN join
point.  Per bug report from Miroslav Sulc.
2007-10-04 20:44:47 +00:00
Tom Lane 87dfa0d9ae Stamp 8.3beta1, except in configure.in/configure. 2007-10-04 19:12:04 +00:00
Tom Lane f1d37a9997 Cope with ERR_set_mark() and ERR_pop_to_mark() not existing in older
OpenSSL libraries --- just don't call them if they're not there.  This
might possibly lead to misleading error messages, but we'll just have
to live with that.
2007-10-02 00:25:20 +00:00
Tom Lane b526462f9e Avoid assuming that struct varattrib_pointer doesn't get padded by the
compiler --- at least on ARM, it does.  I suspect that the varvarlena patch
has been creating larger-than-intended toast pointers all along on ARM,
but it wasn't exposed until the latest tweak added some Asserts that
calculated the expected size in a different way.  We could probably have
fixed this by adding __attribute__((packed)) as is done for ItemPointerData,
but struct varattrib_pointer isn't really all that useful anyway, so it
seems cleanest to just get rid of it and have only struct varattrib_1b_e.
Per results from buildfarm member quagga.
2007-10-01 16:25:56 +00:00
Magnus Hagander 4164e6636e Enable __FUNCTION__ on MSVC builds.
Hannes Eder
2007-10-01 10:54:29 +00:00
Tom Lane 27b8922221 Add an extra header byte to TOAST-pointer datums to represent their size
explicitly.  This means a TOAST pointer takes 18 bytes instead of 17 --- still
smaller than in 8.2 --- which seems a good tradeoff to ensure we won't have
painted ourselves into a corner if we want to support multiple types of TOAST
pointer later on.  Per discussion with Greg Stark.
2007-09-30 19:54:58 +00:00
Tom Lane 70b9b9b788 Change initdb and CREATE DATABASE to actively reject attempts to create
databases with encodings that are incompatible with the server's LC_CTYPE
locale, when we can determine that (which we can on most modern platforms,
I believe).  C/POSIX locale is compatible with all encodings, of course,
so there is still some usefulness to CREATE DATABASE's ENCODING option,
but this will insulate us against all sorts of recurring complaints
caused by mismatched settings.

I moved initdb's existing LC_CTYPE-to-encoding mapping knowledge into
a new src/port/ file so it could be shared by CREATE DATABASE.
2007-09-28 22:25:49 +00:00
Tom Lane aedc5ed571 Fix typos in two comments. Spotted by Brendan Jurd 2007-09-27 21:01:59 +00:00
Tom Lane 314ed5de6d Define the FRONTEND symbol in postgres_fe.h, which allows us to eliminate
duplicative -DFRONTEND flags from many Makefiles.  We still need Makefile
control of the symbol in a few places that compile frontend-or-backend
src/port/ files, but it's a lot cleaner than before.

Hiroshi Saito
2007-09-27 19:53:44 +00:00
Tom Lane f18dfc4835 Minor improvements in backup and recovery:
- create a separate archive_mode GUC, on which archive_command is dependent

- %r option in recovery.conf sends last restartpoint to recovery command

- %r used in pg_standby, updated README

- minor other code cleanup in pg_standby

- doc on Warm Standby now mentions pg_standby and %r

- log_restartpoints recovery option emits LOG message at each restartpoint

- end of recovery now displays last transaction end time, as requested
  by Warren Little; also shown at each restartpoint

- restart archiver if needed to carry away WAL files at shutdown

Simon Riggs
2007-09-26 22:36:30 +00:00
Tom Lane cdf0231c88 Create a function variable "join_search_hook" to let plugins override the
join search order portion of the planner; this is specifically intended to
simplify developing a replacement for GEQO planning.  Patch by Julius
Stroffek, editorialized on by me.  I renamed make_one_rel_by_joins to
standard_join_search and make_rels_by_joins to join_search_one_level to better
reflect their place within this scheme.
2007-09-26 18:51:51 +00:00
Tom Lane f828f878e9 Change on-disk representation of NUMERIC datatype so that the sign_dscale
word comes before the weight instead of after.  This will allow future
binary-compatible extension of the representation to support compact formats,
as discussed on pgsql-hackers around 2007/06/18.  The reason to do it now is
that we've already pretty well broken any chance of simple in-place upgrade
from 8.2 to 8.3, but it's possible that 8.3 to 8.4 (or whenever we get around
to squeezing NUMERIC) could otherwise be data-compatible.
2007-09-25 22:21:55 +00:00
Tom Lane 6f5c38dcd0 Just-in-time background writing strategy. This code avoids re-scanning
buffers that cannot possibly need to be cleaned, and estimates how many
buffers it should try to clean based on moving averages of recent allocation
requests and density of reusable buffers.  The patch also adds a couple
more columns to pg_stat_bgwriter to help measure the effectiveness of the
bgwriter.

Greg Smith, building on his own work and ideas from several other people,
in particular a much older patch from Itagaki Takahiro.
2007-09-25 20:03:38 +00:00
Tom Lane 48f7e64395 Simplify and rename some GUC variables, per various recent discussions:
* stats_start_collector goes away; we always start the collector process,
unless prevented by a problem with setting up the stats UDP socket.

* stats_reset_on_server_start goes away; it seems useless in view of the
availability of pg_stat_reset().

* stats_block_level and stats_row_level are merged into a single variable
"track_counts", which controls all reports sent to the collector process.

* stats_command_string is renamed to track_activities.

* log_autovacuum is renamed to log_autovacuum_min_duration to better reflect
its meaning.

The log_autovacuum change is not a compatibility issue since it didn't exist
before 8.3 anyway.  The other changes need to be release-noted.
2007-09-24 03:12:23 +00:00
Andrew Dunstan 02138357ff Remove "convert 'blah' using conversion_name" facility, because if it
produces text it is an encoding hole and if not it's incompatible
with the spec, whatever the spec means (which we're not sure about anyway).
2007-09-24 01:29:30 +00:00
Tom Lane 7125687511 Fix cost estimates for EXISTS subqueries that are evaluated as initPlans
(because they are uncorrelated with the immediate parent query).  We were
charging the full run cost to the parent node, disregarding the fact that
only one row need be fetched for EXISTS.  While this would only be a
cosmetic issue in most cases, it might possibly affect planning outcomes
if the parent query were itself a subquery to some upper query.
Per recent discussion with Steve Crawford.
2007-09-22 21:36:40 +00:00
Tom Lane 571340a00e Parenthesize macro arguments safely. I see no bug among the current
uses of PG_DETOAST_DATUM_SLICE, but it's clearly trouble waiting to
happen.
2007-09-22 04:41:19 +00:00
Tom Lane 7583f9a7ca Fix regex, LIKE, and some other second-rank text-manipulation functions
to not cause needless copying of text datums that have 1-byte headers.
Greg Stark, in response to performance gripe from Guillaume Smet and
ITAGAKI Takahiro.
2007-09-21 22:52:52 +00:00
Tom Lane cc59049daf Improve handling of prune/no-prune decisions by storing a page's oldest
unpruned XMAX in its header.  At the cost of 4 bytes per page, this keeps us
from performing heap_page_prune when there's no chance of pruning anything.
Seems to be necessary per Heikki's preliminary performance testing.
2007-09-21 21:25:42 +00:00
Tom Lane 017daed0dd If we're gonna provide an --enable-profiling configure option, surely
it ought to know that you need -DLINUX_PROFILE on Linux.
2007-09-21 02:33:46 +00:00
Tom Lane 282d2a03dd HOT updates. When we update a tuple without changing any of its indexed
columns, and the new version can be stored on the same heap page, we no longer
generate extra index entries for the new version.  Instead, index searches
follow the HOT-chain links to ensure they find the correct tuple version.

In addition, this patch introduces the ability to "prune" dead tuples on a
per-page basis, without having to do a complete VACUUM pass to recover space.
VACUUM is still needed to clean up dead index entries, however.

Pavan Deolasee, with help from a bunch of other people.
2007-09-20 17:56:33 +00:00
Andrew Dunstan 55613bf9cd Close previously open holes for invalidly encoded data to enter the
database via builtin functions, as recently discussed on -hackers.

chr() now returns a character in the database encoding. For UTF8 encoded databases
the argument is treated as a Unicode code point. For other multi-byte encodings
the argument must designate a strict ascii character, or an error is raised,
as is also the case if the argument is 0.

ascii() is adjusted so that it remains the inverse of chr().

The two argument form of convert() is gone, and the three argument form now
takes a bytea first argument and returns a bytea. To cover this loss three new
functions are introduced:
. convert_from(bytea, name) returns text - converts the first argument from the
  named encoding to the database encoding
. convert_to(text, name) returns bytea - converts the first argument from the
  database encoding to the named encoding
. length(bytea, name) returns int - gives the length of the first argument in
  characters in the named encoding
2007-09-18 17:41:17 +00:00
Tom Lane 6889303531 Redefine the lp_flags field of item pointers as having four states, rather
than two independent bits (one of which was never used in heap pages anyway,
or at least hadn't been in a very long time).  This gives us flexibility to
add the HOT notions of redirected and dead item pointers without requiring
anything so klugy as magic values of lp_off and lp_len.  The state values
are chosen so that for the states currently in use (pre-HOT) there is no
change in the physical representation.
2007-09-12 22:10:26 +00:00
Teodor Sigaev 476045a21b Remove QueryOperand->istrue flag, it was used only in cover ranking
(ts_rank_cd). Use palloc'ed array in ranking instead of flag.
2007-09-11 16:01:40 +00:00
Teodor Sigaev 13553cbbff Fix header's size of structs defines in ispell.
Backpatch is needed for contrib version.
2007-09-11 12:57:05 +00:00
Teodor Sigaev 57cafe7982 Refactor from Heikki Linnakangas <heikki@enterprisedb.com>:
* Defined new struct WordEntryPosVector that holds a uint16 length and a
variable size array of WordEntries. This replaces the previous
convention of a variable size uint16 array, with the first element
implying the length. WordEntryPosVector has the same layout in memory,
but is more readable in source code. The POSDATAPTR and POSDATALEN
macros are still used, though it would now be more readable to access
the fields in WordEntryPosVector directly.

* Removed needfree field from DocRepresentation. It was always set to false.

* Miscellaneous other commenting and refactoring
2007-09-11 08:46:29 +00:00
Tom Lane ef4d38c86c Rename recently-added pg_stat_activity column from txn_start to xact_start,
for consistency with other column names such as in pg_stat_database.
2007-09-11 03:28:05 +00:00
Tom Lane 82a47982f3 Arrange for SET LOCAL's effects to persist until the end of the current top
transaction, unless rolled back or overridden by a SET clause for the same
variable attached to a surrounding function call.  Per discussion, these
seem the best semantics.  Note that this is an INCOMPATIBLE CHANGE: in 8.0
through 8.2, SET LOCAL's effects disappeared at subtransaction commit
(leading to behavior that made little sense at the SQL level).

I took advantage of the opportunity to rewrite and simplify the GUC variable
save/restore logic a little bit.  The old idea of a "tentative" value is gone;
it was a hangover from before we had a stack.  Also, we no longer need a stack
entry for every nesting level, but only for those in which a variable's value
actually changed.
2007-09-11 00:06:42 +00:00
Teodor Sigaev f7379f5c88 Heikki Linnakangas <heikki@enterprisedb.com>:
Add tsearch subdirectory is added to Makefile to allow
compile  custom tsearch dictionary as an external module.
2007-09-10 20:37:36 +00:00
Teodor Sigaev d982daae0b Change void* opaque argument to Datum type, add argument's
name to PushFunction type definition.

Per suggestion by Tome Lane <tgl@sss.pgh.pa.us>
2007-09-10 12:36:41 +00:00
Tom Lane 40fda15dce Code review for GUC revert-values-if-removed-from-postgresql.conf patch;
and in passing, fix some bogosities dating from the custom_variable_classes
patch.  Fix guc-file.l to correctly check changes in custom_variable_classes
that are attempted concurrently with additions/removals of custom variables,
and don't allow the new setting to be applied in advance of checking it.
Clean up messy and undocumented situation for string variables with NULL
boot_val.  Fix DefineCustomVariable functions to initialize boot_val
correctly.  Prevent find_option from inserting bogus placeholders for custom
variables that are simply inquired about rather than being set.
2007-09-10 00:57:22 +00:00
Tom Lane 6bd4f401b0 Replace the former method of determining snapshot xmax --- to wit, calling
ReadNewTransactionId from GetSnapshotData --- with a "latestCompletedXid"
variable that is updated during transaction commit or abort.  Since
latestCompletedXid is written only in places that had to lock ProcArrayLock
exclusively anyway, and is read only in places that had to lock ProcArrayLock
shared anyway, it adds no new locking requirements to the system despite being
cluster-wide.  Moreover, removing ReadNewTransactionId from snapshot
acquisition eliminates the need to take both XidGenLock and ProcArrayLock at
the same time.  Since XidGenLock is sometimes held across I/O this can be a
significant win.  Some preliminary benchmarking suggested that this patch has
no effect on average throughput but can significantly improve the worst-case
transaction times seen in pgbench.  Concept by Florian Pflug, implementation
by Tom Lane.
2007-09-08 20:31:15 +00:00
Teodor Sigaev 978de9d06d Improvements from Heikki Linnakangas <heikki@enterprisedb.com>
- change the alignment requirement of lexemes in TSVector slightly.
Lexeme strings were always padded to 2-byte aligned length to make sure
that if there's position array (uint16[]) it has the right alignment.
The patch changes that so that the padding is not done when there's no
positions. That makes the storage of tsvectors without positions
slightly more compact.

- added some #include "miscadmin.h" lines I missed in the earlier when I
added calls to check_stack_depth().

- Reimplement the send/recv functions, and added a comment
above them describing the on-wire format. The CRC is now recalculated in
tsquery as well per previous discussion.
2007-09-07 16:03:40 +00:00
Teodor Sigaev 8983852e34 Improving various checks by Heikki Linnakangas <heikki@enterprisedb.com>
- add code to check that the query tree is well-formed. It was indeed
  possible to send malformed queries in binary mode, which produced all
  kinds of strange results.

- make the left-field a uint32. There's no reason to
  arbitrarily limit it to 16-bits, and it won't increase the disk/memory
  footprint either now that QueryOperator and QueryOperand are separate
  structs.

- add check_stack_depth() call to all recursive functions I found.
  Some of them might have a natural limit so that you can't force
  arbitrarily deep recursions, but check_stack_depth() is cheap enough
  that seems best to just stick it into anything that might be a problem.
2007-09-07 15:35:11 +00:00
Teodor Sigaev e5be89981f Refactoring by Heikki Linnakangas <heikki@enterprisedb.com> with
small editorization by me

- Brake the QueryItem struct into QueryOperator and QueryOperand.
  Type was really the only common field between them. QueryItem still
  exists, and is used in the TSQuery struct as before, but it's now a
  union of the two. Many other changes fell from that, like separation
  of pushval_asis function into pushValue, pushOperator and pushStop.

- Moved some structs that were for internal use only from header files
  to the right .c-files.

- Moved tsvector parser to a new tsvector_parser.c file. Parser code was
  about half of the size of tsvector.c, it's also used from tsquery.c, and
  it has some data structures of its own, so it seems better to separate
  it. Cleaned up the API so that TSVectorParserState is not accessed from
  outside tsvector_parser.c.

- Separated enumerations (#defines, really) used for QueryItem.type
  field and as return codes from gettoken_query. It was just accidental
  code sharing.

- Removed ParseQueryNode struct used internally by makepol and friends.
  push*-functions now construct QueryItems directly.

- Changed int4 variables to just ints for variables like "i" or "array
  size", where the storage-size was not significant.
2007-09-07 15:09:56 +00:00
Tom Lane cd1aae5864 Allow CREATE INDEX CONCURRENTLY to disregard transactions in other
databases, per gripe from hubert depesz lubaczewski.  Patch from
Simon Riggs.
2007-09-07 00:58:57 +00:00
Tom Lane f8942f4a15 Make eval_const_expressions() preserve typmod when simplifying something like
null::char(3) to a simple Const node.  (It already worked for non-null values,
but not when we skipped evaluation of a strict coercion function.)  This
prevents loss of typmod knowledge in situations such as exhibited in bug
#3598.  Unfortunately there seems no good way to fix that bug in 8.1 and 8.2,
because they simply don't carry a typmod for a plain Const node.

In passing I made all the other callers of makeNullConst supply "real" typmod
values too, though I think it probably doesn't matter anywhere else.
2007-09-06 17:31:58 +00:00
Tom Lane 295e63983d Implement lazy XID allocation: transactions that do not modify any database
rows will normally never obtain an XID at all.  We already did things this way
for subtransactions, but this patch extends the concept to top-level
transactions.  In applications where there are lots of short read-only
transactions, this should improve performance noticeably; not so much from
removal of the actual XID-assignments, as from reduction of overhead that's
driven by the rate of XID consumption.  We add a concept of a "virtual
transaction ID" so that active transactions can be uniquely identified even
if they don't have a regular XID.  This is a much lighter-weight concept:
uniqueness of VXIDs is only guaranteed over the short term, and no on-disk
record is made about them.

Florian Pflug, with some editorialization by Tom.
2007-09-05 18:10:48 +00:00
Andrew Dunstan 2e74c53ec1 Provide for binary input/output of enums, to fix complaint from Merlin Moncure.
This just provides text values, we're not exposing the underlying Oid representation.
Catalog version bumped.
2007-09-04 16:41:43 +00:00
Tom Lane e7889b83b7 Support SET FROM CURRENT in CREATE/ALTER FUNCTION, ALTER DATABASE, ALTER ROLE.
(Actually, it works as a plain statement too, but I didn't document that
because it seems a bit useless.)  Unify VariableResetStmt with
VariableSetStmt, and clean up some ancient cruft in the representation of
same.
2007-09-03 18:46:30 +00:00
Tom Lane 7ab43b88d7 Improve stylistic consistency of descriptions of built-in objects by avoiding
initcap style --- the vast majority of the existing descriptions do not use
an initial cap.  I didn't change places where the first word was all-cap.

initdb not forced because this doesn't change any regression test results.
2007-09-03 02:30:45 +00:00
Tom Lane a4df52f95f Fix breakage of GIN support for varchar[] and cidr[] that I introduced in the
operator-family rewrite.  I had mistakenly supposed that these could use the
pg_amproc entries for text[] and inet[] respectively.  However, binary
compatibility of the underlying types does not make two array types binary
compatible (since they must differ in the header field that gives the element
type OID), and so the index support code doesn't consider those entries
applicable.  Add back the missing pg_amproc entries, and add an opr_sanity
query to try to catch such mistakes in future.  Per report from Gregory
Maxwell.
2007-09-03 01:18:33 +00:00
Tom Lane 2abae34a2e Implement function-local GUC parameter settings, as per recent discussion.
There are still some loose ends: I didn't do anything about the SET FROM
CURRENT idea yet, and it's not real clear whether we are happy with the
interaction of SET LOCAL with function-local settings.  The documentation
is a bit spartan, too.
2007-09-03 00:39:26 +00:00
Tom Lane 0ee5a39862 Apply a band-aid fix for the problem that 8.2 and up completely misestimate
the number of rows likely to be produced by a query such as
	SELECT * FROM t1 LEFT JOIN t2 USING (key) WHERE t2.key IS NULL;
What this is doing is selecting for t1 rows with no match in t2, and thus
it may produce a significant number of rows even if the t2.key table column
contains no nulls at all.  8.2 thinks the table column's null fraction is
relevant and thus may estimate no rows out, which results in terrible plans
if there are more joins above this one.  A proper fix for this will involve
passing much more information about the context of a clause to the selectivity
estimator functions than we ever have.  There's no time left to write such a
patch for 8.3, and it wouldn't be back-patchable into 8.2 anyway.  Instead,
put in an ad-hoc test to defeat the normal table-stats-based estimation when
an IS NULL test is evaluated at an outer join, and just use a constant
estimate instead --- I went with 0.5 for lack of a better idea.  This won't
catch every case but it will catch the typical ways of writing such queries,
and it seems unlikely to make things worse for other queries.
2007-08-31 23:35:22 +00:00
Tom Lane b4c806faa8 Rewrite make_outerjoininfo's construction of min_lefthand and min_righthand
sets for outer joins, in the light of bug #3588 and additional thought and
experimentation.  The original methodology was fatally flawed for nests of
more than two outer joins: it got the relationships between adjacent joins
right, but didn't always come to the right conclusions about whether a join
could be interchanged with one two or more levels below it.  This was largely
caused by a mistaken idea that we should use the min_lefthand + min_righthand
sets of a sub-join as the minimum left or right input set of an upper join
when we conclude that the sub-join can't commute with the upper one.  If
there's a still-lower join that the sub-join *can* commute with, this method
led us to think that that one could commute with the topmost join; which it
can't.  Another problem (not directly connected to bug #3588) was that
make_outerjoininfo's processing-order-dependent method for enforcing outer
join identity #3 didn't work right: if we decided that join A could safely
commute with lower join B, we dropped all information about sub-joins under B
that join A could perhaps not safely commute with, because we removed B's
entire min_righthand from A's.

To fix, make an explicit computation of all inner join combinations that occur
below an outer join, and add to that the full syntactic relsets of any lower
outer joins that we determine it can't commute with.  This method gives much
more direct enforcement of the outer join rearrangement identities, and it
turns out not to cost a lot of additional bookkeeping.

Thanks to Richard Harris for the bug report and test case.
2007-08-31 01:44:06 +00:00
Tom Lane 862861ee77 Fix a couple of misbehaviors rooted in the fact that the default creation
namespace isn't necessarily first in the search path (there could be implicit
schemas ahead of it).  Examples are

test=# set search_path TO s1;

test=# create view pg_timezone_names as select * from pg_timezone_names();
ERROR:  "pg_timezone_names" is already a view

test=# create table pg_class (f1 int primary key);
ERROR:  permission denied: "pg_class" is a system catalog

You'd expect these commands to create the requested objects in s1, since
names beginning with pg_ aren't supposed to be reserved anymore.  What is
happening is that we create the requested base table and then execute
additional commands (here, CREATE RULE or CREATE INDEX), and that code is
passed the same RangeVar that was in the original command.  Since that
RangeVar has schemaname = NULL, the secondary commands think they should do a
path search, and that means they find system catalogs that are implicitly in
front of s1 in the search path.

This is perilously close to being a security hole: if the secondary command
failed to apply a permission check then it'd be possible for unprivileged
users to make schema modifications to system catalogs.  But as far as I can
find, there is no code path in which a check doesn't occur.  Which makes it
just a weird corner-case bug for people who are silly enough to want to
name their tables the same as a system catalog.

The relevant code has changed quite a bit since 8.2, which means this patch
wouldn't work as-is in the back branches.  Since it's a corner case no one
has reported from the field, I'm not going to bother trying to back-patch.
2007-08-27 03:36:08 +00:00
Tom Lane 6c96188cb5 Remove the 'not in' operator (!!=). This was a hangover from Berkeley
days that was obsolete the moment we had IN (SELECT ...) capability.
It's arguably a security hole since it applied no permissions check to
the table it searched, and since it was never documented anywhere,
removing it seems more appropriate than fixing it.
2007-08-27 01:39:25 +00:00
Tom Lane 67bf7b919e Make ARRAY(SELECT ...) return an empty array, rather than a NULL, when the
sub-select returns zero rows.  Per complaint from Jens Schicke.  Since this
is more in the nature of a definition change than a bug, not back-patched.
2007-08-26 21:44:25 +00:00
Tom Lane 93eab9312f Rename built-in Snowball stemmer dictionaries to be english_stem,
russian_stem, etc.  Per discussion.
2007-08-25 01:06:25 +00:00
Tom Lane 7351b5fa17 Cleanup for some problems in tsearch patch:
- ispell initialization crashed on empty dictionary file
- ispell initialization crashed on affix file with prefixes but no suffixes
- stop words file was run through pg_verify_mbstr, with database
  encoding, but it's supposed to be UTF-8; similar bug for synonym files
- bunch of comments added, typos fixed, and other cleanup

Introduced consistent encoding checking/conversion of data read from tsearch
configuration files, by doing this in a single t_readline() subroutine
(replacing direct usages of fgets).  Cleaned up API for readstopwords too.

Heikki Linnakangas
2007-08-25 00:03:59 +00:00
Tom Lane 8a5592daf1 Remove option to change parser of an existing text search configuration.
This prevents needing to do complex and poorly-defined updates of the
mapping table if the new parser has different token types than the old.
Per discussion.
2007-08-22 05:13:50 +00:00
Tom Lane d321421d0a Simplify the syntax of CREATE/ALTER TEXT SEARCH DICTIONARY by treating the
init options of the template as top-level options in the syntax.  This also
makes ALTER a bit easier to use, since options can be replaced individually.
I also made these statements verify that the tmplinit method will accept
the new settings before they get stored; in the original coding you didn't
find out about mistakes until the dictionary got invoked.

Under the hood, init methods now get options as a List of DefElem instead
of a raw text string --- that lets tsearch use existing options-pushing code
instead of duplicating functionality.
2007-08-22 01:39:46 +00:00
Tom Lane 140d4ebcb4 Tsearch2 functionality migrates to core. The bulk of this work is by
Oleg Bartunov and Teodor Sigaev, but I did a lot of editorializing,
so anything that's broken is probably my fault.

Documentation is nonexistent as yet, but let's land the patch so we can
get some portability testing done.
2007-08-21 01:11:32 +00:00
Andrew Dunstan fd801f4faa Provide for logfiles in machine readable CSV format. In consequence, rename
redirect_stderr to logging_collector.
Original patch from Arul Shaji, subsequently modified by Greg Smith, and then
heavily modified by me.
2007-08-19 01:41:25 +00:00
Tom Lane 817946bb04 Arrange to cache a ResultRelInfo in the executor's EState for relations that
are not one of the query's defined result relations, but nonetheless have
triggers fired against them while the query is active.  This was formerly
impossible but can now occur because of my recent patch to fix the firing
order for RI triggers.  Caching a ResultRelInfo avoids duplicating work by
repeatedly opening and closing the same relation, and also allows EXPLAIN
ANALYZE to "see" and report on these extra triggers.  Use the same mechanism
to cache open relations when firing deferred triggers at transaction shutdown;
this replaces the former one-element-cache strategy used in that case, and
should improve performance a bit when there are deferred triggers on a number
of relations.
2007-08-15 21:39:50 +00:00
Tom Lane 9cb8409762 Repair problems occurring when multiple RI updates have to be done to the same
row within one query: we were firing check triggers before all the updates
were done, leading to bogus failures.  Fix by making the triggers queued by
an RI update go at the end of the outer query's trigger event list, thereby
effectively making the processing "breadth-first".  This was indeed how it
worked pre-8.0, so the bug does not occur in the 7.x branches.
Per report from Pavel Stehule.
2007-08-15 19:15:47 +00:00
Tom Lane 67f99d216a Fix oversight in async-commit patch: there were some places in heapam.c
that still thought they could set HEAP_XMAX_COMMITTED immediately after
seeing the other transaction commit.  Make them use the same logic as
tqual.c does to determine if the hint bit can be set yet.
2007-08-14 17:35:18 +00:00
Neil Conway 849ec99753 Adjust the output of MemoryContextStats() so that the stats for a
child memory contexts is indented two spaces to the right of its
parent context.  This should make it easier to deduce the memory
context hierarchy from the output of MemoryContextStats().
2007-08-07 06:25:14 +00:00
Tom Lane c8b7e811f3 Apparently icc doesn't always define __ICC, and it's more correct to
check for __INTEL_COMPILER.  Per report from Dirk Tilger.
Not back-patched since I don't fully trust it yet ...
2007-08-05 15:11:40 +00:00
Tom Lane 8d30337566 Fix up bad layout of some comments (probably pg_indent's fault), and
improve grammar a tad.  Per Greg Stark.
2007-08-04 21:53:00 +00:00
Tom Lane 4fd8d6b3e7 Fix crash caused by log_timezone patch if we attempt to emit any elog messages
between the setting of log_line_prefix and the setting of log_timezone.  We
can't realistically set log_timezone any earlier than we do now, so the best
behavior seems to be to use GMT zone if any timestamps are to be logged during
early startup.  Create a dummy zone variable with a minimal definition of GMT
(in particular it will never know about leap seconds), so that we can set it
up without reference to any external files.
2007-08-04 19:29:25 +00:00
Tom Lane bdd6b62245 Switch over to using the src/timezone functions for formatting timestamps
displayed in the postmaster log.  This avoids Windows-specific problems with
localized time zone names that are in the wrong encoding, and generally seems
like a good idea to forestall other potential platform-dependent issues.
To preserve the existing behavior that all backends will log in the same time
zone, create a new GUC variable log_timezone that can only be changed on a
system-wide basis, and reference log-related calculations to that zone instead
of the TimeZone variable.

This fixes the issue reported by Hiroshi Saito that timestamps printed by
xlog.c startup could be improperly localized on Windows.  We still need a
simpler patch for that problem in the back branches, however.
2007-08-04 01:26:54 +00:00
Andrew Dunstan 63872601e8 Move session_start out of MyProcPort stucture and make it a global called MyStartTime,
so that we will be able to create a cookie for all processes for CSVlogs.
It is set wherever MyProcPid is set. Take the opportunity to remove the now
unnecessary session-only restriction on the %s and %c escapes in log_line_prefix.
2007-08-02 23:39:45 +00:00
Tom Lane 4a78cdeb6b Support an optional asynchronous commit mode, in which we don't flush WAL
before reporting a transaction committed.  Data consistency is still
guaranteed (unlike setting fsync = off), but a crash may lose the effects
of the last few transactions.  Patch by Simon, some editorialization by Tom.
2007-08-01 22:45:09 +00:00
Tom Lane e4f4a7f5a4 Remove FileUnlink(), which wasn't being used anywhere and interacted poorly
with the recent patch to log temp file sizes at removal time.  Doesn't seem
worth fixing since it's unused.
In passing, make a few elog messages conform to the message style guide.
2007-07-26 15:15:18 +00:00
Tom Lane 82eed4dba2 Arrange to put TOAST tables belonging to temporary tables into special schemas
named pg_toast_temp_nnn, alongside the pg_temp_nnn schemas used for the temp
tables themselves.  This allows low-level code such as the relcache to
recognize that these tables are indeed temporary, which enables various
optimizations such as not WAL-logging changes and using local rather than
shared buffers for access.  Aside from obvious performance benefits, this
provides a solution to bug #3483, in which other backends unexpectedly held
open file references to temporary tables.  The scheme preserves the property
that TOAST tables are not in any schema that's normally in the search path,
so they don't conflict with user table names.

initdb forced because of changes in system view definitions.
2007-07-25 22:16:18 +00:00
Magnus Hagander 906b2e1b37 Rename DLLIMPORT macro to PGDLLIMPORT to avoid conflict with
third party includes (like tcl) that define DLLIMPORT.
2007-07-25 12:22:54 +00:00
Magnus Hagander d602592494 Make it possible, and default, for MingW to build with SSPI support
by dynamically loading the function that's missing from the MingW
headers and library.
2007-07-24 09:00:27 +00:00
Tom Lane ad4295728e Create a new dedicated Postgres process, "wal writer", which exists to write
and fsync WAL at convenient intervals.  For the moment it just tries to
offload this work from backends, but soon it will be responsible for
guaranteeing a maximum delay before asynchronously-committed transactions
will be flushed to disk.

This is a portion of Simon Riggs' async-commit patch, committed to CVS
separately because a background WAL writer seems like it might be a good idea
independently of the async-commit feature.  I rebased walwriter.c on
bgwriter.c because it seemed like a more appropriate way of handling signals;
while the startup/shutdown logic in postmaster.c is more like autovac because
we want walwriter to quit before we start the shutdown checkpoint.
2007-07-24 04:54:09 +00:00
Magnus Hagander f70866fb23 SSPI authentication on Windows. GSSAPI compatible client when doing Kerberos
against a Unix server, and Windows-specific server-side authentication
using SSPI "negotiate" method (Kerberos or NTLM).

Only builds properly with MSVC for now.
2007-07-23 10:16:54 +00:00
Neil Conway 474774918b Implement CREATE TABLE LIKE ... INCLUDING INDEXES. Patch from NikhilS,
based in part on an earlier patch from Trevor Hardcastle, and reviewed
by myself.
2007-07-17 05:02:03 +00:00
Tom Lane 9f6f51d5d4 Hmm, so evidently _check_lock and _clear_lock take an argument of type
int not unsigned int.  Third try to get grebe building without warnings...
2007-07-16 14:02:22 +00:00
Tom Lane 5aaf09ac46 So our reward for including <sys/atomic_op.h> seems to be a bunch of
nattering about casting away volatile.  Losers.
2007-07-16 04:57:57 +00:00
Tom Lane 057d5c421f On AIX, include <sys/atomic_op.h> so that the functions we use for
TAS support are properly declared.
2007-07-16 02:03:14 +00:00
Tom Lane 6bc12a4aca Get dirmod.c on the same page as port.h about whether we use pgsymlink
on Cygwin (answer: we don't).  Also try to unwind the #ifdef spaghetti
a little bit.  Untested but hopefully I didn't break anything.
2007-07-12 23:28:49 +00:00
Magnus Hagander 784fd04940 Enable GSSAPI to build using MSVC. Always build GSSAPI when Kerberos is
enabled, because the only Kerberos library supported always contains it.
2007-07-12 14:43:21 +00:00
Magnus Hagander 65a513c249 Support GSSAPI builds where the header is <gssapi.h> and not <gssapi/gssapi.h>,
such as OpenBSD (possibly all Heimdal).

Stefan Kaltenbrunner
2007-07-12 14:36:52 +00:00
Magnus Hagander 6771994058 Fix freenig of names in Kerberos when using MIT - need to use the
free function provided in the Kerberos library.
This fixes a very hard to track down heap corruption on windows
when using debug runtimes.
2007-07-12 14:10:39 +00:00
Tom Lane e27a8df1bf Fix misspelling. 2007-07-10 16:41:01 +00:00
Magnus Hagander 6160106c74 Add support for GSSAPI authentication.
Documentation still being written, will be committed later.

Henry B. Hotz and Magnus Hagander
2007-07-10 13:14:22 +00:00
Tom Lane b09cb0cf12 Remove the pgstat_drop_relation() call from smgr_internal_unlink(), because
we don't know at that point which relation OID to tell pgstat to forget.
The code was passing the relfilenode, which is incorrect, and could possibly
cause some other relation's stats to be zeroed out.  While we could try to
clean this up, it seems much simpler and more reliable to let the next
invocation of pgstat_vacuum_tabstat() fix things; which indeed is how it
worked before I introduced the buggy code into 8.1.3 and later :-(.
Problem noticed by Itagaki Takahiro, fix is per subsequent discussion.
2007-07-08 22:23:16 +00:00
Tom Lane 5f7b1f8d9d Closer code review for PQconnectionUsedPassword() patch: in particular,
not OK to include postgres_fe.h into libpq-fe.h, hence declare it as
returning int not bool.
2007-07-08 18:28:56 +00:00
Joe Conway 51bc3dfe4b Arrange for the authentication request type to be preserved in
PGconn. Invent a new libpq connection-status function,
PQconnectionUsedPassword() that returns true if the server
demanded a password during authentication, false otherwise.
This may be useful to clients in general, but is immediately
useful to help plug a privilege escalation path in dblink.
Per list discussion and design proposed by Tom Lane.
2007-07-08 17:11:51 +00:00
Tom Lane 7af3a6fc6f Fix up hash functions for datetime datatypes so that they don't take
unwarranted liberties with int8 vs float8 values for these types.
Specifically, be sure to apply either hashint8 or hashfloat8 depending
on HAVE_INT64_TIMESTAMP.  Per my gripe of even date.
2007-07-06 04:16:00 +00:00
Neil Conway a55898131e Add ALTER VIEW ... RENAME TO, and a RENAME TO clause to ALTER SEQUENCE.
Sequences and views could previously be renamed using ALTER TABLE, but
this was a repeated source of confusion for users. Update the docs,
and psql tab completion. Patch from David Fetter; various minor fixes
by myself.
2007-07-03 01:30:37 +00:00
Tom Lane 1c7fe33fdb Fix failure to restart Postgres when Linux kernel returns EIDRM for shmctl().
This is a Linux kernel bug that apparently exists in every extant kernel
version: sometimes shmctl() will fail with EIDRM when EINVAL is correct.
We were assuming that EIDRM indicates a possible conflict with pre-existing
backends, and refusing to start the postmaster when this happens.  Fortunately,
there does not seem to be any case where Linux can legitimately return EIDRM
(it doesn't track shmem segments in a way that would allow that), so we can
get away with just assuming that EIDRM means EINVAL on this platform.

Per reports from Michael Fuhr and Jon Lapham --- it's a bit surprising
we have not seen more reports, actually.
2007-07-02 20:11:55 +00:00
Tom Lane 9fc25c0511 Improve logging of checkpoints. Patch by Greg Smith, worked over
by Heikki and a little bit by me.
2007-06-30 19:12:02 +00:00
Tom Lane 867e2c91a0 Implement "distributed" checkpoints in which the checkpoint I/O is spread
over a fairly long period of time, rather than being spat out in a burst.
This happens only for background checkpoints carried out by the bgwriter;
other cases, such as a shutdown checkpoint, are still done at full speed.

Remove the "all buffers" scan in the bgwriter, and associated stats
infrastructure, since this seems no longer very useful when the checkpoint
itself is properly throttled.

Original patch by Itagaki Takahiro, reworked by Heikki Linnakangas,
and some minor API editorialization by me.
2007-06-28 00:02:40 +00:00
Alvaro Herrera 80f3b5ad2e Remove unused "caller" argument from stringToQualifiedNameList. 2007-06-26 16:48:09 +00:00
Alvaro Herrera a03e8ad266 Remove unused BAD_LOCATION definition. 2007-06-25 17:12:07 +00:00
Alvaro Herrera bae0b56880 Improve autovacuum launcher's ability to detect a problem in worker startup,
by having the postmaster signal it when certain failures occur.  This requires
the postmaster setting a flag in shared memory, but should be as safe as the
pmsignal.c code is.

Also make sure the launcher honor's a postgresql.conf change turning it off
on SIGHUP.
2007-06-25 16:09:03 +00:00
Tom Lane 46379d6e60 Separate parse-analysis for utility commands out of parser/analyze.c
(which now deals only in optimizable statements), and put that code
into a new file parser/parse_utilcmd.c.  This helps clarify and enforce
the design rule that utility statements shouldn't be processed during
the regular parse analysis phase; all interpretation of their meaning
should happen after they are given to ProcessUtility to execute.
(We need this because we don't retain any locks for a utility statement
that's in a plan cache, nor have any way to detect that it's stale.)

We are also able to simplify the API for parse_analyze() and related
routines, because they will now always return exactly one Query structure.

In passing, fix bug #3403 concerning trying to add a serial column to
an existing temp table (this is largely Heikki's work, but we needed
all that restructuring to make it safe).
2007-06-23 22:12:52 +00:00
Tom Lane 6e07228728 Code review for log_lock_waits patch. Don't try to issue log messages from
within a signal handler (this might be safe given the relatively narrow code
range in which the interrupt is enabled, but it seems awfully risky); do issue
more informative log messages that tell what is being waited for and the exact
length of the wait; minor other code cleanup.  Greg Stark and Tom Lane
2007-06-19 20:13:22 +00:00
Tom Lane 4c310eca2e Arrange for quote_identifier() and pg_dump to not quote keywords that are
unreserved according to the grammar.  The list of unreserved words has gotten
extensive enough that the unnecessary quoting is becoming a bit of an eyesore.
To do this, add knowledge of the keyword category to keywords.c's table.
(Someday we might be able to generate keywords.c's table and the keyword lists
in gram.y from a common source.)  For the moment, lie about WITH's status in
the table so it will still get quoted --- this is because of the expectation
that WITH will become reserved when the SQL recursive-queries patch gets done.

I didn't force initdb because this affects nothing on-disk; but note that a
few regression tests have changed expected output.
2007-06-18 21:40:58 +00:00
Tom Lane 23347231a5 Tweak the API for per-datatype typmodin functions so that they are passed
an array of strings rather than an array of integers, and allow any simple
constant or identifier to be used in typmods; for example
	create table foo (f1 widget(42,'23skidoo',point));
Of course the typmodin function has still got to pack this info into a
non-negative int32 for storage, but it's still a useful improvement in
flexibility, especially considering that you can do nearly anything if you
are willing to keep the info in a side table.  We can get away with this
change since we have not yet released a version providing user-definable
typmods.  Per discussion.
2007-06-15 20:56:52 +00:00
Andrew Dunstan bd2cb9aaa5 Implement a chunking protocol for writes to the syslogger pipe, with messages
reassembled in the syslogger before writing to the log file. This prevents
partial messages from being written, which mucks up log rotation, and
messages from different backends being interleaved, which causes garbled
logs. Backport as far as 8.0, where the syslogger was introduced.

Tom Lane and Andrew Dunstan
2007-06-14 01:48:51 +00:00
Tom Lane 8be9b50ab4 Minor comment fixes. 2007-06-12 16:01:31 +00:00
Tom Lane a9545b3aef Improve UPDATE/DELETE WHERE CURRENT OF so that they can be used from plpgsql
with a plpgsql-defined cursor.  The underlying mechanism for this is that the
main SQL engine will now take "WHERE CURRENT OF $n" where $n is a refcursor
parameter.  Not sure if we should document that fact or consider it an
implementation detail.  Per discussion with Pavel Stehule.
2007-06-11 22:22:42 +00:00
Tom Lane 6808f1b1de Support UPDATE/DELETE WHERE CURRENT OF cursor_name, per SQL standard.
Along the way, allow FOR UPDATE in non-WITH-HOLD cursors; there may once
have been a reason to disallow that, but it seems to work now, and it's
really rather necessary if you want to select a row via a cursor and then
update it in a concurrent-safe fashion.

Original patch by Arul Shaji, rather heavily editorialized by Tom Lane.
2007-06-11 01:16:30 +00:00
Tom Lane 85d72f0516 Teach heapam code to know the difference between a real seqscan and the
pseudo HeapScanDesc created for a bitmap heap scan.  This avoids some useless
overhead during a bitmap scan startup, in particular invoking the syncscan
code.  (We might someday want to do that, but right now it's merely useless
contention for shared memory, to say nothing of possibly pushing useful
entries out of syncscan's small LRU list.)  This also allows elimination of
ugly pgstat_discount_heap_scan() kluge.
2007-06-09 18:49:55 +00:00
Tom Lane a04a423599 Arrange for large sequential scans to synchronize with each other, so that
when multiple backends are scanning the same relation concurrently, each page
is (ideally) read only once.

Jeff Davis, with review by Heikki and Tom.
2007-06-08 18:23:53 +00:00
Tom Lane 24ee8af573 Rework temp_tablespaces patch so that temp tablespaces are assigned separately
for each temp file, rather than once per sort or hashjoin; this allows
spreading the data of a large sort or join across multiple tablespaces.
(I remain dubious that this will make any difference in practice, but certain
people insisted.)  Arrange to cache the results of parsing the GUC variable
instead of recomputing from scratch on every demand, and push usage of the
cache down to the bottommost fd.c level.
2007-06-07 19:19:57 +00:00
Tom Lane 2d4db3675f Fix up text concatenation so that it accepts all the reasonable cases that
were accepted by prior Postgres releases.  This takes care of the loose end
left by the preceding patch to downgrade implicit casts-to-text.  To avoid
breaking desirable behavior for array concatenation, introduce a new
polymorphic pseudo-type "anynonarray" --- the added concatenation operators
are actually text || anynonarray and anynonarray || text.
2007-06-06 23:00:50 +00:00
Tom Lane 31edbadf4a Downgrade implicit casts to text to be assignment-only, except for the ones
from the other string-category types; this eliminates a lot of surprising
interpretations that the parser could formerly make when there was no directly
applicable operator.

Create a general mechanism that supports casts to and from the standard string
types (text,varchar,bpchar) for *every* datatype, by invoking the datatype's
I/O functions.  These new casts are assignment-only in the to-string direction,
explicit-only in the other, and therefore should create no surprising behavior.
Remove a bunch of thereby-obsoleted datatype-specific casting functions.

The "general mechanism" is a new expression node type CoerceViaIO that can
actually convert between *any* two datatypes if their external text
representations are compatible.  This is more general than needed for the
immediate feature, but might be useful in plpgsql or other places in future.

This commit does nothing about the issue that applying the concatenation
operator || to non-text types will now fail, often with strange error messages
due to misinterpreting the operator as array concatenation.  Since it often
(not always) worked before, we should either make it succeed or at least give
a more user-friendly error; but details are still under debate.

Peter Eisentraut and Tom Lane
2007-06-05 21:31:09 +00:00
Jan Wieck 1120b99445 The session_replication_role actually can be changed at will during
a session regardless of the existence of cached plans. The plancache
only needs to be invalidated so that rules affected by the new setting
will be reflected in the new query plans.

Jan
2007-06-05 20:00:41 +00:00
Tom Lane acfce502ba Create a GUC parameter temp_tablespaces that allows selection of the
tablespace(s) in which to store temp tables and temporary files.  This is a
list to allow spreading the load across multiple tablespaces (a random list
element is chosen each time a temp object is to be created).  Temp files are
not stored in per-database pgsql_tmp/ directories anymore, but per-tablespace
directories.

Jaime Casanova and Albert Cervera, with review by Bernd Helmle and Tom Lane.
2007-06-03 17:08:34 +00:00
Neil Conway f086be3d39 Allow leading and trailing whitespace in the input to the boolean
type. Also, add explicit casts between boolean and text/varchar. Both
of these changes are for conformance with SQL:2003.

Update the regression tests, bump the catversion.
2007-06-01 23:40:19 +00:00
Tom Lane bd0a260928 Make CREATE/DROP/RENAME DATABASE wait a little bit to see if other backends
will exit before failing because of conflicting DB usage.  Per discussion,
this seems a good idea to help mask the fact that backend exit takes nonzero
time.  Remove a couple of thereby-obsoleted sleeps in contrib and PL
regression test sequences.
2007-06-01 19:38:07 +00:00
Tom Lane bd2c980b22 Buy back some of the cycles spent in more-expensive hash functions by
selecting power-of-2, rather than prime, numbers of buckets in hash joins.
If the hash functions are doing their jobs properly by making all hash bits
equally random, this is good enough, and it saves expensive integer division
and modulus operations.
2007-06-01 17:38:44 +00:00
Tom Lane 1f559b7d3a Fix several hash functions that were taking chintzy shortcuts instead of
delivering a well-randomized hash value.  I got religion on this after
observing that performance of multi-batch hash join degrades terribly if the
higher-order bits of hash values aren't random, as indeed was true for say
hashes of small integer values.  It's now expected and documented that hash
functions should use hash_any or some comparable method to ensure that all
bits of their output are about equally random.

initdb forced because this change invalidates existing hash indexes.  For the
same reason, this isn't back-patchable; the hash join performance problem
will get a band-aid fix in the back branches.
2007-06-01 15:33:19 +00:00
Tom Lane 10f719af33 Change build_index_pathkeys() so that the expressions it builds to represent
index key columns always have the type expected by the index's associated
operators, ie, we add RelabelType nodes when dealing with binary-compatible
index opclasses.  This is needed to get varchar indexes to play nicely with
the new EquivalenceClass machinery, as per recent gripe from Josh Berkus that
CVS HEAD was failing to match a varchar index column to a constant restriction
in the query.

It seems likely that this change will allow removal of a lot of ugly ad-hoc
RelabelType-stripping that the planner has traditionally done while matching
expressions to other expressions, but I'll worry about that some other day.
2007-05-31 16:57:34 +00:00
Tom Lane d526575f89 Make large sequential scans and VACUUMs work in a limited-size "ring" of
buffers, rather than blowing out the whole shared-buffer arena.  Aside from
avoiding cache spoliation, this fixes the problem that VACUUM formerly tended
to cause a WAL flush for every page it modified, because we had it hacked to
use only a single buffer.  Those flushes will now occur only once per
ring-ful.  The exact ring size, and the threshold for seqscans to switch into
the ring usage pattern, remain under debate; but the infrastructure seems
done.  The key bit of infrastructure is a new optional BufferAccessStrategy
object that can be passed to ReadBuffer operations; this replaces the former
StrategyHintVacuum API.

This patch also changes the buffer usage-count methodology a bit: we now
advance usage_count when first pinning a buffer, rather than when last
unpinning it.  To preserve the behavior that a buffer's lifetime starts to
decrease when it's released, the clock sweep code is modified to not decrement
usage_count of pinned buffers.

Work not done in this commit: teach GiST and GIN indexes to use the vacuum
BufferAccessStrategy for vacuum-driven fetches.

Original patch by Simon, reworked by Heikki and again by Tom.
2007-05-30 20:12:03 +00:00
Tom Lane 14c4d3dea9 Fix trivial misspelling in comment. 2007-05-30 16:16:32 +00:00
Neil Conway 6af04882de Fix a bug in input processing for the "interval" type. Previously,
"microsecond" and "millisecond" units were not considered valid input
by themselves, which caused inputs like "1 millisecond" to be rejected
erroneously.

Update the docs, add regression tests, and backport to 8.2 and 8.1
2007-05-29 04:58:43 +00:00
Tom Lane 97d12b434f Ooops, I was too busy worrying about getting the transactional infrastructure
right to think carefully about how insert and delete counts map to
n_live_tuples.  Of course a deletion should reduce n_live_tuples.
2007-05-27 17:28:36 +00:00
Tom Lane 8d675c85c5 pgstat's on-proc-exit hook has to execute after the last transaction commit
or abort within a backend; rearrange InitPostgres processing to make it so.
Revealed by just-added Asserts along with ECPG regression tests (hm, I wonder
why the core regression tests didn't expose it?).  This possibly is another
reason for missing stats updates ...
2007-05-27 05:37:50 +00:00
Tom Lane 77947c51c0 Fix up pgstats counting of live and dead tuples to recognize that committed
and aborted transactions have different effects; also teach it not to assume
that prepared transactions are always committed.

Along the way, simplify the pgstats API by tying counting directly to
Relations; I cannot detect any redeeming social value in having stats
pointers in HeapScanDesc and IndexScanDesc structures.  And fix a few
corner cases in which counts might be missed because the relation's
pgstat_info pointer hadn't been set.
2007-05-27 03:50:39 +00:00
Tom Lane 604ffd280b Create hooks to let a loadable plugin monitor (or even replace) the planner
and/or create plans for hypothetical situations; in particular, investigate
plans that would be generated using hypothetical indexes.  This is a
heavily-rewritten version of the hooks proposed by Gurjeet Singh for his
Index Advisor project.  In this formulation, the index advisor can be
entirely a loadable module instead of requiring a significant part to be
in the core backend, and plans can be generated for hypothetical indexes
without requiring the creation and rolling-back of system catalog entries.

The index advisor patch as-submitted is not compatible with these hooks,
but it needs significant work anyway due to other 8.2-to-8.3 planner
changes.  With these hooks in the core backend, development of the advisor
can proceed as a pgfoundry project.
2007-05-25 17:54:25 +00:00
Tom Lane 11086f2f2b Repair planner bug introduced in 8.2 by ability to rearrange outer joins:
in cases where a sub-SELECT inserts a WHERE clause between two outer joins,
that clause may prevent us from re-ordering the two outer joins.  The code
was considering only the joins' own ON-conditions in determining reordering
safety, which is not good enough.  Add a "delay_upper_joins" flag to
OuterJoinInfo to flag that we have detected such a clause and higher-level
outer joins shouldn't be permitted to commute with this one.  (This might
seem overly coarse, but given the current rules for OJ reordering, it's
sufficient AFAICT.)

The failure case is actually pretty narrow: it needs a WHERE clause within
the RHS of a left join that checks the RHS of a lower left join, but is not
strict for that RHS (else we'd have simplified the lower join to a plain
join).  Even then no failure will be manifest unless the planner chooses to
rearrange the join order.

Per bug report from Adam Terrey.
2007-05-22 23:23:58 +00:00
Tom Lane d7153c5fad Fix best_inner_indexscan to return both the cheapest-total-cost and
cheapest-startup-cost innerjoin indexscans, and make joinpath.c consider
both of these (when different) as the inside of a nestloop join.  The
original design was based on the assumption that indexscan paths always
have negligible startup cost, and so total cost is the only important
figure of merit; an assumption that's obviously broken by bitmap
indexscans.  This oversight could lead to choosing poor plans in cases
where fast-start behavior is more important than total cost, such as
LIMIT and IN queries.  8.1-vintage brain fade exposed by an example from
Chuck D.
2007-05-22 01:40:33 +00:00
Tom Lane 2415ad9831 Teach tuplestore.c to throw away data before the "mark" point when the caller
is using mark/restore but not rewind or backward-scan capability.  Insert a
materialize plan node between a mergejoin and its inner child if the inner
child is a sort that is expected to spill to disk.  The materialize shields
the sort from the need to do mark/restore and thereby allows it to perform
its final merge pass on-the-fly; while the materialize itself is normally
cheap since it won't spill to disk unless the number of tuples with equal
key values exceeds work_mem.

Greg Stark, with some kibitzing from Tom Lane.
2007-05-21 17:57:35 +00:00
Peter Eisentraut 3963574d13 XPath fixes:
- Function renamed to "xpath".
 - Function is now strict, per discussion.
 - Return empty array in case when XPath expression detects nothing
   (previously, NULL was returned in such case), per discussion.
 - (bugfix) Work with fragments with prologue: select xpath('/a',
   '<?xml version="1.0"?><a /><b />'); // now XML datum is always wrapped
   with dummy <x>...</x>, XML prologue simply goes away (if any).
 - Some cleanup.

Nikolay Samokhvalov

Some code cleanup and documentation work by myself.
2007-05-21 17:10:29 +00:00
Tom Lane a8d539f124 To support external compression of archived WAL data, add a flag bit to
WAL records that shows whether it is safe to remove full-page images
(ie, whether or not an on-line backup was in progress when the WAL entry
was made).  Also make provision for an XLOG_NOOP record type that can be
used to fill in the extra space when decompressing the data for restore.

This is the portion of Koichi Suzuki's "full page writes" patch that
has to go into the core database.  The remainder of that work is two
external compression and decompression programs, which for the time being
will undergo separate development on pgfoundry.  Per discussion.

Also, twiddle the handling of BTREE_SPLIT records to ensure it'll be
possible to compress them (the previous coding caused essential info
to be omitted).  The other commonly-used record types seem OK already,
with the possible exception of GIN and GIST WAL records, which I don't
understand well enough to opine on.
2007-05-20 21:08:19 +00:00
Alvaro Herrera b40776d221 Have CLUSTER advance the table's relfrozenxid. The new frozen point is the
FreezeXid introduced in a recent commit, so there isn't any data loss in this
approach.

Doing it causes ALTER TABLE (or rather, the forms of it that cause a full table
rewrite) to be affected as well.  In this case, the frozen point is RecentXmin,
because after the rewrite all the tuples are relabeled with the rewriting
transaction's Xid.

TOAST tables are fixed automatically as well, as fallout of the way they were
already being handled in the respective code paths.

With this patch, there is no longer need to VACUUM tables for Xid wraparound
purposes that have been cleaned up via TRUNCATE or CLUSTER.
2007-05-18 23:19:42 +00:00
Tom Lane dbb769352d Temporary fix for the problem that pg_stat_activity, inet_client_addr(),
and inet_server_addr() fail if the client connected over a "scoped" IPv6
address.  In this case getnameinfo() will return a string ending with
a poorly-standardized "%something" zone specifier, which these functions
try to feed to network_in(), which won't take it.  So that we don't lose
functionality altogether, suppress the zone specifier before giving the
string to network_in().  Per report from Brian Hirt.

TODO: probably someday the inet type should support scoped IPv6 addresses,
and then this patch should be reverted.

Backpatch to 8.2 ... is it worth going further?
2007-05-17 23:31:49 +00:00
Tom Lane b11123b675 Fix parameter recalculation for Limit nodes: during a ReScan call we must
recompute the limit/offset immediately, so that the updated values are
available when the child's ReScan function is invoked.  Add a regression
test for this, too.  Bug is new in HEAD (due to the bounded-sorting patch)
so no need for back-patch.

I did not do anything about merging this signaling with chgParam processing,
but if we were to do that we'd still need to compute the updated values
at this point rather than during the first ProcNode call.

Per observation and test case from Greg Stark, though I didn't use his patch.
2007-05-17 19:35:08 +00:00
Alvaro Herrera 3b0347b36e Move the tuple freezing point in CLUSTER to a point further back in the past,
to avoid losing useful Xid information in not-so-old tuples.  This makes
CLUSTER behave the same as VACUUM as far a tuple-freezing behavior goes
(though CLUSTER does not yet advance the table's relfrozenxid).

While at it, move the actual freezing operation in rewriteheap.c to a more
appropriate place, and document it thoroughly.  This part of the patch from
Tom Lane.
2007-05-17 15:28:29 +00:00
Alvaro Herrera 90cbc63fd1 Have TRUNCATE advance the affected table's relfrozenxid to RecentXmin, to
avoid a later needless VACUUM for Xid-wraparound purposes.  We can do this
since the table is known to be left empty, so no Xid remains on it.

Per discussion.
2007-05-16 17:28:20 +00:00
Bruce Momjian 178214d2ae Update comments for PG_DETOAST_PACKED and VARDATA_ANY on a structures
that require alignment.

Add a paragraph to the "User-Defined Types" chapter on using these
macros since it seems like they're a hit.

Gregory Stark
2007-05-15 17:39:54 +00:00
Tom Lane 9aa3c782c9 Fix the problem that creating a user-defined type named _foo, followed by one
named foo, would work but the other ordering would not.  If a user-specified
type or table name collides with an existing auto-generated array name, just
rename the array type out of the way by prepending more underscores.  This
should not create any backward-compatibility issues, since the cases in which
this will happen would have failed outright in prior releases.

Also fix an oversight in the arrays-of-composites patch: ALTER TABLE RENAME
renamed the table's rowtype but not its array type.
2007-05-12 00:55:00 +00:00
Tom Lane d8326119c8 Fix my oversight in enabling domains-of-domains: ALTER DOMAIN ADD CONSTRAINT
needs to check the new constraint against columns of derived domains too.

Also, make it error out if the domain to be modified is used within any
composite-type columns.  Eventually we should support that case, but it seems
a bit painful, and not suitable for a back-patch.  For the moment just let the
user know we can't do it.

Backpatch to 8.2, which is the only released version that allows nested
domains.  Possibly the other part should be back-patched further.
2007-05-11 20:17:15 +00:00
Tom Lane bc8036fc66 Support arrays of composite types, including the rowtypes of regular tables
and views (but not system catalogs, nor sequences or toast tables).  Get rid
of the hardwired convention that a type's array type is named exactly "_type",
instead using a new column pg_type.typarray to provide the linkage.  (It still
will be named "_type", though, except in odd corner cases such as
maximum-length type names.)

Along the way, make tracking of owner and schema dependencies for types more
uniform: a type directly created by the user has these dependencies, while a
table rowtype or auto-generated array type does not have them, but depends on
its parent object instead.

David Fetter, Andrew Dunstan, Tom Lane
2007-05-11 17:57:14 +00:00
Tom Lane 5b7cf08d33 Reserve some pg_statistic "kind" codes for use by the ESRI ST_Geometry
datatype project.  Per request from Ale Raza (araza at esri.com).
2007-05-08 19:13:52 +00:00
Neil Conway ade493e02d Add a hash function for "numeric". Mark the equality operator for
numerics as "oprcanhash", and make the corresponding system catalog
updates. As a result, hash indexes, hashed aggregation, and hash
joins can now be used with the numeric type. Bump the catversion.

The only tricky aspect to doing this is writing a correct hash
function: it's possible for two Numerics to be equal according to
their equality operator, but have different in-memory bit patterns.
To cope with this, the hash function doesn't consider the Numeric's
"scale" or "sign", and explictly skips any leading or trailing
zeros in the Numeric's digit buffer (the current implementation
should suppress any such zeros, but it seems unwise to rely upon
this). See discussion on pgsql-patches for more details.
2007-05-08 18:56:48 +00:00
Tom Lane d2a4a4069f Add a line to the EXPLAIN ANALYZE output for a Sort node, showing the
actual sort strategy and amount of space used.  By popular demand.
2007-05-04 21:29:53 +00:00
Tom Lane c7464720a3 tas() support for Renesas' M32R processor. Kazuhiro Inaoka 2007-05-04 15:20:52 +00:00
Tom Lane 79ca7ffeb6 A few fixups in error handling: mark pg_re_throw() as noreturn for gcc,
and for other compilers, insert a dummy exit() call so that they understand
PG_RE_THROW() doesn't return.  Insert fflush(stderr) in ExceptionalCondition,
per recent buildfarm evidence that that might not happen automatically on some
platforms.  And const-ify ExceptionalCondition's declaration while at it.
2007-05-04 02:01:02 +00:00
Tom Lane d26559dbf3 Teach tuplesort.c about "top N" sorting, in which only the first N tuples
need be returned.  We keep a heap of the current best N tuples and sift-up
new tuples into it as we scan the input.  For M input tuples this means
only about M*log(N) comparisons instead of M*log(M), not to mention a lot
less workspace when N is small --- avoiding spill-to-disk for large M
is actually the most attractive thing about it.  Patch includes planner
and executor support for invoking this facility in ORDER BY ... LIMIT
queries.  Greg Stark, with some editorialization by moi.
2007-05-04 01:13:45 +00:00
Tom Lane 0fef38da21 Tweak hash index AM to use the new ReadOrZeroBuffer bufmgr API when fetching
pages it intends to zero immediately.  Just to show there is some use for that
function besides WAL recovery :-).
Along the way, fold _hash_checkpage and _hash_pageinit calls into _hash_getbuf
and friends, instead of expecting callers to do that separately.
2007-05-03 16:45:58 +00:00
Tom Lane 8c3cc86e7b During WAL recovery, when reading a page that we intend to overwrite completely
from the WAL data, don't bother to physically read it; just have bufmgr.c
return a zeroed-out buffer instead.  This speeds recovery significantly,
and also avoids unnecessary failures when a page-to-be-overwritten has corrupt
page headers on disk.  This replaces a former kluge that accomplished the
latter by pretending zero_damaged_pages was always ON during WAL recovery;
which was OK when the kluge was put in, but is unsafe when restoring a WAL
log that was written with full_page_writes off.

Heikki Linnakangas
2007-05-02 23:18:03 +00:00
Tom Lane 88f1fd2989 Fix oversight in PG_RE_THROW processing: it's entirely possible that there
isn't any place to throw the error to.  If so, we should treat the error
as FATAL, just as we would have if it'd been thrown outside the PG_TRY
block to begin with.

Although this is clearly a *potential* source of bugs, it is not clear
at the moment whether it is an *actual* source of bugs; there may not
presently be any PG_TRY blocks in code that can be reached with no outer
longjmp catcher.  So for the moment I'm going to be conservative and not
back-patch this.  The change breaks ABI for users of PG_RE_THROW and hence
might create compatibility problems for loadable modules, so we should not
put it into released branches without proof that it's needed.
2007-05-02 15:32:42 +00:00
Tom Lane c432061963 Change the timestamps recorded in transaction commit/abort xlog records
from time_t to TimestampTz representation.  This provides full gettimeofday()
resolution of the timestamps, which might be useful when attempting to
do point-in-time recovery --- previously it was not possible to specify
the stop point with sub-second resolution.  But mostly this is to get
rid of TimestampTz-to-time_t conversion overhead during commit.  Per my
proposal of a day or two back.
2007-04-30 21:01:53 +00:00
Tom Lane 641912b4d1 Fix oversight in my patch of yesterday: forgot to ensure that stats would
still be forced out at backend exit.
2007-04-30 16:37:08 +00:00
Tom Lane 957d08c81f Implement rate-limiting logic on how often backends will attempt to send
messages to the stats collector.  This avoids the problem that enabling
stats_row_level for autovacuum has a significant overhead for short
read-only transactions, as noted by Arjen van der Meijden.  We can avoid
an extra gettimeofday call by piggybacking on the one done for WAL-logging
xact commit or abort (although that doesn't help read-only transactions,
since they don't WAL-log anything).

In my proposal for this, I noted that we could change the WAL log entries
for commit/abort to record full TimestampTz precision, instead of only
time_t as at present.  That's not done in this patch, but will be committed
separately.
2007-04-30 03:23:49 +00:00
Tom Lane bbbe825f5f Modify processing of DECLARE CURSOR and EXPLAIN so that they can resolve the
types of unspecified parameters when submitted via extended query protocol.
This worked in 8.2 but I had broken it during plancache changes.  DECLARE
CURSOR is now treated almost exactly like a plain SELECT through parse
analysis, rewrite, and planning; only just before sending to the executor
do we divert it away to ProcessUtility.  This requires a special-case check
in a number of places, but practically all of them were already special-casing
SELECT INTO, so it's not too ugly.  (Maybe it would be a good idea to merge
the two by treating IntoClause as a form of utility statement?  Not going to
worry about that now, though.)  That approach doesn't work for EXPLAIN,
however, so for that I punted and used a klugy solution of running parse
analysis an extra time if under extended query protocol.
2007-04-27 22:05:49 +00:00
Tom Lane a2e923a652 Fix dynahash.c to suppress hash bucket splits while a hash_seq_search() scan
is in progress on the same hashtable.  This seems the least invasive way to
fix the recently-recognized problem that a split could cause the scan to
visit entries twice or (with much lower probability) miss them entirely.
The only field-reported problem caused by this is the "failed to re-find
shared lock object" PANIC in COMMIT PREPARED reported by Michel Dorochevsky,
which was caused by multiply visited entries.  However, it seems certain
that mdsync() is vulnerable to missing required fsync's due to missed
entries, and I am fearful that RelationCacheInitializePhase2() might be at
risk as well.  Because of that and the generalized hazard presented by this
bug, back-patch all the supported branches.

Along the way, fix pg_prepared_statement() and pg_cursor() to not assume
that the hashtables they are examining will stay static between calls.
This is risky regardless of the newly noted dynahash problem, because
hash_seq_search() has never promised to cope with deletion of table entries
other than the just-returned one.  There may be no bug here because the only
supported way to call these functions is via ExecMakeTableFunctionResult()
which will cycle them to completion before doing anything very interesting,
but it seems best to get rid of the assumption.  This affects 8.2 and HEAD
only, since those functions weren't there earlier.
2007-04-26 23:24:46 +00:00
Neil Conway 16efdb5ec7 Rename the newly-added commands for discarding session state.
RESET SESSION, RESET PLANS, and RESET TEMP are now DISCARD ALL,
DISCARD PLANS, and DISCARD TEMP, respectively. This is to avoid
confusion with the pre-existing RESET variants: the DISCARD
commands are not actually similar to RESET. Patch from Marko
Kreen, with some minor editorialization.
2007-04-26 16:13:15 +00:00
Tom Lane afcf09dd90 Some further performance tweaks for planning large inheritance trees that
are mostly excluded by constraints: do the CE test a bit earlier to save
some adjust_appendrel_attrs() work on excluded children, and arrange to
use array indexing rather than rt_fetch() to fetch RTEs in the main body
of the planner.  The latter is something I'd wanted to do for awhile anyway,
but seeing list_nth_cell() as 35% of the runtime gets one's attention.
2007-04-21 21:01:45 +00:00
Peter Eisentraut b7edb568bd Make configuration parameters fall back to their default values when they
are removed from the configuration file.

Joachim Wieland
2007-04-21 20:02:41 +00:00
Tom Lane 9d37c038fc Repair PANIC condition in hash indexes when a previous index extension attempt
failed (due to lock conflicts or out-of-space).  We might have already
extended the index's filesystem EOF before failing, causing the EOF to be
beyond what the metapage says is the last used page.  Hence the invariant
maintained by the code needs to be "EOF is at or beyond last used page",
not "EOF is exactly the last used page".  Problem was created by my patch
of 2006-11-19 that attempted to repair bug #2737.  Since that was
back-patched to 7.4, this needs to be as well.  Per report and test case
from Vlastimil Krejcir.
2007-04-19 20:24:04 +00:00
Alvaro Herrera ef23a77441 Enable configurable log of autovacuum actions. Initial patch from Simon
Riggs, additional code and docs by me.  Per discussion.
2007-04-18 16:44:18 +00:00
Magnus Hagander de9effb55f Enable IPV6 for all MSVC builds, including the VC6 libpq-only build.
Per request from Hiroshi Saito.
2007-04-16 18:39:19 +00:00
Alvaro Herrera e2a186b03c Add a multi-worker capability to autovacuum. This allows multiple worker
processes to be running simultaneously.  Also, now autovacuum processes do not
count towards the max_connections limit; they are counted separately from
regular processes, and are limited by the new GUC variable
autovacuum_max_workers.

The launcher now has intelligence to launch workers on each database every
autovacuum_naptime seconds, limited only on the max amount of worker slots
available.

Also, the global worker I/O utilization is limited by the vacuum cost-based
delay feature.  Workers are "balanced" so that the total I/O consumption does
not exceed the established limit.  This part of the patch was contributed by
ITAGAKI Takahiro.

Per discussion.
2007-04-16 18:30:04 +00:00
Tom Lane 42dc4b66e6 Make plancache store cursor options so it can pass them to planner during
a replan.  I had originally thought this was not necessary, but the new
SPI facilities create a path whereby queries planned with non-default
options can get into the cache, so it is necessary.
2007-04-16 18:21:07 +00:00
Tom Lane f01b196597 Support scrollable cursors (ie, 'direction' clause in FETCH) in plpgsql.
Pavel Stehule, reworked a bit by Tom.
2007-04-16 17:21:24 +00:00
Tom Lane 66888f7424 Expose more cursor-related functionality in SPI: specifically, allow
access to the planner's cursor-related planning options, and provide new
FETCH/MOVE routines that allow access to the full power of those commands.
Small refactoring of planner(), pg_plan_query(), and pg_plan_queries()
APIs to make it convenient to pass the planning options down from SPI.

This is the core-code portion of Pavel Stehule's patch for scrollable
cursor support in plpgsql; I'll review and apply the plpgsql changes
separately.
2007-04-16 01:14:58 +00:00
Tom Lane fa92d21a48 Avoid running build_index_pathkeys() in situations where there cannot
possibly be any useful pathkeys --- to wit, queries with neither any
join clauses nor any ORDER BY request.  It's nearly free to check for
this case and it saves a useful fraction of the planning time for simple
queries.
2007-04-15 20:09:28 +00:00
Andrew Dunstan f97d4a267a Add --with-libxslt configure option 2007-04-15 12:48:24 +00:00
Tatsuo Ishii 6041b92238 Make JOHAB client only encoding per discussions in pgsql-hackers
"Server-side support of all encodings" around 2007/3/26.
initdb required.
2007-04-15 10:56:30 +00:00
Magnus Hagander 15ebfeec2d Add O_DIRECT support on Windows.
ITAGAKI Takahiro
2007-04-13 10:30:30 +00:00
Neil Conway 6df6d8e361 Fixes for RESET SESSION patch, per Alvaro. Fix a typo in the RESET
ref page (sorry, my fault!), and simplify the coding of
ResetTempTableNamespace().
2007-04-12 22:34:45 +00:00
Neil Conway d13e903bea RESET SESSION, plus related new DDL commands. Patch from Marko Kreen,
reviewed by Neil Conway. This patch adds the following DDL command
variants: RESET SESSION, RESET TEMP, RESET PLANS, CLOSE ALL, and
DEALLOCATE ALL. RESET SESSION is intended for use by connection
pool software and the like, in order to reset a client session
to something close to its initial state.

Note that while most of these command variants can be executed
inside a transaction block (but are not transaction-aware!),
RESET SESSION cannot. While this is inconsistent, it is intended
to catch programmer mistakes: RESET SESSION in an open transaction
block is probably unintended.
2007-04-12 06:53:49 +00:00
Tom Lane 226a100568 Code review for btree page split WAL reduction patch. Make it actually work
(original code *always* created a full-page image for the left page, thus
leaving the intended savings unrealized), avoid risk of not having enough room
on the page during xlog restore, squeeze out another couple bytes in the xlog
record, clean up neglected comments.
2007-04-11 20:47:38 +00:00
Tom Lane 56218fbc48 Minor tweaking of index special-space definitions so that the various
index types can be reliably distinguished by examining the special space
on an index page.  Per my earlier proposal, plus the realization that
there's no need for btree's vacuum cycle ID to cycle through every possible
16-bit value.  Restricting its range a little costs nearly nothing and
eliminates the possibility of collisions.
Memo to self: remember to make bitmap indexes play along with this scheme,
assuming that patch ever gets accepted.
2007-04-09 22:04:08 +00:00
Tom Lane 7b78474da3 Make CLUSTER MVCC-safe. Heikki Linnakangas 2007-04-08 01:26:33 +00:00
Tom Lane f02a82b6ad Make 'col IS NULL' clauses be indexable conditions.
Teodor Sigaev, with some kibitzing from Tom Lane.
2007-04-06 22:33:43 +00:00
Tom Lane 37a609b27f Now that core functionality is depending on autoconf's AC_C_BIGENDIAN to be
right, there seems precious little reason to have a pile of hand-maintained
endianness definitions in src/include/port/*.h.  Get rid of those, and make
the couple of places that used them depend on WORDS_BIGENDIAN instead.
2007-04-06 05:36:51 +00:00
Tom Lane 3e23b68dac Support varlena fields with single-byte headers and unaligned storage.
This commit breaks any code that assumes that the mere act of forming a tuple
(without writing it to disk) does not "toast" any fields.  While all available
regression tests pass, I'm not totally sure that we've fixed every nook and
cranny, especially in contrib.

Greg Stark with some help from Tom Lane
2007-04-06 04:21:44 +00:00
Tom Lane 9c9b619473 Remove the CheckpointStartLock in favor of having backends show whether they
are in their commit critical sections via flags in the ProcArray.  Checkpoint
can watch the ProcArray to determine when it's safe to proceed.  This is
a considerably better solution to the original problem of race conditions
between checkpoint and transaction commit: it speeds up commit, since there's
one less lock to fool with, and it prevents the problem of checkpoint being
delayed indefinitely when there's a constant flow of commits.  Heikki, with
some kibitzing from Tom.
2007-04-03 16:34:36 +00:00
Tom Lane b3005276eb Decouple the values of TOAST_TUPLE_THRESHOLD and TOAST_MAX_CHUNK_SIZE.
Add the latter to the values checked in pg_control, since it can't be changed
without invalidating toast table content.  This commit in itself shouldn't
change any behavior, but it lays some necessary groundwork for experimentation
with these toast-control numbers.

Note: while TOAST_TUPLE_THRESHOLD can now be changed without initdb, some
thought still needs to be given to needs_toast_table() in toasting.c before
unleashing random changes.
2007-04-03 04:14:26 +00:00
Tom Lane 57690c6803 Support enum data types. Along the way, use macros for the values of
pg_type.typtype whereever practical.  Tom Dunstan, with some kibitzing
from Tom Lane.
2007-04-02 03:49:42 +00:00
Peter Eisentraut a482a3e58b Update catversion for new XML mapping functions 2007-04-01 09:56:02 +00:00
Peter Eisentraut 0b75afda92 Mapping schemas and databases to XML and XML Schema.
Refactor and document the remaining mapping code.
2007-04-01 09:00:26 +00:00
Magnus Hagander 335feca441 Add some instrumentation to the bgwriter, through the stats collector.
New view pg_stat_bgwriter, and the functions required to build it.
2007-03-30 18:34:56 +00:00
Tom Lane fba8113c1b Teach CLUSTER to skip writing WAL if not needed (ie, not using archiving)
--- Simon.
Also, code review and cleanup for the previous COPY-no-WAL patches --- Tom.
2007-03-29 00:15:39 +00:00
Tom Lane bf94076348 Fix array coercion expressions to ensure that the correct volatility is
seen by code inspecting the expression.  The best way to do this seems
to be to drop the original representation as a function invocation, and
instead make a special expression node type that represents applying
the element-type coercion function to each array element.  In this way
the element function is exposed and will be checked for volatility.
Per report from Guillaume Smet.
2007-03-27 23:21:12 +00:00
Tom Lane 55a7cf80a0 Allow non-superuser database owners to create procedural languages.
A DBA is allowed to create a language in his database if it's marked
"tmpldbacreate" in pg_pltemplate.  The factory default is that this is set
for all standard trusted languages, but of course a superuser may adjust
the settings.  In service of this, add the long-foreseen owner column to
pg_language; renaming, dropping, and altering owner of a PL now follow
normal ownership rules instead of being superuser-only.
Jeremy Drake, with some editorialization by Tom Lane.
2007-03-26 16:58:41 +00:00
Tom Lane 7ee8fd9113 Seems some people have been forgetting to run autoheader. 2007-03-26 02:38:22 +00:00
Tom Lane bf8236526b Remove the prohibition on executing cursor commands through SPI_execute.
Vadim had included this restriction in the original design of the SPI code,
but I'm darned if I can see a reason for it.

I left the macro definition of SPI_ERROR_CURSOR in place, so as not to
needlessly break any SPI callers that are checking for it, but that code
will never actually be returned anymore.
2007-03-25 23:27:59 +00:00
Tom Lane e85a01df67 Clean up the representation of special snapshots by including a "method
pointer" in every Snapshot struct.  This allows removal of the case-by-case
tests in HeapTupleSatisfiesVisibility, which should make it a bit faster
(I didn't try any performance tests though).  More importantly, we are no
longer violating portable C practices by assuming that small integers are
distinct from all pointer values, and HeapTupleSatisfiesDirty no longer
has a non-reentrant API involving side-effects on a global variable.

There were a couple of places calling HeapTupleSatisfiesXXX routines
directly rather than through the HeapTupleSatisfiesVisibility macro.
Since these places had to be changed anyway, I chose to make them go
through the macro for uniformity.

Along the way I renamed HeapTupleSatisfiesSnapshot to HeapTupleSatisfiesMVCC
to emphasize that it's only used with MVCC-type snapshots.  I was sorely
tempted to rename HeapTupleSatisfiesVisibility to HeapTupleSatisfiesSnapshot,
but forebore for the moment to avoid confusion and reduce the likelihood that
this patch breaks some of the pending patches.  Might want to reconsider
doing that later.
2007-03-25 19:45:14 +00:00
Tatsuo Ishii 75c6519ff6 Add new encoding EUC_JIS_2004 and SHIFT_JIS_2004,
along with new conversions among EUC_JIS_2004, SHIFT_JIS_2004 and UTF-8.
catalog version has been bump up.
2007-03-25 11:56:04 +00:00
Tom Lane 23a41573c4 Adjust DatumGetBool macro so that it isn't fooled by garbage in the Datum
to the left of the actual bool value.  While in most cases there won't be
any, our support for old-style user-defined functions violates the C spec
to the extent of calling functions that might return char or short through
a function pointer declared to return "char *", which we then coerce to
Datum.  It is not surprising that the result might contain garbage
high-order bits ... what is surprising is that we didn't see such cases
long ago.  Per report from Magnus.
2007-03-23 20:24:41 +00:00
Tom Lane 547b6e537a Fix plancache so that any required replanning is done with the same
search_path that was active when the plan was first made.  To do this,
improve namespace.c to support a stack of "override" search path settings
(we must have a stack since nested replan events are entirely possible).
This facility replaces the "special namespace" hack formerly used by
CREATE SCHEMA, and should be able to support per-function search path
settings as well.
2007-03-23 19:53:52 +00:00
Magnus Hagander 1ca6ab1c78 Remove headers for old sysv shmem emulation that I forgot.
Also remove headers for old sysv semaphore emulation that were forgotten
when that was changed about a year ago.
2007-03-23 08:30:55 +00:00
Bruce Momjian e651bcf3f6 Add xmlpath() to evaluate XPath expressions, with namespaces support.
Nikolay Samokhvalov
2007-03-22 20:14:58 +00:00
Bruce Momjian 686956375a Allow the pgstat process to restart immediately after a receiving
SIGQUIT signal, rather than waiting for PGSTAT_RESTART_INTERVAL.
2007-03-22 19:53:31 +00:00
Neil Conway 9eb78beeae Add three new regexp functions: regexp_matches, regexp_split_to_array,
and regexp_split_to_table. These functions provide access to the
capture groups resulting from a POSIX regular expression match,
and provide the ability to split a string on a POSIX regular
expression, respectively. Patch from Jeremy Drake; code review by
Neil Conway, additional comments and suggestions from Tom and
Peter E.

This patch bumps the catversion, adds some regression tests,
and updates the docs.
2007-03-20 05:45:00 +00:00
Jan Wieck 5e96b04a7c Bumping catversion due to changes to pg_trigger and pg_rewrite.
BTW, the comment in this file says that we hope we never have more than
10 catversion changes per day, but to even make this possible we should
start counting at zero, shouldn't we?


Jan
2007-03-20 03:53:26 +00:00
Jan Wieck 0fe16500d3 Changes pg_trigger and extend pg_rewrite in order to allow triggers and
rules to be defined with different, per session controllable, behaviors
for replication purposes.

This will allow replication systems like Slony-I and, as has been stated
on pgsql-hackers, other products to control the firing mechanism of
triggers and rewrite rules without modifying the system catalog directly.

The firing mechanisms are controlled by a new superuser-only GUC
variable, session_replication_role, together with a change to
pg_trigger.tgenabled and a new column pg_rewrite.ev_enabled. Both
columns are a single char data type now (tgenabled was a bool before).
The possible values in these attributes are:

     'O' - Trigger/Rule fires when session_replication_role is "origin"
           (default) or "local". This is the default behavior.

     'D' - Trigger/Rule is disabled and fires never

     'A' - Trigger/Rule fires always regardless of the setting of
           session_replication_role

     'R' - Trigger/Rule fires when session_replication_role is "replica"

The GUC variable can only be changed as long as the system does not have
any cached query plans. This will prevent changing the session role and
accidentally executing stored procedures or functions that have plans
cached that expand to the wrong query set due to differences in the rule
firing semantics.

The SQL syntax for changing a triggers/rules firing semantics is

     ALTER TABLE <tabname> <when> TRIGGER|RULE <name>;

     <when> ::= ENABLE | ENABLE ALWAYS | ENABLE REPLICA | DISABLE

psql's \d command as well as pg_dump are extended in a backward
compatible fashion.

Jan
2007-03-19 23:38:32 +00:00
Tom Lane 0f4ff460c4 Fix up the remaining places where the expression node structure would lose
available information about the typmod of an expression; namely, Const,
ArrayRef, ArrayExpr, and EXPR and ARRAY SubLinks.  In the ArrayExpr and
SubLink cases it wasn't really the data structure's fault, but exprTypmod()
being lazy.  This seems like a good idea in view of the expected increase in
typmod usage from Teodor's work to allow user-defined types to have typmods.
In particular this responds to the concerns we had about eliminating the
special-purpose hack that exprTypmod() used to have for BPCHAR Consts.
We can now tell whether or not such a Const has been cast to a specific
length, and report or display properly if so.

initdb forced due to changes in stored rules.
2007-03-17 00:11:05 +00:00
Magnus Hagander 51d7741db1 Add new columns for tuple statistics on a database level to
pg_stat_database.
2007-03-16 17:57:36 +00:00
Tom Lane 95f6d2d209 Make use of plancache module for SPI plans. In particular, since plpgsql
uses SPI plans, this finally fixes the ancient gotcha that you can't
drop and recreate a temp table used by a plpgsql function.

Along the way, clean up SPI's API a little bit by declaring SPI plan
pointers as "SPIPlanPtr" instead of "void *".  This is cosmetic but
helps to forestall simple programming mistakes.  (I have changed some
but not all of the callers to match; there are still some "void *"'s
in contrib and the PL's.  This is intentional so that we can see if
anyone's compiler complains about it.)
2007-03-15 23:12:07 +00:00
Peter Eisentraut f4ee82e3d3 Reverted waiting for further fixes:
Make configuration parameters fall back to their default values when they
are removed from the configuration file.

Joachim Wieland
2007-03-13 14:32:25 +00:00
Tom Lane b9527e9840 First phase of plan-invalidation project: create a plan cache management
module and teach PREPARE and protocol-level prepared statements to use it.
In service of this, rearrange utility-statement processing so that parse
analysis does not assume table schemas can't change before execution for
utility statements (necessary because we don't attempt to re-acquire locks
for utility statements when reusing a stored plan).  This requires some
refactoring of the ProcessUtility API, but it ends up cleaner anyway,
for instance we can get rid of the QueryContext global.

Still to do: fix up SPI and related code to use the plan cache; I'm tempted to
try to make SQL functions use it too.  Also, there are at least some aspects
of system state that we want to ensure remain the same during a replan as in
the original processing; search_path certainly ought to behave that way for
instance, and perhaps there are others.
2007-03-13 00:33:44 +00:00
Peter Eisentraut f84308f195 Make configuration parameters fall back to their default values when they
are removed from the configuration file.

Joachim Wieland
2007-03-12 22:09:28 +00:00
Alvaro Herrera 626eb02198 Cleanup the bootstrap code a little, and rename "dummy procs" in the code
comments and variables to "auxiliary proc", per Heikki's request.
2007-03-07 13:35:03 +00:00
Bruce Momjian a535cdf130 Revert temp_tablespaces because of coding problems, per Tom. 2007-03-06 02:06:15 +00:00
Bruce Momjian 63c678d17b Fix for COPY-after-truncate feature.
Simon Riggs
2007-03-03 20:08:41 +00:00
Bruce Momjian ae35867a39 Remove undo information from pg_controldata --- never used.
Florian G. Pflug
2007-03-03 20:02:27 +00:00
Bruce Momjian 0763a56501 Add lo_truncate() to backend and libpq for large object truncation.
Kris Jurka
2007-03-03 19:52:47 +00:00
Neil Conway 90d76525c5 Add resetStringInfo(), which clears the content of a StringInfo, and
fixup various places in the tree that were clearing a StringInfo by hand.
Making this function a part of the API simplifies client code slightly,
and avoids needlessly peeking inside the StringInfo interface.
2007-03-03 19:32:55 +00:00
Bruce Momjian e52c4a6e26 Add GUC log_lock_waits to log long wait times.
Simon Riggs
2007-03-03 18:46:40 +00:00
Tom Lane 61c3e5b248 Make log_min_error_statement put LOG level at the same priority as
log_min_messages does; and arrange to suppress the duplicative output
that would otherwise result from log_statement and log_duration messages.
Bruce Momjian and Tom Lane.
2007-03-02 23:37:23 +00:00
Tom Lane fb276438b6 Suppress useless searches for unused line pointers in PageAddItem. To do
this, add a 16-bit "flags" field to page headers by stealing some bits from
pd_tli.  We use one flag bit as a hint to indicate whether there are any
unused line pointers; the remaining 15 are available for future use.

This is a cut-down form of an idea proposed by Hiroki Kataoka in July 2005.
At the time it was rejected because the original patch increased the size of
page headers and it wasn't clear that the benefit outweighed the distributed
cost.  The flag-bit approach gets most of the benefit without requiring an
increase in the page header size.

Heikki Linnakangas and Tom Lane
2007-03-02 00:48:44 +00:00
Peter Eisentraut 7b76bfbe18 Fix date/time formats for XML Schema output.
Pavel Stehule
2007-03-01 14:52:04 +00:00
Tom Lane 234a02b2a8 Replace direct assignments to VARATT_SIZEP(x) with SET_VARSIZE(x, len).
Get rid of VARATT_SIZE and VARATT_DATA, which were simply redundant with
VARSIZE and VARDATA, and as a consequence almost no code was using the
longer names.  Rename the length fields of struct varlena and various
derived structures to catch anyplace that was accessing them directly;
and clean up various places so caught.  In itself this patch doesn't
change any behavior at all, but it is necessary infrastructure if we hope
to play any games with the representation of varlena headers.
Greg Stark and Tom Lane
2007-02-27 23:48:10 +00:00
Tom Lane c7ff7663e4 Get rid of the separate EState for subplans, and just let them share the
parent query's EState.  Now that there's a single flat rangetable for both
the main plan and subplans, there's no need anymore for a separate EState,
and removing it allows cleaning up some crufty code in nodeSubplan.c and
nodeSubqueryscan.c.  Should be a tad faster too, although any difference
will probably be hard to measure.  This is the last bit of subsidiary
mop-up work from changing to a flat rangetable.
2007-02-27 01:11:26 +00:00
Tom Lane 655aa5b330 Now that plans have flat rangetable lists, it's a lot easier to get EXPLAIN to
drill down into subplan targetlists to print the referent expression for an
OUTER or INNER var in an upper plan node.  Hence, make it do that always, and
banish the old hack of showing "?columnN?" when things got too complicated.

Along the way, fix an EXPLAIN bug I introduced by suppressing subqueries from
execution-time range tables: get_name_for_var_field() assumed it could look at
rte->subquery to find out the real type of a RECORD var.  That doesn't work
anymore, but instead we can look at the input plan of the SubqueryScan plan
node.
2007-02-23 21:59:45 +00:00
Bruce Momjian 9cc2a71c38 Move BLCKSZ < 1024 check to guc.c. 2007-02-23 21:36:19 +00:00
Tom Lane eab6b8b27e Turn the rangetable used by the executor into a flat list, and avoid storing
useless substructure for its RangeTblEntry nodes.  (I chose to keep using the
same struct node type and just zero out the link fields for unneeded info,
rather than making a separate ExecRangeTblEntry type --- it seemed too
fragile to have two different rangetable representations.)

Along the way, put subplans into a list in the toplevel PlannedStmt node,
and have SubPlan nodes refer to them by list index instead of direct pointers.
Vadim wanted to do that years ago, but I never understood what he was on about
until now.  It makes things a *whole* lot more robust, because we can stop
worrying about duplicate processing of subplans during expression tree
traversals.  That's been a constant source of bugs, and it's finally gone.

There are some consequent simplifications yet to be made, like not using
a separate EState for subplans in the executor, but I'll tackle that later.
2007-02-22 22:00:26 +00:00
Bruce Momjian 6f519ad01c btree source code cleanups:
I refactored findsplitloc and checksplitloc so that the division of
labor is more clear IMO. I pushed all the space calculation inside the
loop to checksplitloc.

I also fixed the off by 4 in free space calculation caused by
PageGetFreeSpace subtracting sizeof(ItemIdData), even though it was
harmless, because it was distracting and I felt it might come back to
bite us in the future if we change the page layout or alignments.
There's now a new function PageGetExactFreeSpace that doesn't do the
subtraction.

findsplitloc now tries the "just the new item to right page" split as
well. If people don't like the refactoring, I can write a patch to just
add that.

Heikki Linnakangas
2007-02-21 20:02:17 +00:00
Bruce Momjian 6765df9174 Add configure --enable-profiling to enable GCC profiling. Patches from
Korry Douglas and Nikhil S
2007-02-21 15:12:39 +00:00
Bruce Momjian 272b6ef20d Prevent BLCKSZ < 1024, and have initdb test shared buffers based on the
BLCKSZ value.
2007-02-20 23:49:38 +00:00
Tom Lane 9cbd0c155d Remove the Query structure from the executor's API. This allows us to stop
storing mostly-redundant Query trees in prepared statements, portals, etc.
To replace Query, a new node type called PlannedStmt is inserted by the
planner at the top of a completed plan tree; this carries just the fields of
Query that are still needed at runtime.  The statement lists kept in portals
etc. now consist of intermixed PlannedStmt and bare utility-statement nodes
--- no Query.  This incidentally allows us to remove some fields from Query
and Plan nodes that shouldn't have been there in the first place.

Still to do: simplify the execution-time range table; at the moment the
range table passed to the executor still contains Query trees for subqueries.

initdb forced due to change of stored rules.
2007-02-20 17:32:18 +00:00
Peter Eisentraut 63e03ba923 Add missing OIDs to pg_proc. 2007-02-20 10:00:25 +00:00
Bruce Momjian 3e803f7273 Add "isodow" option to EXTRACT() and date_part() where Sunday = 7. 2007-02-19 17:41:39 +00:00
Tom Lane 7c5e5439d2 Get rid of some old and crufty global variables in the planner. When
this code was last gone over, there wasn't really any alternative to
globals because we didn't have the PlannerInfo struct being passed all
through the planner code.  Now that we do, we can restructure things
to avoid non-reentrancy.  I'm fooling with this because otherwise I'd
have had to add another global variable for the planned compact
range table list.
2007-02-19 07:03:34 +00:00
Tom Lane b8c3267792 Put function expressions and values lists into FunctionScan and ValuesScan
plan nodes, so that the executor does not need to get these items from
the range table at runtime.  This will avoid needing to include these
fields in the compact range table I'm expecting to make the executor use.
2007-02-19 02:23:12 +00:00
Bruce Momjian 89a624439e Create AVG() aggregates for int8 and NUMERIC which do not compute X^2,
as a performance enhancement.

Mark Kirkwood
2007-02-17 00:55:58 +00:00
Tom Lane 8249409bc1 Adjust the definition of is_pushed_down so that it's always true for INNER
JOIN quals, just like WHERE quals, even if they reference every one of the
join's relations.  Now that we can reorder outer and inner joins, it's
possible for such a qual to end up being assigned to an outer join plan node,
and we mustn't have it treated as a join qual rather than a filter qual for
the node.  (If it were, the join could produce null-extended rows that it
shouldn't.)  Per bug report from Pelle Johansson.
2007-02-16 20:57:19 +00:00
Tom Lane b6c9165ea0 Code review for SSLKEY patch. 2007-02-16 17:07:00 +00:00
Peter Eisentraut 355e05ab41 Functions for mapping table data and table schemas to XML (a.k.a. XML export) 2007-02-16 07:46:55 +00:00
Bruce Momjian 4ebb0cf9c3 Add two new format fields for use with to_char(), to_date() and
to_timestamp():
    - ID for day-of-week
    - IDDD for day-of-year

This makes it possible to convert ISO week dates to and from text
fully represented in either week ('IYYY-IW-ID') or day-of-year
('IYYY-IDDD') format.

I have also added an 'isoyear' field for use with extract / date_part.

Brendan Jurd
2007-02-16 03:39:46 +00:00
Bruce Momjian c7b08050d9 SSL improvements:
o read global SSL configuration file
	o add GUC "ssl_ciphers" to control allowed ciphers
	o add libpq environment variable PGSSLKEY to control SSL hardware keys

Victor B. Wagner
2007-02-16 02:59:41 +00:00
Tom Lane 6bef118b01 Restructure code that is responsible for ensuring that clauseless joins are
considered when it is necessary to do so because of a join-order restriction
(that is, an outer-join or IN-subselect construct).  The former coding was a
bit ad-hoc and inconsistent, and it missed some cases, as exposed by Mario
Weilguni's recent bug report.  His specific problem was that an IN could be
turned into a "clauseless" join due to constant-propagation removing the IN's
joinclause, and if the IN's subselect involved more than one relation and
there was more than one such IN linking to the same upper relation, then the
only valid join orders involve "bushy" plans but we would fail to consider the
specific paths needed to get there.  (See the example case added to the join
regression test.)  On examining the code I wonder if there weren't some other
problem cases too; in particular it seems that GEQO was defending against a
different set of corner cases than the main planner was.  There was also an
efficiency problem, in that when we did realize we needed a clauseless join
because of an IN, we'd consider clauseless joins against every other relation
whether this was sensible or not.  It seems a better design is to use the
outer-join and in-clause lists as a backup heuristic, just as the rule of
joining only where there are joinclauses is a heuristic: we'll join two
relations if they have a usable joinclause *or* this might be necessary to
satisfy an outer-join or IN-clause join order restriction.  I refactored the
code to have just one place considering this instead of three, and made sure
that it covered all the cases that any of them had been considering.

Backpatch as far as 8.1 (which has only the IN-clause form of the disease).
By rights 8.0 and 7.4 should have the bug too, but they accidentally fail
to fail, because the joininfo structure used in those releases preserves some
memory of there having once been a joinclause between the inner and outer
sides of an IN, and so it leads the code in the right direction anyway.
I'll be conservative and not touch them.
2007-02-16 00:14:01 +00:00
Alvaro Herrera 1820650934 Restructure autovacuum in two processes: a dummy process, which runs
continuously, and requests vacuum runs of "autovacuum workers" to postmaster.
The workers do the actual vacuum work.  This allows for future improvements,
like allowing multiple autovacuum jobs running in parallel.

For now, the code keeps the original behavior of having a single autovac
process at any time by sleeping until the previous worker has finished.
2007-02-15 23:23:23 +00:00
Tom Lane bfe553fb49 Repair oversight in 8.2 change that improved the handling of "pseudoconstant"
WHERE clauses.  createplan.c is now willing to stick a gating Result node
almost anywhere in the plan tree, and in particular one can wind up directly
underneath a MergeJoin node.  This means it had better be willing to handle
Mark/Restore.  Fortunately, that's trivial in such cases, since we can just
pass off the call to the input node (which the planner has previously ensured
can handle Mark/Restore).  Per report from Phil Frost.
2007-02-15 03:07:13 +00:00
Bruce Momjian a9eb53969a Move fsync method macro defines into /include/access/xlogdefs.h so they
can be used by src/tools/fsync/test_fsync.c.
2007-02-14 05:00:40 +00:00
Tom Lane 7bddca3450 Fix up foreign-key mechanism so that there is a sound semantic basis for the
equality checks it applies, instead of a random dependence on whatever
operators might be named "=".  The equality operators will now be selected
from the opfamily of the unique index that the FK constraint depends on to
enforce uniqueness of the referenced columns; therefore they are certain to be
consistent with that index's notion of equality.  Among other things this
should fix the problem noted awhile back that pg_dump may fail for foreign-key
constraints on user-defined types when the required operators aren't in the
search path.  This also means that the former warning condition about "foreign
key constraint will require costly sequential scans" is gone: if the
comparison condition isn't indexable then we'll reject the constraint
entirely. All per past discussions.

Along the way, make the RI triggers look into pg_constraint for their
information, instead of using pg_trigger.tgargs; and get rid of the always
error-prone fixed-size string buffers in ri_triggers.c in favor of building up
the RI queries in StringInfo buffers.

initdb forced due to columns added to pg_constraint and pg_trigger.
2007-02-14 01:58:58 +00:00
Peter Eisentraut eb19144894 Add support for optionally escaping periods when converting SQL identifiers
to XML names, which will be required for supporting XML export.
2007-02-11 22:18:16 +00:00
Tom Lane f44271176e Call pgstat_drop_database during DROP DATABASE, so that any stats file
entries for the victim database go away sooner rather than later.  We already
did the equivalent thing at the per-relation level, not sure why it's not
been done for whole databases.  With this change, pgstat_vacuum_tabstat
should usually not find anything to do; though we still need it as a backstop
in case DROPDB or TABPURGE messages get lost under load.
2007-02-09 16:12:19 +00:00
Tom Lane c398300330 Combine cmin and cmax fields of HeapTupleHeaders into a single field, by
keeping private state in each backend that has inserted and deleted the same
tuple during its current top-level transaction.  This is sufficient since
there is no need to be able to determine the cmin/cmax from any other
transaction.  This gets us back down to 23-byte headers, removing a penalty
paid in 8.0 to support subtransactions.  Patch by Heikki Linnakangas, with
minor revisions by moi, following a design hashed out awhile back on the
pghackers list.
2007-02-09 03:35:35 +00:00
Bruce Momjian b79575ce45 Reduce WAL activity for page splits:
> Currently, an index split writes all the data on the split page to
> WAL. That's a lot of WAL traffic. The tuples that are copied to the
> right page need to be WAL logged, but the tuples that stay on the
> original page don't.

Heikki Linnakangas
2007-02-08 05:05:53 +00:00
Tom Lane aec4cf1c8c Add a function pg_stat_clear_snapshot() that discards any statistics snapshot
already collected in the current transaction; this allows plpgsql functions to
watch for stats updates even though they are confined to a single transaction.
Use this instead of the previous kluge involving pg_stat_file() to wait for
the stats collector to update in the stats regression test.  Internally,
decouple storage of stats snapshots from transaction boundaries; they'll
now stick around until someone calls pgstat_clear_snapshot --- which xact.c
still does at transaction end, to maintain the previous behavior.  This makes
the logic a lot cleaner, at the price of a couple dozen cycles per transaction
exit.
2007-02-07 23:11:30 +00:00
Peter Eisentraut 4f64a07bee Add strlcat() from OpenBSD, to be used for replacing strncat and other
strange coding practices.
2007-02-07 00:28:55 +00:00
Peter Eisentraut 037f8413fa Move NAMEDATALEN definition from postgres_ext.h to pg_config_manual.h. It
used to be part of libpq's exported interface many releases ago, but now
it's no longer necessary to make it accessible to clients.
2007-02-06 09:16:08 +00:00
Tom Lane ab05eedecc Add support for cross-type hashing in hashed subplans (hashed IN/NOT IN cases
that aren't turned into true joins).  Since this is the last missing bit of
infrastructure, go ahead and fill out the hash integer_ops and float_ops
opfamilies with cross-type operators.  The operator family project is now
DONE ... er, except for documentation ...
2007-02-06 02:59:15 +00:00
Tom Lane 23c4978e6c Rename MaxTupleSize to MaxHeapTupleSize to clarify that it's not meant to
describe the maximum size of index tuples (which is typically AM-dependent
anyway); and consequently remove the bogus deduction for "special space"
that was built into it.

Adjust TOAST_TUPLE_THRESHOLD and TOAST_MAX_CHUNK_SIZE to avoid wasting two
bytes per toast chunk, and to ensure that the calculation correctly tracks any
future changes in page header size.  The computation had been inaccurate in a
way that didn't cause any harm except space wastage, but future changes could
have broken it more drastically.

Fix the calculation of BTMaxItemSize, which was formerly computed as 1 byte
more than it could safely be.  This didn't cause any harm in practice because
it's only compared against maxalign'd lengths, but future changes in the size
of page headers or btree special space could have exposed the problem.

initdb forced because of change in TOAST_MAX_CHUNK_SIZE, which alters the
storage of toast tables.
2007-02-05 04:22:18 +00:00
Tom Lane a2e092e1c7 Don't MAXALIGN in the checks to decide whether a tuple is over TOAST's
threshold for tuple length.  On 4-byte-MAXALIGN machines, the toast code
creates tuples that have t_len exactly TOAST_TUPLE_THRESHOLD ... but this
number is not itself maxaligned, so if heap_insert maxaligns t_len before
comparing to TOAST_TUPLE_THRESHOLD, it'll uselessly recurse back to
tuptoaster.c, wasting cycles.  (It turns out that this does not happen on
8-byte-MAXALIGN machines, because for them the outer MAXALIGN in the
TOAST_MAX_CHUNK_SIZE macro reduces TOAST_MAX_CHUNK_SIZE so that toast tuples
will be less than TOAST_TUPLE_THRESHOLD in size.  That MAXALIGN is really
incorrect, but we can't remove it now, see below.)  There isn't any particular
value in maxaligning before comparing to the thresholds, so just don't do
that, which saves a small number of cycles in itself.

These numbers should be rejiggered to minimize wasted space on toast-relation
pages, but we can't do that in the back branches because changing
TOAST_MAX_CHUNK_SIZE would force an initdb (by changing the contents of toast
tables).  We can move the toast decision thresholds a bit, though, which is
what this patch effectively does.

Thanks to Pavan Deolasee for discovering the unintended recursion.

Back-patch into 8.2, but not further, pending more testing.  (HEAD is about
to get a further patch modifying the thresholds, so it won't help much
for testing this form of the patch.)
2007-02-04 20:00:37 +00:00
Peter Eisentraut ec020e1ceb Implement XMLSERIALIZE for real. Analogously, make the xml to text cast
observe the xmloption.

Reorganize the representation of the XML option in the parse tree and the
API to make it easier to manage and understand.

Add regression tests for parsing back XML expressions.
2007-02-03 14:06:56 +00:00
Tom Lane 5413eef8dc Repair failure to check that a table is still compatible with a previously
made query plan.  Use of ALTER COLUMN TYPE creates a hazard for cached
query plans: they could contain Vars that claim a column has a different
type than it now has.  Fix this by checking during plan startup that Vars
at relation scan level match the current relation tuple descriptor.  Since
at that point we already have at least AccessShareLock, we can be sure the
column type will not change underneath us later in the query.  However,
since a backend's locks do not conflict against itself, there is still a
hole for an attacker to exploit: he could try to execute ALTER COLUMN TYPE
while a query is in progress in the current backend.  Seal that hole by
rejecting ALTER TABLE whenever the target relation is already open in
the current backend.

This is a significant security hole: not only can one trivially crash the
backend, but with appropriate misuse of pass-by-reference datatypes it is
possible to read out arbitrary locations in the server process's memory,
which could allow retrieving database content the user should not be able
to see.  Our thanks to Jeff Trout for the initial report.

Security: CVE-2007-0556
2007-02-02 00:07:03 +00:00
Bruce Momjian 8b4ff8b6a1 Wording cleanup for error messages. Also change can't -> cannot.
Standard English uses "may", "can", and "might" in different ways:

        may - permission, "You may borrow my rake."

        can - ability, "I can lift that log."

        might - possibility, "It might rain today."

Unfortunately, in conversational English, their use is often mixed, as
in, "You may use this variable to do X", when in fact, "can" is a better
choice.  Similarly, "It may crash" is better stated, "It might crash".
2007-02-01 19:10:30 +00:00
Neil Conway 05ce7d6a41 Rewrite uuid input and output routines to avoid dependency on the
nonportable "hh" sprintf(3) length modifier. Instead, do the parsing
and output by hand. The code to do this isn't ideal, but this is
an interim measure anyway: the uuid type should probably use the
in-memory struct layout specified by RFC 4122. For now, this patch
should hopefully rectify the buildfarm failures for the uuid test.

Along the way, re-add pg_cast entries for uuid <-> varchar, which
I mistakenly removed earlier, and bump the catversion.
2007-01-31 19:33:54 +00:00
Teodor Sigaev d4c6da1527 Allow GIN's extractQuery method to signal that nothing can satisfy the query.
In this case extractQuery should returns -1 as nentries. This changes
prototype of extractQuery method to use int32* instead of uint32* for
nentries argument.
Based on that gincostestimate may see two corner cases: nothing will be found
or seqscan should be used.

Per proposal at http://archives.postgresql.org/pgsql-hackers/2007-01/msg01581.php

PS tsearch_core patch should be sightly modified to support changes, but I'm
waiting a verdict about reviewing of tsearch_core patch.
2007-01-31 15:09:45 +00:00
Tom Lane a635c08fa1 Add support for cross-type hashing in hash index searches and hash joins.
Hashing for aggregation purposes still needs work, so it's not time to
mark any cross-type operators as hashable for general use, but these cases
work if the operators are so marked by hand in the system catalogs.
2007-01-30 01:33:36 +00:00
Tom Lane b39e91501c Improve hash join to discard input tuples immediately if they can't
match because they contain a null join key (and the join operator is
known strict).  Improves performance significantly when the inner
relation contains a lot of nulls, as per bug #2930.
2007-01-28 23:21:26 +00:00
Neil Conway 74a1a2b8b1 Rename the uuid_t type to pg_uuid_t, to avoid a conflict with any
definitions of uuid_t that may be provided by the system headers. This
should hopefully fix the Win32 build problems reported by Magnus.
2007-01-28 20:25:38 +00:00
Neil Conway a534068e0e Add a new builtin type, "uuid". This implements a UUID type, similar to
that defined in RFC 4122. This patch includes the basic implementation,
plus regression tests. Documentation and perhaps some additional
functionality will come later. Catversion bumped.

Patch from Gevik Babakhani; review from Peter, Tom, and myself.
2007-01-28 16:16:54 +00:00
Bruce Momjian 91ed399517 Use autoconf build-in sys_siglist macro AC_DECL_SYS_SIGLIST, rather than
create our own.
2007-01-28 03:50:34 +00:00
Bruce Momjian 82480fc254 Use sys_siglist[] to print out signal names for signal exits, rather
than just numbers.
2007-01-28 01:12:05 +00:00
Tom Lane 4355d214c2 On Windows, use pgwin32_waitforsinglesocket() instead of select() to wait for
input in the stats collector.  Our select() emulation is apparently buggy
for UDP sockets :-(.  This should resolve problems with stats collection
(and hence autovacuum) failing under more than minimal load.  Diagnosis
and patch by Magnus Hagander.

Patch probably needs to be back-ported to 8.1 and 8.0, but first let's
see if it makes the buildfarm happy...
2007-01-26 20:06:52 +00:00
Neil Conway 8ff2bccee3 Squelch some VC++ compiler warnings. Mark float literals with the "f"
suffix, to distinguish them from doubles. Make some function declarations
and definitions use the "const" qualifier for arguments consistently.
Ignore warning 4102 ("unreferenced label"), because such warnings
are always emitted by bison-generated code. Patch from Magnus Hagander.
2007-01-26 17:45:42 +00:00
Bruce Momjian 70268b50dd Update Win32 exception comment. 2007-01-25 21:50:49 +00:00
Peter Eisentraut 22bd156ff0 Various fixes in the logic of XML functions:
- Add new SQL command SET XML OPTION (also available via regular GUC) to
  control the DOCUMENT vs. CONTENT option in implicit parsing and
  serialization operations.

- Subtle corrections in the handling of the standalone property in
  xmlroot().

- Allow xmlroot() to work on content fragments.

- Subtle corrections in the handling of the version property in
  xmlconcat().

- Code refactoring for producing XML declarations.
2007-01-25 11:53:52 +00:00
Bruce Momjian 148ea5cbea Add GUC temp_tablespaces to provide a default location for temporary
objects.

Jaime Casanova
2007-01-25 04:35:11 +00:00
Bruce Momjian 6441288ec9 Add 'output file' option for pg_dumpall, especially useful for Win32,
where output redirection of child processes (pg_dump) doesn't work.

Dave Page
2007-01-25 03:30:43 +00:00
Bruce Momjian ef65f6f7a4 Prevent WAL logging when COPY is done in the same transation that
created it.

Simon Riggs
2007-01-25 02:17:26 +00:00
Bruce Momjian 867c133599 Add comment about exception lists in both winnt.h and ntstatus.h. 2007-01-23 16:21:17 +00:00
Tom Lane a33cf1041f Add CREATE/ALTER/DROP OPERATOR FAMILY commands, also COMMENT ON OPERATOR
FAMILY; and add FAMILY option to CREATE OPERATOR CLASS to allow adding a
class to a pre-existing family.  Per previous discussion.  Man, what a
tedious lot of cutting and pasting ...
2007-01-23 05:07:18 +00:00
Bruce Momjian 882b9948d7 Back out use of FormatMessage(), does error values, not exception
values.  Point to /include/ntstatus.h for an exception list, rather than
a URL.
2007-01-23 03:28:49 +00:00
Bruce Momjian 610f60a092 Print meaningfull error text for abonormal process exit on Win32, rather
than hex codes, using FormatMessage().
2007-01-23 01:45:11 +00:00
Tom Lane 4f06c688c7 Put back planner's ability to cache the results of mergejoinscansel(),
which I had removed in the first cut of the EquivalenceClass rewrite to
simplify that patch a little.  But it's still important --- in a four-way
join problem mergejoinscansel() was eating about 40% of the planning time
according to gprof.  Also, improve the EquivalenceClass code to re-use
join RestrictInfos rather than generating fresh ones for each join
considered.  This saves some memory space but more importantly improves
the effectiveness of caching planning info in RestrictInfos.
2007-01-22 20:00:40 +00:00
Bruce Momjian d26a5f1fea Uppercase hex value. 2007-01-22 18:32:57 +00:00
Bruce Momjian 208ae0c290 When system() fails in Win32, report it as an exception, print the
exception value in hex, and give a URL where the value can be looked-up.
2007-01-22 18:31:51 +00:00
Tom Lane 5a7471c307 Add COST and ROWS options to CREATE/ALTER FUNCTION, plus underlying pg_proc
columns procost and prorows, to allow simple user adjustment of the estimated
cost of a function call, as well as control of the estimated number of rows
returned by a set-returning function.  We might eventually wish to extend this
to allow function-specific estimation routines, but there seems to be
consensus that we should try a simple constant estimate first.  In particular
this provides a relatively simple way to control the order in which different
WHERE clauses are applied in a plan node, which is a Good Thing in view of the
fact that the recent EquivalenceClass planner rewrite made that much less
predictable than before.
2007-01-22 01:35:23 +00:00
Tom Lane 066926dfbb Refactor some lsyscache routines to eliminate duplicate code and save
a couple of syscache lookups in make_pathkey_from_sortinfo().
2007-01-21 00:57:15 +00:00
Tom Lane fcf4b146c6 Simplify pg_am representation of ordering-capable access methods:
provide just a boolean 'amcanorder', instead of fields that specify the
sort operator strategy numbers.  We have decided to require ordering-capable
AMs to use btree-compatible strategy numbers, so the old fields are
overkill (and indeed misleading about what's allowed).
2007-01-20 23:13:01 +00:00
Neil Conway aef0f53b62 Make setseed() return void, rather than an int4 without any use. Per
pgsql-patches discussion of September 20, 2006. Bump the catversion.
2007-01-20 21:47:10 +00:00
Tom Lane f41803bb39 Refactor planner's pathkeys data structure to create a separate, explicit
representation of equivalence classes of variables.  This is an extensive
rewrite, but it brings a number of benefits:
* planner no longer fails in the presence of "incomplete" operator families
that don't offer operators for every possible combination of datatypes.
* avoid generating and then discarding redundant equality clauses.
* remove bogus assumption that derived equalities always use operators
named "=".
* mergejoins can work with a variety of sort orders (e.g., descending) now,
instead of tying each mergejoinable operator to exactly one sort order.
* better recognition of redundant sort columns.
* can make use of equalities appearing underneath an outer join.
2007-01-20 20:45:41 +00:00
Neil Conway 2b7334d487 Refactor the index AM API slightly: move currentItemData and
currentMarkData from IndexScanDesc to the opaque structs for the
AMs that need this information (currently gist and hash).

Patch from Heikki Linnakangas, fixes by Neil Conway.
2007-01-20 18:43:35 +00:00
Peter Eisentraut b4c8d49036 Fix xmlconcat by properly merging the XML declarations. Add aggregate
function xmlagg.
2007-01-20 09:27:20 +00:00
Peter Eisentraut 4b48ad4fb2 Add support for converting binary values (i.e. bytea) into xml values,
with new GUC parameter "xmlbinary" that controls the output encoding, as
per SQL/XML standard.
2007-01-19 16:58:46 +00:00
Alvaro Herrera 5b4a08896b Change the sed rules in the regression test for pg_regress hackery to create
the generated files, to help Visual C++ to run these tests.  The tests still
pass in VPATH and normal builds.

Patch from Magnus Hagander, editorialized by me.
2007-01-19 16:42:24 +00:00
Tom Lane eddbf39756 Extend yesterday's patch so that the bgwriter is also told to forget
pending fsyncs during DROP DATABASE.  Obviously necessary in hindsight :-(
2007-01-17 16:25:01 +00:00
Neil Conway cf57ef4e50 Implement width_bucket() for the float8 data type.
The implementation is somewhat ugly logic-wise, but I don't see an
easy way to make it more concise.

When writing this, I noticed that my previous implementation of
width_bucket() doesn't handle NaN correctly:

    postgres=# select width_bucket('NaN', 1, 5, 5);
     width_bucket
    --------------
                6
    (1 row)

AFAICS SQL:2003 does not define a NaN value, so it doesn't address how
width_bucket() should behave here. The patch changes width_bucket() so
that ereport(ERROR) is raised if NaN is specified for the operand or the
lower or upper bounds to width_bucket(). For float8, NaN is disallowed
for any of the floating-point inputs, and +/- infinity is disallowed
for the histogram bounds (but allowed for the operand).

Update docs and regression tests, bump the catversion.
2007-01-16 21:41:14 +00:00
Alvaro Herrera eb63cc3da8 Arrange for autovacuum to be killed when another operation wants to be alone
accessing it, like DROP DATABASE.  This allows the regression tests to pass
with autovacuum enabled, which open the gates for finally enabling autovacuum
by default.
2007-01-16 13:28:57 +00:00
Peter Eisentraut 2f8f76bcd5 Add support for xmlval IS DOCUMENT expression. 2007-01-14 13:11:54 +00:00
Peter Eisentraut 8b35795362 Use XML output escaping also in XMLFOREST. 2007-01-12 21:47:27 +00:00
Bruce Momjian a5ec2ec77a Allow Borland CC to compile libpq and psql.
Backpatch to 8.2.X.

L Bayuk
2007-01-11 02:42:31 +00:00
Bruce Momjian 945d0b4b09 Allow Borland CC to compile libpq and psql.
L Bayuk
2007-01-11 02:39:52 +00:00
Peter Eisentraut c0e977c18f Use libxml's xmlwriter API for producing XML elements, instead of doing
our own printing dance.  This does a better job of quoting and escaping the
values.
2007-01-10 20:33:54 +00:00
Tom Lane c4e7e675d8 Make sure BYTE_ORDER gets defined in 64-bit builds on Solaris,
per Stefan Kaltenbrunner.
2007-01-10 18:22:50 +00:00
Tom Lane a191a169d6 Change the planner-to-executor API so that the planner tells the executor
which comparison operators to use for plan nodes involving tuple comparison
(Agg, Group, Unique, SetOp).  Formerly the executor looked up the default
equality operator for the datatype, which was really pretty shaky, since it's
possible that the data being fed to the node is sorted according to some
nondefault operator class that could have an incompatible idea of equality.
The planner knows what it has sorted by and therefore can provide the right
equality operator to use.  Also, this change moves a couple of catalog lookups
out of the executor and into the planner, which should help startup time for
pre-planned queries by some small amount.  Modify the planner to remove some
other cavalier assumptions about always being able to use the default
operators.  Also add "nulls first/last" info to the Plan node for a mergejoin
--- neither the executor nor the planner can cope yet, but at least the API is
in place.
2007-01-10 18:06:05 +00:00
Bruce Momjian 40f797be03 Enable another five tuple status bits by using the high bits of the
nattr field, and rename the field.

Heikki Linnakangas
2007-01-09 22:01:00 +00:00
Bruce Momjian be8a431881 Add GUC log_temp_files to log the use of temporary files.
Bill Moran
2007-01-09 21:31:17 +00:00
Tom Lane 4431758229 Support ORDER BY ... NULLS FIRST/LAST, and add ASC/DESC/NULLS FIRST/NULLS LAST
per-column options for btree indexes.  The planner's support for this is still
pretty rudimentary; it does not yet know how to plan mergejoins with
nondefault ordering options.  The documentation is pretty rudimentary, too.
I'll work on improving that stuff later.

Note incompatible change from prior behavior: ORDER BY ... USING will now be
rejected if the operator is not a less-than or greater-than member of some
btree opclass.  This prevents less-than-sane behavior if an operator that
doesn't actually define a proper sort ordering is selected.
2007-01-09 02:14:16 +00:00
Peter Eisentraut d807c7ef3f Some fine-tuning of xmlpi in corner cases:
- correct error codes
- do syntax checks in correct order
- strip leading spaces of argument
2007-01-07 22:49:56 +00:00