Commit Graph

670 Commits

Author SHA1 Message Date
Teodor Sigaev
8a3631f8d8 GIN: Generalized Inverted iNdex.
text[], int4[], Tsearch2 support for GIN.
2006-05-02 11:28:56 +00:00
Bruce Momjian
e6004f0151 Add statement_timestamp(), clock_timestamp(), and
transaction_timestamp() (just like now()).

Also update statement_timeout() to mention it is statement arrival time
that is measured.

Catalog version updated.
2006-04-25 00:25:22 +00:00
Bruce Momjian
5bbea03f3b Suppress more compiler warnings caused by macro tests. 2006-04-24 22:24:58 +00:00
Bruce Momjian
7384e95b0c Add one more paren to macro. 2006-04-24 22:17:04 +00:00
Bruce Momjian
88fc941355 Suprress compiler warning in gcc 4.2.
Report by Kris Jurka
2006-04-24 22:06:32 +00:00
Tom Lane
defe93463c Make the world safe for full_page_writes. Allow XLOG records that try to
update no-longer-existing pages to fall through as no-ops, but make a note
of each page number referenced by such records.  If we don't see a later
XLOG entry dropping the table or truncating away the page, complain at
the end of XLOG replay.  Since this fixes the known failure mode for
full_page_writes = off, revert my previous band-aid patch that disabled
that GUC variable.
2006-04-14 20:27:24 +00:00
Tom Lane
49a7610c36 Fix an ancient oversight in btree xlog replay. When trying to determine if an
upper-level insertion completes a previously-seen split, we cannot simply grab
the downlink block number out of the buffer, because the buffer could contain
a later state of the page --- or perhaps the page doesn't even exist at all
any more, due to relation truncation.  These possibilities have been masked up
to now because the use of full_page_writes effectively ensured that no xlog
replay routine ever actually saw a page state newer than its own change.
Since we're deprecating full_page_writes in 8.1.*, there's no need to fix this
in existing release branches, but we need a fix in HEAD if we want to have any
hope of re-allowing full_page_writes.  Accordingly, adjust the contents of
btree WAL records so that we can always get the downlink block number from the
WAL record rather than having to depend on buffer contents.  Per report from
Kevin Grittner and Peter Brant.

Improve a few comments in related code while at it.
2006-04-13 03:53:05 +00:00
Tom Lane
09b5271ebd Add a field to the first page of each WAL file to indicate the
XLOG_BLCKSZ.  This ought to help in preventing configuration mismatch
problems if anyone tries to ship PITR files between servers compiled
with different XLOG_BLCKSZ settings.  Simon Riggs
2006-04-05 03:34:05 +00:00
Tom Lane
eaef111396 Define a separately configurable XLOG_BLCKSZ symbol for the page size
used within WAL files.  Historically this was the same as the data file
BLCKSZ, but there's no necessary connection, and it's possible that
performance gains might ensue from reducing XLOG_BLCKSZ.  In any case
distinguishing two symbols should improve code clarity.  This commit
does not actually change the page size, only provide the infrastructure
to make it possible to do so.  initdb forced because of addition of a
field to pg_control.
Mark Wong, with some help from Simon Riggs and Tom Lane.
2006-04-03 23:35:05 +00:00
Teodor Sigaev
8d02b15e33 Eliminate ajust scan code. Since concurrent GiST it doesn't
do real work. That was missed during concurrence development.
2006-04-03 13:44:33 +00:00
Tom Lane
89bda95d82 Remove the 'slow' path for btree index build, which built the btree
incrementally by successive inserts rather than by sorting the data.
We were only using the slow path during bootstrap, apparently because
when first written it failed during bootstrap --- but it works fine now
AFAICT.  Removing it saves a hundred or so lines of code and produces
noticeably (~10%) smaller initial states of the system catalog indexes.
While that won't make much difference for heavily-modified catalogs,
for the more static ones there may be a useful long-term performance
improvement.
2006-04-01 03:03:37 +00:00
Tom Lane
a8b8f4db23 Clean up WAL/buffer interactions as per my recent proposal. Get rid of the
misleadingly-named WriteBuffer routine, and instead require routines that
change buffer pages to call MarkBufferDirty (which does exactly what it says).
We also require that they do so before calling XLogInsert; this takes care of
the synchronization requirement documented in SyncOneBuffer.  Note that
because bufmgr takes the buffer content lock (in shared mode) while writing
out any buffer, it doesn't matter whether MarkBufferDirty is executed before
the buffer content change is complete, so long as the content change is
completed before releasing exclusive lock on the buffer.  So it's OK to set
the dirtybit before we fill in the LSN.
This eliminates the former kluge of needing to set the dirtybit in LockBuffer.
Aside from making the code more transparent, we can also add some new
debugging assertions, in particular that the caller of MarkBufferDirty must
hold the buffer content lock, not merely a pin.
2006-03-31 23:32:07 +00:00
Tom Lane
89395bfa6f Improve gist XLOG code to follow the coding rules needed to prevent
torn-page problems.  This introduces some issues of its own, mainly
that there are now some critical sections of unreasonably broad scope,
but it's a step forward anyway.  Further cleanup will require some
code refactoring that I'd prefer to get Oleg and Teodor involved in.
2006-03-30 23:03:10 +00:00
Tom Lane
6d61cdec07 Clean up and document the API for XLogOpenRelation and XLogReadBuffer.
This commit doesn't make much functional change, but it does eliminate some
duplicated code --- for instance, PageIsNew tests are now done inside
XLogReadBuffer rather than by each caller.
The GIST xlog code still needs a lot of love, but I'll worry about that
separately.
2006-03-29 21:17:39 +00:00
Tom Lane
0a20207060 Arrange to emit a description of the current XLOG record as error context
when an error occurs during xlog replay.  Also, replace the former risky
'write into a fixed-size buffer with no overflow detection' API for XLOG
record description routines; use an expansible StringInfo instead.  (The
latter accounts for most of the patch bulk.)

Qingqing Zhou
2006-03-24 04:32:13 +00:00
Tom Lane
2316013961 Clean up representation of function RTEs for functions returning RECORD.
The original coding stored the raw parser output (ColumnDef and TypeName
nodes) which was ugly, bulky, and wrong because it failed to create any
dependency on the referenced datatype --- and in fact would not track type
renamings and suchlike.  Instead store a list of column type OIDs in the
RTE.

Also fix up general failure of recordDependencyOnExpr to do anything sane
about recording dependencies on datatypes.  While there are many cases where
there will be an indirect dependency (eg if an operator returns a datatype,
the dependency on the operator is enough), we do have to record the datatype
as a separate dependency in examples like CoerceToDomain.

initdb forced because of change of stored rules.
2006-03-16 00:31:55 +00:00
Bruce Momjian
f2f5b05655 Update copyright for 2006. Update scripts. 2006-03-05 15:59:11 +00:00
Tom Lane
fd267c1ebc Skip ambulkdelete scan if there's nothing to delete and the index is not
partial.  None of the existing AMs do anything useful except counting
tuples when there's nothing to delete, and we can get a tuple count
from the heap as long as it's not a partial index.  (hash actually can
skip anyway because it maintains a tuple count in the index metapage.)
GIST is not currently able to exploit this optimization because, due to
failure to index NULLs, GIST is always effectively partial.  Possibly
we should fix that sometime.
Simon Riggs w/ some review by Tom Lane.
2006-02-11 23:31:34 +00:00
Bruce Momjian
77bb65d3fc Revert based on Tom's recommendation:
> Allow VACUUM to complete faster by avoiding scanning the indexes when no
> rows were removed from the heap by the VACUUM.
2006-02-11 17:14:09 +00:00
Bruce Momjian
bf324946b3 Allow VACUUM to complete faster by avoiding scanning the indexes when no
rows were removed from the heap by the VACUUM.

Simon Riggs
2006-02-11 16:59:09 +00:00
Tom Lane
5997386a0a Remove the no-longer-useful HashItem/HashItemData level of structure.
Same motivation as for BTItem.
2006-01-25 23:26:11 +00:00
Tom Lane
c389760c32 Remove the no-longer-useful BTItem/BTItemData level of structure, and
just refer to btree index entries as plain IndexTuples, which is what
they have been for a very long time.  This is mostly just an exercise
in removing extraneous notation, but it does save a palloc/pfree cycle
per index insertion.
2006-01-25 23:04:21 +00:00
Tom Lane
3a0a16cb7e Allow row comparisons to be used as indexscan qualifications.
This completes the project to upgrade our handling of row comparisons.
2006-01-25 20:29:24 +00:00
Tom Lane
7ccaf13a06 Instead of using a numberOfRequiredKeys count to distinguish required
and non-required keys in a btree index scan, mark the required scankeys
with private flag bits SK_BT_REQFWD and/or SK_BT_REQBKWD.  This seems
at least marginally clearer to me, and it eliminates a wired-into-the-
data-structure assumption that required keys are consecutive.  Even though
that assumption will remain true for the foreseeable future, having it
in there makes the code seem more complex than necessary.
2006-01-23 22:31:41 +00:00
Tom Lane
f7ea931287 Some minor code cleanup, falling out from the removal of rtree. SK_NEGATE
isn't being used anywhere anymore, and there seems no point in a generic
index_keytest() routine when two out of three remaining access methods
aren't using it.  Also, add a comment documenting a convention for
letting access methods define private flag bits in ScanKey sk_flags.
There are no such flags at the moment but I'm thinking about changing
btree's handling of "required keys" to use flag bits in the keys
rather than a count of required key positions.  Also, if some AM did
still want SK_NEGATE then it would be reasonable to treat it as a private
flag bit.
2006-01-14 22:03:35 +00:00
Tom Lane
cefcbbf1fd Push the responsibility for handling ignore_killed_tuples down into
_bt_checkkeys(), instead of checking it in the top-level nbtree.c routines
as formerly.  This saves a little bit of loop overhead, but more importantly
it lets us skip performing the index key comparisons for dead tuples.
2005-12-07 19:37:53 +00:00
Tom Lane
887a7c61f6 Get rid of slru.c's hardwired insistence on a fixed number of slots per
SLRU area.  The number of slots is still a compile-time constant (someday
we might want to change that), but at least it's a different constant for
each SLRU area.  Increase number of subtrans buffers to 32 based on
experimentation with a heavily subtrans-bashing test case, and increase
number of multixact member buffers to 16, since it's obviously silly for
it not to be at least twice the number of multixact offset buffers.
2005-12-06 23:08:34 +00:00
Tom Lane
a615acf555 Arrange for read-only accesses to SLRU page buffers to take only a shared
lock, not exclusive, if the desired page is already in memory.  This can
be demonstrated to be a significant win on the pg_subtrans cache when there
is a large window of open transactions.  It should be useful for pg_clog
as well.  I didn't try to make GetMultiXactIdMembers() use the code, as
that would have taken some restructuring, and what with the local cache
for multixact contents it probably wouldn't really make a difference.
Per my recent proposal.
2005-12-06 18:10:06 +00:00
Tom Lane
a98871b7ac Tweak indexscan machinery to avoid taking an AccessShareLock on an index
if we already have a stronger lock due to the index's table being the
update target table of the query.  Same optimization I applied earlier
at the table level.  There doesn't seem to be much interest in the more
radical idea of not locking indexes at all, so do what we can ...
2005-12-03 05:51:03 +00:00
Tom Lane
70f1482de3 Change seqscan logic so that we check visibility of all tuples on a page
when we first read the page, rather than checking them one at a time.
This allows us to take and release the buffer content lock just once
per page, instead of once per tuple.  Since it's a shared lock the
contention penalty for holding the lock longer shouldn't be too bad.
We can safely do this only when using an MVCC snapshot; else the
assumption that visibility won't change over time is uncool.  Therefore
there are now two code paths depending on the snapshot type.  I also
made the same change in nodeBitmapHeapscan.c, where it can be done always
because we only support MVCC snapshots for bitmap scans anyway.
Also make some incidental cleanups in the APIs of these functions.
Per a suggestion from Qingqing Zhou.
2005-11-26 03:03:07 +00:00
Bruce Momjian
436a2956d8 Re-run pgindent, fixing a problem where comment lines after a blank
comment line where output as too long, and update typedefs for /lib
directory.  Also fix case where identifiers were used as variable names
in the backend, but as typedefs in ecpg (favor the backend for
indenting).

Backpatch to 8.1.X.
2005-11-22 18:17:34 +00:00
Tom Lane
dd218ae7b0 Remove the t_datamcxt field of HeapTupleData. This was introduced for
the convenience of tuptoaster.c and is no longer needed, so may as well
get rid of some small amount of overhead.
2005-11-20 19:49:08 +00:00
Tom Lane
40314f2dac Modify tuptoaster's API so that it does not try to modify the passed
tuple in-place, but instead passes back an all-new tuple structure if
any changes are needed.  This is a much cleaner and more robust solution
for the bug discovered by Alexey Beschiokov; accordingly, revert the
quick hack I installed yesterday.
With this change, HeapTupleData.t_datamcxt is no longer needed; will
remove it in a separate commit in HEAD only.
2005-11-20 18:38:20 +00:00
Tom Lane
2a8d3d83ef R-tree is dead ... long live GiST. 2005-11-07 17:36:47 +00:00
Tom Lane
6236991143 Add simple sanity checks on newly-read pages to GiST, too. 2005-11-06 22:39:21 +00:00
Tom Lane
766dc45d9f Add defenses to btree and hash index AMs to do simple sanity checks
on every index page they read; in particular to catch the case of an
all-zero page, which PageHeaderIsValid allows to pass.  It turns out
hash already had this idea, but it was just Assert()ing things rather
than doing a straight error check, and the Asserts were partially
redundant with PageHeaderIsValid anyway.  Per recent failure example
from Jim Nasby.  (gist still needs the same treatment.)
2005-11-06 19:29:01 +00:00
Tom Lane
18691d8ee3 Clean up representation of SLRU page state. This is the cleaner fix
for the SLRU race condition that I posted a few days ago, but we decided
not to use in 8.1 and older branches.
2005-11-05 21:19:47 +00:00
Bruce Momjian
1dc3498251 Standard pgindent run for 8.1. 2005-10-15 02:49:52 +00:00
Alvaro Herrera
a84429a1aa Remove an unused typedef. 2005-10-07 14:55:36 +00:00
Tom Lane
35e9b1cc1e Clean up a couple of ad-hoc computations of the maximum number of tuples
on a page, as suggested by ITAGAKI Takahiro.  Also, change a few places
that were using some other estimates of max-items-per-page to consistently
use MaxOffsetNumber.  This is conservatively large --- we could have used
the new MaxHeapTuplesPerPage macro, or a similar one for index tuples ---
but those places are simply declaring a fixed-size buffer and assuming it
will work, rather than actively testing for overrun.  It seems safer to
size these buffers in a way that can't overflow even if the page is
corrupt.
2005-09-02 19:02:20 +00:00
Tom Lane
0007490e09 Convert the arithmetic for shared memory size calculation from 'int'
to 'Size' (that is, size_t), and install overflow detection checks in it.
This allows us to remove the former arbitrary restrictions on NBuffers
etc.  It won't make any difference in a 32-bit machine, but in a 64-bit
machine you could theoretically have terabytes of shared buffers.
(How efficiently we could manage 'em remains to be seen.)  Similarly,
num_temp_buffers, work_mem, and maintenance_work_mem can be set above
2Gb on a 64-bit machine.  Original patch from Koichi Suzuki, additional
work by moi.
2005-08-20 23:26:37 +00:00
Tatsuo Ishii
ba2fc7eb4b Make GetMultiXactIdMembers() a public function. 2005-08-20 01:29:27 +00:00
Tom Lane
f57e3f4cf3 Repair problems with VACUUM destroying t_ctid chains too soon, and with
insufficient paranoia in code that follows t_ctid links.  (We must do both
because even with VACUUM doing it properly, the intermediate state with
a dangling t_ctid link is visible concurrently during lazy VACUUM, and
could be seen afterwards if either type of VACUUM crashes partway through.)
Also try to improve documentation about what's going on.  Patch is a bit
bulky because passing the XMAX information around required changing the
APIs of some low-level heapam.c routines, but it's not conceptually very
complicated.  Per trouble report from Teodor and subsequent analysis.
This needs to be back-patched, but I'll do that after 8.1 beta is out.
2005-08-20 00:40:32 +00:00
Tom Lane
721e53785d Solve the problem of OID collisions by probing for duplicate OIDs
whenever we generate a new OID.  This prevents occasional duplicate-OID
errors that can otherwise occur once the OID counter has wrapped around.
Duplicate relfilenode values are also checked for when creating new
physical files.  Per my recent proposal.
2005-08-12 01:36:05 +00:00
Tom Lane
2a4fad1a0e Add NOWAIT option to SELECT FOR UPDATE/SHARE.
Original patch by Hans-Juergen Schoenig, revisions by Karel Zak
and Tom Lane.
2005-08-01 20:31:16 +00:00
Tom Lane
5d5f1a79e6 Clean up a number of autovacuum loose ends. Make the stats collector
track shared relations in a separate hashtable, so that operations done
from different databases are counted correctly.  Add proper support for
anti-XID-wraparound vacuuming, even in databases that are never connected
to and so have no stats entries.  Miscellaneous other bug fixes.
Alvaro Herrera, some additional fixes by Tom Lane.
2005-07-29 19:30:09 +00:00
Bruce Momjian
a923602855 Add pg_column_size() to return storage size of a column, including
possible compression.

Mark Kirkwood
2005-07-06 19:02:54 +00:00
Tom Lane
eb5949d190 Arrange for the postmaster (and standalone backends, initdb, etc) to
chdir into PGDATA and subsequently use relative paths instead of absolute
paths to access all files under PGDATA.  This seems to give a small
performance improvement, and it should make the system more robust
against naive DBAs doing things like moving a database directory that
has a live postmaster in it.  Per recent discussion.
2005-07-04 04:51:52 +00:00
Teodor Sigaev
898a7bd13b Bug fixes for GiST crash recovery.
- add forgotten check of lsn for insert completion
- remove level of pages: hard to check in recovery
- some cleanups
2005-06-30 17:52:14 +00:00
Tom Lane
b5f7cff84f Clean up the rather historically encumbered interface to now() and
current time: provide a GetCurrentTimestamp() function that returns
current time in the form of a TimestampTz, instead of separate time_t
and microseconds fields.  This is what all the callers really want
anyway, and it eliminates low-level dependencies on AbsoluteTime,
which is a deprecated datatype that will have to disappear eventually.
2005-06-29 22:51:57 +00:00
Tom Lane
7762619e95 Replace pg_shadow and pg_group by new role-capable catalogs pg_authid
and pg_auth_members.  There are still many loose ends to finish in this
patch (no documentation, no regression tests, no pg_dump support for
instance).  But I'm going to commit it now anyway so that Alvaro can
make some progress on shared dependencies.  The catalog changes should
be pretty much done.
2005-06-28 05:09:14 +00:00
Teodor Sigaev
e8cab5fe49 Concurrency for GiST
- full concurrency for insert/update/select/vacuum:
        - select and vacuum never locks more than one page simultaneously
        - select (gettuple) hasn't any lock across it's calls
        - insert never locks more than two page simultaneously:
                - during search of leaf to insert it locks only one page
                  simultaneously
                - while walk upward to the root it locked only parent (may be
                  non-direct parent) and child. One of them X-lock, another may
                  be S- or X-lock
- 'vacuum full' locks index
- improve gistgetmulti
- simplify XLOG records

Fix bug in index_beginscan_internal: LockRelation may clean
  rd_aminfo structure, so move GET_REL_PROCEDURE after LockRelation
2005-06-27 12:45:23 +00:00
Tom Lane
b90f8f20f0 Extend r-tree operator classes to handle Y-direction tests equivalent
to the existing X-direction tests.  An rtree class now includes 4 actual
2-D tests, 4 1-D X-direction tests, and 4 1-D Y-direction tests.
This involved adding four new Y-direction test operators for each of
box and polygon; I followed the PostGIS project's lead as to the names
of these operators.
NON BACKWARDS COMPATIBLE CHANGE: the poly_overleft (&<) and poly_overright
(&>) operators now have semantics comparable to box_overleft and box_overright.
This is necessary to make r-tree indexes work correctly on polygons.
Also, I changed circle_left and circle_right to agree with box_left and
box_right --- formerly they allowed the boundaries to touch.  This isn't
actually essential given the lack of any r-tree opclass for circles, but
it seems best to sync all the definitions while we are at it.
2005-06-24 20:53:34 +00:00
Tom Lane
9a09248edd Fix rtree and contrib/rtree_gist search behavior for the 1-D box and
polygon operators (<<, &<, >>, &>).  Per ideas originally put forward
by andrew@supernews and later rediscovered by moi.  This patch just
fixes the existing opclasses, and does not add any new behavior as I
proposed earlier; that can be sorted out later.  In principle this
could be back-patched, since it changes only search behavior and not
system catalog entries nor rtree index contents.  I'm not currently
planning to do that, though, since I think it could use more testing.
2005-06-24 00:18:52 +00:00
Tom Lane
b95ae32b41 Avoid WAL-logging individual tuple insertions during CREATE TABLE AS
(a/k/a SELECT INTO).  Instead, flush and fsync the whole relation before
committing.  We do still need the WAL log when PITR is active, however.
Simon Riggs and Tom Lane.
2005-06-20 18:37:02 +00:00
Teodor Sigaev
1bfdd1a893 fix founded hole in recovery after crash, add vacuum_delay_point() 2005-06-20 15:22:38 +00:00
Teodor Sigaev
d544ec8bbd 1. full functional WAL for GiST
2. improve vacuum for gist
   - use FSM
   - full vacuum:
      - reforms parent tuple if it's needed
        ( tuples was deleted on child page or parent tuple remains invalid
          after crash recovery )
      - truncate index file if possible
3. fixes bugs and mistakes
2005-06-20 10:29:37 +00:00
Tom Lane
e26b0abda3 Arrange to fsync two-phase-commit state files only during checkpoints;
given reasonably short lifespans for prepared transactions, this should
mean that only a small minority of state files ever need to be fsynced
at all.  Per discussion with Heikki Linnakangas.
2005-06-19 20:00:39 +00:00
Tom Lane
a8d1075f27 Add a time-of-preparation column to the pg_prepared_xacts view, per an
old suggestion by Oliver Jowett.  Also, add a transaction column to the
pg_locks view to show the xid of each transaction holding or awaiting
locks; this allows prepared transactions to be properly associated with
the locks they own.  There was already a column named 'transaction',
and I chose to rename it to 'transactionid' --- since this column is
new in the current devel cycle there should be no backwards compatibility
issue to worry about.
2005-06-18 19:33:42 +00:00
Tom Lane
d0a89683a3 Two-phase commit. Original patch by Heikki Linnakangas, with additional
hacking by Alvaro Herrera and Tom Lane.
2005-06-17 22:32:51 +00:00
Teodor Sigaev
37c839365c WAL for GiST. It work for online backup and so on, but on
recovery after crash (power loss etc) it may say that it can't restore
index and index should be reindexed.

Some refactoring code.
2005-06-14 11:45:14 +00:00
Tom Lane
c186c93148 Change the planner to allow indexscan qualification clauses to use
nonconsecutive columns of a multicolumn index, as per discussion around
mid-May (pghackers thread "Best way to scan on-disk bitmaps").  This
turns out to require only minimal changes in btree, and so far as I can
see none at all in GiST.  btcostestimate did need some work, but its
original assumption that index selectivity == heap selectivity was
quite bogus even before this.
2005-06-13 23:14:49 +00:00
Tom Lane
f5b2f60bd1 Change WAL-logging scheme for multixacts to be more like regular
transaction IDs, rather than like subtrans; in particular, the information
now survives a database restart.  Per previous discussion, this is
essential for PITR log shipping and for 2PC.
2005-06-08 15:50:28 +00:00
Tom Lane
ee7ac7b11e Modify XLogInsert API to make callers specify whether pages to be backed
up have the standard layout with unused space between pd_lower and pd_upper.
When this is set, XLogInsert will omit the unused space without bothering
to scan it to see if it's zero.  That saves time in XLogInsert, and also
allows reversion of my earlier patch to make PageRepairFragmentation et al
explicitly re-zero freed space.  Per suggestion by Heikki Linnakangas.
2005-06-06 20:22:58 +00:00
Tom Lane
4c8495a1f2 Remove the mostly-stubbed-out-anyway support routines for WAL UNDO.
That code is never going to be used in the foreseeable future, and
where it's more than a stub it's making the redo routines harder to
read.
2005-06-06 17:01:25 +00:00
Tom Lane
21fda22ec4 Change CRCs in WAL records from 64bit to 32bit for performance reasons.
Instead of a separate CRC on each backup block, include backup blocks
in their parent WAL record's CRC; this is important to ensure that the
backup block really goes with the WAL record, ie there was not a page
tear right at the start of the backup block.  Implement a simple form
of compression of backup blocks: drop any run of zeroes starting at
pd_lower, so as not to store the unused 'hole' that commonly exists in
PG heap and index pages.  Tweak PageRepairFragmentation and related
routines to ensure they keep the unused space zeroed, so that the above
compression method remains effective.  All per recent discussions.
2005-06-02 05:55:29 +00:00
Tom Lane
32e8fc4a28 Arrange to cache fmgr lookup information for an index's access method
routines in the index's relcache entry, instead of doing a fresh fmgr_info
on every index access.  We were already doing this for the index's opclass
support functions; not sure why we didn't think to do it for the AM
functions too.  This supersedes the former method of caching (only)
amgettuple in indexscan scan descriptors; it's an improvement because the
function lookup can be amortized across multiple statements instead of
being repeated for each statement.  Even though lookup for builtin
functions is pretty cheap, this seems to drop a percent or two off some
simple benchmarks.
2005-05-27 23:31:21 +00:00
Bruce Momjian
b492c3accc Add parentheses to macros when args are used in computations. Without
them, the executation behavior could be unexpected.
2005-05-25 21:40:43 +00:00
Bruce Momjian
6dc7760ac3 Add support for wal_fsync_writethrough for Darwin, and restructure the
code to better handle writethrough.

Chris Campbell
2005-05-20 14:53:26 +00:00
Neil Conway
c891e05f26 Cleanup GiST header files. Since GiST extensions are often written as
external projects, we should be careful about what parts of the GiST
API are considered implementation details, and which are part of the
public API. Therefore, I've moved internal-only declarations into
gist_private.h -- future backward-incompatible changes to gist.h should
be made with care, to avoid needlessly breaking external GiST extensions.

Also did some related header cleanup: remove some unnecessary #includes
from gist.h, and remove some unused definitions: isAttByVal(), _gistdump(),
and GISTNStrategies.
2005-05-17 03:34:18 +00:00
Neil Conway
eda6dd32d1 GiST improvements:
- make sure we always invoke user-supplied GiST methods in a short-lived
  memory context. This means the backend isn't exposed to any memory leaks
  that be in those methods (in fact, it is probably a net loss for most
  GiST methods to bother manually freeing memory now). This also means
  we can do away with a lot of ugly manual memory management in the
  GiST code itself.

- keep the current page of a GiST index scan pinned, rather than doing a
  ReadBuffer() for each tuple produced by the scan. Since ReadBuffer() is
  expensive, this is a perf. win

- implement dead tuple killing for GiST indexes (which is easy to do, now
  that we keep a pin on the current scan page). Now all the builtin indexes
  implement dead tuple killing.

- cleanup a lot of ugly code in GiST
2005-05-17 00:59:30 +00:00
Tom Lane
278bd0cc22 For some reason access/tupmacs.h has been #including utils/memutils.h,
which is neither needed by nor related to that header.  Remove the bogus
inclusion and instead include the header in those C files that actually
need it.  Also fix unnecessary inclusions and bad inclusion order in
tsearch2 files.
2005-05-06 17:24:55 +00:00
Tom Lane
126eaef651 Clean up MultiXactIdExpand's API by separating out the case where we
are creating a new MultiXactId from two regular XIDs.  The original
coding was unnecessarily complicated and didn't save any code anyway.
2005-05-03 19:42:41 +00:00
Tom Lane
bedb78d386 Implement sharable row-level locks, and use them for foreign key references
to eliminate unnecessary deadlocks.  This commit adds SELECT ... FOR SHARE
paralleling SELECT ... FOR UPDATE.  The implementation uses a new SLRU
data structure (managed much like pg_subtrans) to represent multiple-
transaction-ID sets.  When more than one transaction is holding a shared
lock on a particular row, we create a MultiXactId representing that set
of transactions and store its ID in the row's XMAX.  This scheme allows
an effectively unlimited number of row locks, just as we did before,
while not costing any extra overhead except when a shared lock actually
has to be shared.   Still TODO: use the regular lock manager to control
the grant order when multiple backends are waiting for a row lock.

Alvaro Herrera and Tom Lane.
2005-04-28 21:47:18 +00:00
Tom Lane
162bd08b3f Completion of project to use fixed OIDs for all system catalogs and
indexes.  Replace all heap_openr and index_openr calls by heap_open
and index_open.  Remove runtime lookups of catalog OID numbers in
various places.  Remove relcache's support for looking up system
catalogs by name.  Bulky but mostly very boring patch ...
2005-04-14 20:03:27 +00:00
Tom Lane
2193a856a2 Simplify initdb-time assignment of OIDs as I proposed yesterday, and
avoid encroaching on the 'user' range of OIDs by allowing automatic
OID assignment to use values below 16k until we reach normal operation.

initdb not forced since this doesn't make any incompatible change;
however a lot of stuff will have different OIDs after your next initdb.
2005-04-13 18:54:57 +00:00
Tom Lane
119191609c Remove dead push/pop rollback code. Vadim once planned to implement
transaction rollback via UNDO but I think that's highly unlikely to
happen, so we may as well remove the stubs.  (Someday we ought to
rip out the stub xxx_undo routines, too.)  Per Alvaro.
2005-03-28 01:50:34 +00:00
Tom Lane
bf3dbb5881 First steps towards index scans with heap access decoupled from index
access: define new index access method functions 'amgetmulti' that can
fetch multiple TIDs per call.  (The functions exist but are totally
untested as yet.)  Since I was modifying pg_am anyway, remove the
no-longer-needed 'rel' parameter from amcostestimate functions, and
also remove the vestigial amowner column that was creating useless
work for Alvaro's shared-object-dependencies project.
Initdb forced due to changes in pg_am.
2005-03-27 23:53:05 +00:00
Tom Lane
617dd33b6e Eliminate duplicate hasnulls bit testing in index tuple access, and
clean up itup.h a little bit.
2005-03-27 18:38:27 +00:00
Tom Lane
ee4ddac137 Convert index-related tuple handling routines from char 'n'/' ' to bool
convention for isnull flags.  Also, remove the useless InsertIndexResult
return struct from index AM aminsert calls --- there is no reason for
the caller to know where in the index the tuple was inserted, and we
were wasting a palloc cycle per insert to deliver this uninteresting
value (plus nontrivial complexity in some AMs).
I forced initdb because of the change in the signature of the aminsert
routines, even though nothing really looks at those pg_proc entries...
2005-03-21 01:24:04 +00:00
Neil Conway
fe7015f5e8 Change the return value of HeapTupleSatisfiesUpdate() to be an enum,
rather than an integer, and fix the associated fallout. From Alvaro
Herrera.
2005-03-20 23:40:34 +00:00
Tom Lane
f97aebd162 Revise TupleTableSlot code to avoid unnecessary construction and disassembly
of tuples when passing data up through multiple plan nodes.  A slot can now
hold either a normal "physical" HeapTuple, or a "virtual" tuple consisting
of Datum/isnull arrays.  Upper plan levels can usually just copy the Datum
arrays, avoiding heap_formtuple() and possible subsequent nocachegetattr()
calls to extract the data again.  This work extends Atsushi Ogawa's earlier
patch, which provided the key idea of adding Datum arrays to TupleTableSlots.
(I believe however that something like this was foreseen way back in Berkeley
days --- see the old comment on ExecProject.)  A test case involving many
levels of join of fairly wide tables (about 80 columns altogether) showed
about 3x overall speedup, though simple queries will probably not be
helped very much.

I have also duplicated some code in heaptuple.c in order to provide versions
of heap_formtuple and friends that use "bool" arrays to indicate null
attributes, instead of the old convention of "char" arrays containing either
'n' or ' '.  This provides a better match to the convention used by
ExecEvalExpr.  While I have not made a concerted effort to get rid of uses
of the old routines, I think they should be deprecated and eventually removed.
2005-03-16 21:38:10 +00:00
Tom Lane
a9b05bdc83 Avoid O(N^2) overhead in repeated nocachegetattr calls when columns of
a tuple are being accessed via ExecEvalVar and the attcacheoff shortcut
isn't usable (due to nulls and/or varlena columns).  To do this, cache
Datums extracted from a tuple in the associated TupleTableSlot.
Also some code cleanup in and around the TupleTable handling.
Atsushi Ogawa with some kibitzing by Tom Lane.
2005-03-14 04:41:13 +00:00
Tom Lane
a52b4fb131 Adjust creation/destruction of TupleDesc data structure to reduce the
number of palloc calls.  This has a salutory impact on plpgsql operations
with record variables (which create and destroy tupdescs constantly)
and probably helps a bit in some other cases too.
2005-03-07 04:42:17 +00:00
Tom Lane
4aefe75553 Remove some no-longer-needed kluges for bootstrapping, in particular
the AMI_OVERRIDE flag.  The fact that TransactionLogFetch treats
BootstrapTransactionId as always committed is sufficient to make
bootstrap work, and getting rid of extra tests in heavily used code
paths seems like a win.  The files produced by initdb are demonstrably
the same after this change.
2005-02-20 21:46:50 +00:00
Tom Lane
60b2444cc3 Add code to prevent transaction ID wraparound by enforcing a safe limit
in GetNewTransactionId().  Since the limit value has to be computed
before we run any real transactions, this requires adding code to database
startup to scan pg_database and determine the oldest datfrozenxid.
This can conveniently be combined with the first stage of an attack on
the problem that the 'flat file' copies of pg_shadow and pg_group are
not properly updated during WAL recovery.  The code I've added to
startup resides in a new file src/backend/utils/init/flatfiles.c, and
it is responsible for rewriting the flat files as well as initializing
the XID wraparound limit value.  This will eventually allow us to get
rid of GetRawDatabaseInfo too, but we'll need an initdb so we can add
a trigger to pg_database.
2005-02-20 02:22:07 +00:00
Neil Conway
a885ecd6ef Change heap_modifytuple() to require a TupleDesc rather than a
Relation. Patch from Alvaro Herrera, minor editorializing by
Neil Conway.
2005-01-27 23:24:11 +00:00
Neil Conway
b4297c177c This patch makes some improvements to the rtree index implementation:
(1) Keep a pin on the scan's current buffer and mark buffer. This
avoids the need to do a ReadBuffer() for each tuple produced by the
scan. Since ReadBuffer() is expensive, this is a significant win.

(2) Convert a ReleaseBuffer(); ReadBuffer() pair into
ReleaseAndReadBuffer(). Surely not a huge win, but it saves a lock
acquire/release...

(3) Remove a bunch of duplicated code in rtget.c; make rtnext() handle
both the "initial result" and "subsequent result" cases.

(4) Add support for index tuple killing

(5) Remove rtscancache(): it is dead code, for the same reason that
gistscancache() is dead code (an index scan ought not be invoked with
NoMovementScanDirection).

The end result is about a 10% improvement in rtree index scan perf,
according to contrib/rtree_gist/bench.
2005-01-18 23:25:55 +00:00
Bruce Momjian
2daed8c5b3 Update copyrights that were missed. 2005-01-01 05:43:09 +00:00
PostgreSQL Daemon
2ff501590b Tag appropriate files for rc3
Also performed an initial run through of upgrading our Copyright date to
extend to 2005 ... first run here was very simple ... change everything
where: grep 1996-2004 && the word 'Copyright' ... scanned through the
generated list with 'less' first, and after, to make sure that I only
picked up the right entries ...
2004-12-31 22:04:05 +00:00
Tom Lane
5374d097de Change planner to use the current true disk file size as its estimate of
a relation's number of blocks, rather than the possibly-obsolete value
in pg_class.relpages.  Scale the value in pg_class.reltuples correspondingly
to arrive at a hopefully more accurate number of rows.  When pg_class
contains 0/0, estimate a tuple width from the column datatypes and divide
that into current file size to estimate number of rows.  This improved
methodology allows us to jettison the ancient hacks that put bogus default
values into pg_class when a table is first created.  Also, per a suggestion
from Simon, make VACUUM (but not VACUUM FULL or ANALYZE) adjust the value
it puts into pg_class.reltuples to try to represent the mean tuple density
instead of the minimal density that actually prevails just after VACUUM.
These changes alter the plans selected for certain regression tests, so
update the expected files accordingly.  (I removed join_1.out because
it's not clear if it still applies; we can add back any variant versions
as they are shown to be needed.)
2004-12-01 19:00:56 +00:00
Tom Lane
c7866f6645 Fix obsolete comments. 2004-11-12 20:08:40 +00:00
Tom Lane
9ffc8ed58b Repair possible failure to update hint bits back to disk, per
http://archives.postgresql.org/pgsql-hackers/2004-10/msg00464.php.
This fix is intended to be permanent: it moves the responsibility for
calling SetBufferCommitInfoNeedsSave() into the tqual.c routines,
eliminating the requirement for callers to test whether t_infomask changed.
Also, tighten validity checking on buffer IDs in bufmgr.c --- several
routines were paranoid about out-of-range shared buffer numbers but not
about out-of-range local ones, which seems a tad pointless.
2004-10-15 22:40:29 +00:00
Tom Lane
8f9f198603 Restructure subtransaction handling to reduce resource consumption,
as per recent discussions.  Invent SubTransactionIds that are managed like
CommandIds (ie, counter is reset at start of each top transaction), and
use these instead of TransactionIds to keep track of subtransaction status
in those modules that need it.  This means that a subtransaction does not
need an XID unless it actually inserts/modifies rows in the database.
Accordingly, don't assign it an XID nor take a lock on the XID until it
tries to do that.  This saves a lot of overhead for subtransactions that
are only used for error recovery (eg plpgsql exceptions).  Also, arrange
to release a subtransaction's XID lock as soon as the subtransaction
exits, in both the commit and abort cases.  This avoids holding many
unique locks after a long series of subtransactions.  The price is some
additional overhead in XactLockTableWait, but that seems acceptable.
Finally, restructure the state machine in xact.c to have a more orthogonal
set of states for subtransactions.
2004-09-16 16:58:44 +00:00
Tom Lane
7488929c96 Simplify IsXactIsoLevelSerializable test. A cycle saved is a cycle
earned ...
2004-09-05 23:01:26 +00:00
Tom Lane
50742aed68 Add WAL logging for CREATE/DROP DATABASE and CREATE/DROP TABLESPACE.
Fix TablespaceCreateDbspace() to be able to create a dummy directory
in place of a dropped tablespace's symlink.  This eliminates the open
problem of a PANIC during WAL replay when a replayed action attempts
to touch a file in a since-deleted tablespace.  It also makes for a
significant improvement in the usability of PITR replay.
2004-08-29 21:08:48 +00:00
Tom Lane
0ffe11abd3 Widen xl_len field of XLogRecord header to 32 bits, so that we'll have
a more tolerable limit on the number of subtransactions or deleted files
in COMMIT and ABORT records.  Buy back the extra space by eliminating the
xl_xact_prev field, which isn't being used for anything and is rather
unlikely ever to be used for anything.
This does not force initdb, but you do need to do pg_resetxlog if you
want to upgrade an existing 8.0 installation without initdb.
2004-08-29 16:34:48 +00:00
Bruce Momjian
b6b71b85bc Pgindent run for 8.0. 2004-08-29 05:07:03 +00:00
Bruce Momjian
da9a8649d8 Update copyright to 2004. 2004-08-29 04:13:13 +00:00
Tom Lane
4dbb880d3c Rearrange pg_subtrans handling as per recent discussion. pg_subtrans
updates are no longer WAL-logged nor even fsync'd; we do not need to,
since after a crash no old pg_subtrans data is needed again.  We truncate
pg_subtrans to RecentGlobalXmin at each checkpoint.  slru.c's API is
refactored a little bit to separate out the necessary decisions.
2004-08-23 23:22:45 +00:00
Tom Lane
f009c316ba Tweak code so that pg_subtrans is never consulted for XIDs older than
RecentXmin (== MyProc->xmin).  This ensures that it will be safe to
truncate pg_subtrans at RecentGlobalXmin, which should largely eliminate
any fear of bloat.  Along the way, eliminate SubTransXidsHaveCommonAncestor,
which isn't really needed and could not give a trustworthy result anyway
under the lookback restriction.
In an unrelated but nearby change, #ifdef out GetUndoRecPtr, which has
been dead code since 2001 and seems unlikely to ever be resurrected.
2004-08-22 02:41:58 +00:00
Tom Lane
58c41712d5 Add functions pg_start_backup, pg_stop_backup to create backup label
and history files as per recent discussion.  While at it, remove
pg_terminate_backend, since we have decided we do not have time during
this release cycle to address the reliability concerns it creates.
Split the 'Miscellaneous Functions' documentation section into
'System Information Functions' and 'System Administration Functions',
which hopefully will draw the eyes of those looking for such things.
2004-08-03 20:32:36 +00:00
Tom Lane
efcaf1e868 Some mop-up work for savepoints (nested transactions). Store a small
number of active subtransaction XIDs in each backend's PGPROC entry,
and use this to avoid expensive probes into pg_subtrans during
TransactionIdIsInProgress.  Extend EOXactCallback API to allow add-on
modules to get control at subxact start/end.  (This is deliberately
not compatible with the former API, since any uses of that API probably
need manual review anyway.)  Add basic reference documentation for
SAVEPOINT and related commands.  Minor other cleanups to check off some
of the open issues for subtransactions.
Alvaro Herrera and Tom Lane.
2004-08-01 17:32:22 +00:00
Tom Lane
beda4814c1 plpgsql does exceptions.
There are still some things that need refinement; in particular I fear
that the recognized set of error condition names probably has little in
common with what Oracle recognizes.  But it's a start.
2004-07-31 07:39:21 +00:00
Tom Lane
cc813fc2b8 Replace nested-BEGIN syntax for subtransactions with spec-compliant
SAVEPOINT/RELEASE/ROLLBACK-TO syntax.  (Alvaro)
Cause COMMIT of a failed transaction to report ROLLBACK instead of
COMMIT in its command tag.  (Tom)
Fix a few loose ends in the nested-transactions stuff.
2004-07-27 05:11:48 +00:00
Tom Lane
2042b3428d Invent WAL timelines, as per recent discussion, to make point-in-time
recovery more manageable.  Also, undo recent change to add FILE_HEADER
and WASTED_SPACE records to XLOG; instead make the XLOG page header
variable-size with extra fields in the first page of an XLOG file.
This should fix the boundary-case bugs observed by Mark Kirkwood.
initdb forced due to change of XLOG representation.
2004-07-21 22:31:26 +00:00
Tom Lane
66ec2db728 XLOG file archiving and point-in-time recovery. There are still some
loose ends and a glaring lack of documentation, but it basically works.

Simon Riggs with some editorialization by Tom Lane.
2004-07-19 02:47:16 +00:00
Tom Lane
fe548629c5 Invent ResourceOwner mechanism as per my recent proposal, and use it to
keep track of portal-related resources separately from transaction-related
resources.  This allows cursors to work in a somewhat sane fashion with
nested transactions.  For now, cursor behavior is non-subtransactional,
that is a cursor's state does not roll back if you abort a subtransaction
that fetched from the cursor.  We might want to change that later.
2004-07-17 03:32:14 +00:00
Tom Lane
94d4d240bb Rename XLOG_BTREE_NEWPAGE xlog record type into XLOG_HEAP_NEWPAGE, and
shift support code into heapam.c accordingly.  This is in service of
soon-to-be-committed ALTER TABLE SET TABLESPACE code that will want to
use this same record type for both heaps and indexes.

Theoretically I should have forced initdb for this, but in practice there
is no change in xlog contents because CVS tip will never really emit this
record type anyhow...
2004-07-11 18:01:45 +00:00
Tom Lane
573a71a5da Nested transactions. There is still much left to do, especially on the
performance front, but with feature freeze upon us I think it's time to
drive a stake in the ground and say that this will be in 7.5.

Alvaro Herrera, with some help from Tom Lane.
2004-07-01 00:52:04 +00:00
Tom Lane
ae93e5fd6e Make the world very nearly safe for composite-type columns in tables.
1. Solve the problem of not having TOAST references hiding inside composite
values by establishing the rule that toasting only goes one level deep:
a tuple can contain toasted fields, but a composite-type datum that is
to be inserted into a tuple cannot.  Enforcing this in heap_formtuple
is relatively cheap and it avoids a large increase in the cost of running
the tuptoaster during final storage of a row.
2. Fix some interesting problems in expansion of inherited queries that
reference whole-row variables.  We never really did this correctly before,
but it's now relatively painless to solve by expanding the parent's
whole-row Var into a RowExpr() selecting the proper columns from the
child.
If you dike out the preventive check in CheckAttributeType(),
composite-type columns now seem to actually work.  However, we surely
cannot ship them like this --- without I/O for composite types, you
can't get pg_dump to dump tables containing them.  So a little more
work still to do.
2004-06-05 01:55:05 +00:00
Tom Lane
8f2ea8b7b5 Resurrect heap_deformtuple(), this time implemented as a singly nested
loop over the fields instead of a loop around heap_getattr.  This is
considerably faster (O(N) instead of O(N^2)) when there are nulls or
varlena fields, since those prevent use of attcacheoff.  Replace loops
over heap_getattr with heap_deformtuple in situations where all or most
of the fields have to be fetched, such as printtup and tuptoaster.
Profiling done more than a year ago shows that this should be a nice
win for situations involving many-column tables.
2004-06-04 20:35:21 +00:00
Tom Lane
2095206de1 Adjust btree index build to not use shared buffers, thereby avoiding the
locking conflict against concurrent CHECKPOINT that was discussed a few
weeks ago.  Also, if not using WAL archiving (which is always true ATM
but won't be if PITR makes it into this release), there's no need to
WAL-log the index build process; it's sufficient to force-fsync the
completed index before commit.  This seems to gain about a factor of 2
in my tests, which is consistent with writing half as much data.  I did
not try it with WAL on a separate drive though --- probably the gain would
be a lot less in that scenario.
2004-06-02 17:28:18 +00:00
Tom Lane
9b178555fc Per previous discussions, get rid of use of sync(2) in favor of
explicitly fsync'ing every (non-temp) file we have written since the
last checkpoint.  In the vast majority of cases, the burden of the
fsyncs should fall on the bgwriter process not on backends.  (To this
end, we assume that an fsync issued by the bgwriter will force out
blocks written to the same file by other processes using other file
descriptors.  Anyone have a problem with that?)  This makes the world
safe for WIN32, which ain't even got sync(2), and really makes the world
safe for Unixen as well, because sync(2) never had the semantics we need:
it offers no way to wait for the requested I/O to finish.

Along the way, fix a bug I recently introduced in xlog recovery:
file truncation replay failed to clear bufmgr buffers for the dropped
blocks, which could result in 'PANIC:  heap_delete_redo: no block'
later on in xlog replay.
2004-05-31 03:48:10 +00:00
Tom Lane
076a055acf Separate out bgwriter code into a logically separate module, rather
than being random pieces of other files.  Give bgwriter responsibility
for all checkpoint activity (other than a post-recovery checkpoint);
so this child process absorbs the functionality of the former transient
checkpoint and shutdown subprocesses.  While at it, create an actual
include file for postmaster.c, which for some reason never had its own
file before.
2004-05-29 22:48:23 +00:00
Tom Lane
1a321f26d8 Code review for EXEC_BACKEND changes. Reduce the number of #ifdefs by
about a third, make it work on non-Windows platforms again.  (But perhaps
I broke the WIN32 code, since I have no way to test that.)  Fold all the
paths that fork postmaster child processes to go through the single
routine SubPostmasterMain, which takes care of resurrecting the state that
would normally be inherited from the postmaster (including GUC variables).
Clean up some places where there's no particularly good reason for the
EXEC and non-EXEC cases to work differently.  Take care of one or two
FIXMEs that remained in the code.
2004-05-28 05:13:32 +00:00
Tom Lane
16974ee910 Get rid of the former rather baroque mechanism for propagating the values
of ThisStartUpID and RedoRecPtr into new backends.  It's a lot easier just
to make them all grab the values out of shared memory during startup.
This helps to decouple the postmaster from checkpoint execution, which I
need since I'm intending to let the bgwriter do it instead, and it also
fixes a bug in the Win32 port: ThisStartUpID wasn't getting propagated at
all AFAICS.  (Doesn't give me a lot of faith in the amount of testing that
port has gotten.)
2004-05-27 17:12:57 +00:00
Tom Lane
4d86ae4260 For multi-table ANALYZE, use per-table transactions when possible
(ie, when not inside a transaction block), so that we can avoid holding
locks longer than necessary.  Per trouble report from Philip Warner.
2004-05-22 23:14:38 +00:00
Tom Lane
4af3421161 Get rid of rd_nblocks field in relcache entries. Turns out this was
costing us lots more to maintain than it was worth.  On shared tables
it was of exactly zero benefit because we couldn't trust it to be
up to date.  On temp tables it sometimes saved an lseek, but not often
enough to be worth getting excited about.  And the real problem was that
we forced an lseek on every relcache flush in order to update the field.
So all in all it seems best to lose the complexity.
2004-05-08 19:09:25 +00:00
Tom Lane
37fa3b6c89 Tweak indexscan and seqscan code to arrange that steps from one page to
the next are handled by ReleaseAndReadBuffer rather than separate
ReleaseBuffer and ReadBuffer calls.  This cuts the number of acquisitions
of the BufMgrLock by a factor of 2 (possibly more, if an indexscan happens
to pull successive rows from the same heap page).  Unfortunately this
doesn't seem enough to get us out of the recently discussed context-switch
storm problem, but it's surely worth doing anyway.
2004-04-21 18:24:26 +00:00
Bruce Momjian
823ac7c2b4 This is a cleanup patch for access/transam/xact.c. It only removes some
#ifdef NOT_USED code, and adds a new TBLOCK state which signals the fact
that StartTransaction() has been executed.

Alvaro Herrera
2004-04-05 03:11:39 +00:00
Tom Lane
375369acd1 Replace TupleTableSlot convention for whole-row variables and function
results with tuples as ordinary varlena Datums.  This commit does not
in itself do much for us, except eliminate the horrid memory leak
associated with evaluation of whole-row variables.  However, it lays the
groundwork for allowing composite types as table columns, and perhaps
some other useful features as well.  Per my proposal of a few days ago.
2004-04-01 21:28:47 +00:00
Teodor Sigaev
f2c064afcb Cleanup vectors of GISTENTRY and eliminate problem with 64-bit strict-aligned
boxes. Change interface to user-defined GiST support methods union and
picksplit. Now instead of bytea struct it used special GistEntryVector
structure.
2004-03-30 15:45:33 +00:00
Tatsuo Ishii
0b86ade1c2 Add NOWAIT option to LOCK command 2004-03-11 01:47:41 +00:00
Tom Lane
c3c09be34b Commit the reasonably uncontroversial parts of J.R. Nield's PITR patch, to
wit: Add a header record to each WAL segment file so that it can be reliably
identified.  Avoid splitting WAL records across segment files (this is not
strictly necessary, but makes it simpler to incorporate the header records).
Make WAL entries for file creation, deletion, and truncation (as foreseen but
never implemented by Vadim).  Also, add support for making XLOG_SEG_SIZE
configurable at compile time, similarly to BLCKSZ.  Fix a couple bugs I
introduced in WAL replay during recent smgr API changes.  initdb is forced
due to changes in pg_control contents.
2004-02-11 22:55:26 +00:00
Tom Lane
391c3811a2 Rename SortMem and VacuumMem to work_mem and maintenance_work_mem.
Make btree index creation and initial validation of foreign-key constraints
use maintenance_work_mem rather than work_mem as their memory limit.
Add some code to guc.c to allow these variables to be referenced by their
old names in SHOW and SET commands, for backwards compatibility.
2004-02-03 17:34:04 +00:00
Bruce Momjian
f4921e5ca3 Attached is a patch that fixes some trivial typos and alignment. Please
apply.

Alvaro Herrera
2004-01-26 22:51:56 +00:00
Tom Lane
9bd681a522 Repair problem identified by Olivier Prenant: ALTER DATABASE SET search_path
should not be too eager to reject paths involving unknown schemas, since
it can't really tell whether the schemas exist in the target database.
(Also, when reading pg_dumpall output, it could be that the schemas
don't exist yet, but eventually will.)  ALTER USER SET has a similar issue.
So, reduce the normal ERROR to a NOTICE when checking search_path values
for these commands.  Supporting this requires changing the API for GUC
assign_hook functions, which causes the patch to touch a lot of places,
but the changes are conceptually trivial.
2004-01-19 19:04:40 +00:00
Tom Lane
0966516b75 Tighten short-circuit tests for deciding whether we need to invoke
tuptoaster.c --- fields that are compressed in-line are not a reason
to invoke the toaster.  Along the way, add a couple more htup.h macros
to eliminate confusing negated tests, and get rid of the already
vestigial TUPLE_TOASTER_ACTIVE symbol.
2004-01-16 20:51:30 +00:00
Neil Conway
bc028beb16 Make the 'wal_debug' GUC variable a boolean (rather than an integer), and
hide it behind #ifdef WAL_DEBUG blocks.
2004-01-06 17:26:23 +00:00
Tom Lane
569659ae16 Improve btree's initial-positioning-strategy code so that we never need
to step more than one entry after descending the search tree to arrive at
the correct place to start the scan.  This can improve the behavior
substantially when there are many entries equal to the chosen boundary
value.  Per suggestion from Dmitry Tkach, 14-Jul-03.
2003-12-21 01:23:06 +00:00
Bruce Momjian
d75b2ec4eb This patch is the next step towards (re)allowing fork/exec.
Claudio Natoli
2003-12-20 17:31:21 +00:00
Peter Eisentraut
2afacfc403 This patch properly sets the prototype for the on_shmem_exit and
on_proc_exit functions, and adjust all other related code to use
the proper types too.

by Kurt Roeckx
2003-12-12 18:45:10 +00:00
PostgreSQL Daemon
55b113257c make sure the $Id tags are converted to $PostgreSQL as well ... 2003-11-29 22:41:33 +00:00
Tom Lane
fa5c8a055a Cross-data-type comparisons are now indexable by btrees, pursuant to my
pghackers proposal of 8-Nov.  All the existing cross-type comparison
operators (int2/int4/int8 and float4/float8) have appropriate support.
The original proposal of storing the right-hand-side datatype as part of
the primary key for pg_amop and pg_amproc got modified a bit in the event;
it is easier to store zero as the 'default' case and only store a nonzero
when the operator is actually cross-type.  Along the way, remove the
long-since-defunct bigbox_ops operator class.
2003-11-12 21:15:59 +00:00
Tom Lane
c1d62bfd00 Add operator strategy and comparison-value datatype fields to ScanKey.
Remove the 'strategy map' code, which was a large amount of mechanism
that no longer had any use except reverse-mapping from procedure OID to
strategy number.  Passing the strategy number to the index AM in the
first place is simpler and faster.
This is a preliminary step in planned support for cross-datatype index
operations.  I'm committing it now since the ScanKeyEntryInitialize()
API change touches quite a lot of files, and I want to commit those
changes before the tree drifts under me.
2003-11-09 21:30:38 +00:00
Peter Eisentraut
96889392e9 Implement isolation levels read uncommitted and repeatable read as acting
like the next higher one.
2003-11-06 22:08:15 +00:00
Tom Lane
90b2202975 Fix bad interaction between NOTIFY processing and V3 extended query
protocol, per report from Igor Shevchenko.  NOTIFY thought it could
do its thing if transaction blockState is TBLOCK_DEFAULT, but in
reality it had better check the low-level transaction state is
TRANS_DEFAULT as well.  Formerly it was not possible to wait for the
client in a state where the first is true and the second is not ...
but now we can have such a state.  Minor cleanup in StartTransaction()
as well.
2003-10-16 16:50:41 +00:00
Tom Lane
55d85f42a8 Repair RI trigger visibility problems (this time for sure ;-)) per recent
discussion on pgsql-hackers: in READ COMMITTED mode we just have to force
a QuerySnapshot update in the trigger, but in SERIALIZABLE mode we have
to run the scan under a current snapshot and then complain if any rows
would be updated/deleted that are not visible in the transaction snapshot.
2003-10-01 21:30:53 +00:00
Tom Lane
e33f205a94 Adjust btree index build procedure so that the btree metapage looks
invalid (has the wrong magic number) until the build is entirely
complete.  This turns out to cost no additional writes in the normal
case, since we were rewriting the metapage at the end of the process
anyway.  In normal scenarios there's no real gain in security, because
a failed index build would roll back the transaction leaving an unused
index file, but for rebuilding shared system indexes this seems to add
some useful protection.
2003-09-29 23:40:26 +00:00
Tom Lane
8934790052 Add a mechanism to let dynamically loaded modules register post-commit/
post-abort cleanup hooks.  I'm surprised that we have not needed this
already, but I need it now to fix a plpgsql problem, and the usefulness
for other dynamically loaded modules seems obvious.
2003-09-28 23:26:20 +00:00
Tom Lane
c63a5452d8 Get rid of ReferentialIntegritySnapshotOverride by extending Executor API
to allow es_snapshot to be set to SnapshotNow rather than a query snapshot.
This solves a bug reported by Wade Klaver, wherein triggers fired as a
result of RI cascade updates could misbehave.
2003-09-25 18:58:36 +00:00
Tom Lane
db18703b5a Fix LISTEN/NOTIFY race condition reported by Gavin Sherry. While a
really general fix might be difficult, I believe the only case where
AtCommit_Notify could see an uncommitted tuple is where the other guy
has just unlistened and not yet committed.  The best solution seems to
be to just skip updating that tuple, on the assumption that the other
guy does not want to hear about the notification anyway.  This is not
perfect --- if the other guy rolls back his unlisten instead of committing,
then he really should have gotten this notify.  But to do that, we'd have
to wait to see if he commits or not, or make UNLISTEN hold exclusive lock
on pg_listener until commit.  Either of these answers is deadlock-prone,
not to mention horrible for interactive performance.  Do it this way
for now.  (What happened to that project to do LISTEN/NOTIFY in memory
with no table, anyway?)
2003-09-15 23:33:43 +00:00
Tom Lane
7a3693716d Reimplement hash index locking algorithms, per my recent proposal to
pghackers.  This fixes the problem recently reported by Markus KrÌutner
(hash bucket split corrupts the state of scans being done concurrently),
and I believe it also fixes all the known problems with deadlocks in
hash index operations.  Hash indexes are still not really ready for prime
time (since they aren't WAL-logged), but this is a step forward.
2003-09-04 22:06:27 +00:00
Tom Lane
d70610c4ee Several fixes for hash indexes that involve changing the on-disk index
layout; therefore, this change forces REINDEX of hash indexes (though
not a full initdb).  Widen hashm_ntuples to double so that hash space
management doesn't get confused by more than 4G entries; enlarge the
allowed number of free-space-bitmap pages; replace the useless bshift
field with a useful bmshift field; eliminate 4 bytes of wasted space
in the per-page special area.
2003-09-02 18:13:32 +00:00
Tom Lane
39673ca47b Rewrite hashbulkdelete() to make it amenable to new bucket locking
scheme.  A pleasant side effect is that it is *much* faster when deleting
a large fraction of the indexed tuples, because of elimination of
redundant hash_step activity induced by hash_adjscans.  Various other
continuing code cleanup.
2003-09-02 02:18:38 +00:00
Tom Lane
65c2d427fb Preliminary cleanup for hash index code (doesn't attack the locking problem
yet).  Fix a couple of bugs that would only appear if multiple bitmap pages
are used, including a buffer reference leak and incorrect computation of bit
indexes.  Get rid of 'overflow address' concept, which accomplished nothing
except obfuscating the code and creating a risk of failure due to limited
range of offset field.  Rename some misleadingly-named fields and routines,
and improve documentation.
2003-09-01 20:26:34 +00:00
Tom Lane
302f1a86dc Rewriter and planner should use only resno, not resname, to identify
target columns in INSERT and UPDATE targetlists.  Don't rely on resname
to be accurate in ruleutils, either.  This fixes bug reported by
Donald Fraser, in which renaming a column referenced in a rule did not
work very well.
2003-08-11 23:04:50 +00:00
Bruce Momjian
46785776c4 Another pgindent run with updated typedefs. 2003-08-08 21:42:59 +00:00
Tom Lane
2f9c859ea1 Fix some copyright notices that weren't updated. Improve copyright tool
so it won't miss 'em again.
2003-08-04 23:59:41 +00:00
Bruce Momjian
f3c3deb7d0 Update copyrights to 2003. 2003-08-04 02:40:20 +00:00
Bruce Momjian
089003fb46 pgindent run. 2003-08-04 00:43:34 +00:00
Tom Lane
e8db9b26d0 elog mop-up. 2003-07-27 17:10:07 +00:00
Tom Lane
bff0422b6c Revise hash join and hash aggregation code to use the same datatype-
specific hash functions used by hash indexes, rather than the old
not-datatype-aware ComputeHashFunc routine.  This makes it safe to do
hash joining on several datatypes that previously couldn't use hashing.
The sets of datatypes that are hash indexable and hash joinable are now
exactly the same, whereas before each had some that weren't in the other.
2003-06-22 22:04:55 +00:00
Bruce Momjian
0abe7431c6 This patch extracts page buffer pooling and the simple
least-recently-used strategy from clog.c into slru.c.  It doesn't
change any visible behaviour and passes all regression tests plus a
TruncateCLOG test done manually.

Apart from refactoring I made a little change to SlruRecentlyUsed,
formerly ClogRecentlyUsed:  It now skips incrementing lru_counts, if
slotno is already the LRU slot, thus saving a few CPU cycles.  To make
this work, lru_counts are initialised to 1 in SimpleLruInit.

SimpleLru will be used by pg_subtrans (part of the nested transactions
project), so the main purpose of this patch is to avoid future code
duplication.

Manfred Koizar
2003-06-11 22:37:46 +00:00
Tom Lane
f85f43dfb5 Backend support for autocommit removed, per recent discussions. The
only remnant of this failed experiment is that the server will take
SET AUTOCOMMIT TO ON.  Still TODO: provide some client-side autocommit
logic in libpq.
2003-05-14 03:26:03 +00:00
Tom Lane
30f609484d Add binary I/O routines for a bunch more datatypes. Still a few to go,
but that was enough tedium for one day.  Along the way, move the few
support routines for types xid and cid into a more logical place.
2003-05-12 23:08:52 +00:00
Tom Lane
c0a8c3ac13 Update 3.0 protocol support to match recent agreements about how to
handle multiple 'formats' for data I/O.  Restructure CommandDest and
DestReceiver stuff one more time (it's finally starting to look a bit
clean though).  Code now matches latest 3.0 protocol document as far
as message formats go --- but there is no support for binary I/O yet.
2003-05-08 18:16:37 +00:00
Tom Lane
79913910d4 Restructure command destination handling so that we pass around
DestReceiver pointers instead of just CommandDest values.  The DestReceiver
is made at the point where the destination is selected, rather than
deep inside the executor.  This cleans up the original kluge implementation
of tstoreReceiver.c, and makes it easy to support retrieving results
from utility statements inside portals.  Thus, you can now do fun things
like Bind and Execute a FETCH or EXPLAIN command, and it'll all work
as expected (e.g., you can Describe the portal, or use Execute's count
parameter to suspend the output partway through).  Implementation involves
stuffing the utility command's output into a Tuplestore, which would be
kind of annoying for huge output sets, but should be quite acceptable
for typical uses of utility commands.
2003-05-06 20:26:28 +00:00
Tom Lane
2cf57c8f8d Implement feature of new FE/BE protocol whereby RowDescription identifies
the column by table OID and column number, if it's a simple column
reference.  Along the way, get rid of reskey/reskeyop fields in Resdoms.
Turns out that representation was not convenient for either the planner
or the executor; we can make the planner deliver exactly what the
executor wants with no more effort.
initdb forced due to change in stored rule representation.
2003-05-06 00:20:33 +00:00
Tom Lane
16503e6fa4 Extended query protocol: parse, bind, execute, describe FE/BE messages.
Only lightly tested as yet, since libpq doesn't know anything about 'em.
2003-05-05 00:44:56 +00:00
Tom Lane
4db9689d1a Add transaction status field to ReadyForQuery messages, and make room
for tableID/columnID in RowDescription.  (The latter isn't really
implemented yet though --- the backend always sends zeroes, and libpq
just throws away the data.)
2003-04-26 20:23:00 +00:00
Tom Lane
0797bb5c50 During VACUUM FULL, truncate off any deletable pages that are at the
end of a btree index.  This isn't super-effective, since we won't move
nondeletable pages, but it's better than nothing.  Also, improve stats
displayed during VACUUM VERBOSE.
2003-02-24 00:57:17 +00:00
Tom Lane
3bbd6af37c Adjust btbulkdelete logic so that only one WAL record is issued while
deleting multiple index entries on a single index page.  This makes for
a very substantial reduction in the amount of WAL traffic during a
large delete operation.
2003-02-23 22:43:09 +00:00
Tom Lane
13dadef8b5 Improve coding of log_heap_clean() and heap_xlog_clean(). 2003-02-23 20:32:12 +00:00
Tom Lane
88dc31e3f2 First cut at recycling space in btree indexes. Still some rough edges
to fix, but it seems to basically work...
2003-02-23 06:17:13 +00:00
Tom Lane
799bc58dc7 More infrastructure for btree compaction project. Tree-traversal code
now knows what to do upon hitting a dead page (in theory anyway, it's
untested...).  Add a post-VACUUM-cleanup entry point for index AMs, to
provide a place for dead-page scavenging to happen.
Also, fix oversight that broke btpo_prev links in temporary indexes.
initdb forced due to additions in pg_am.
2003-02-22 00:45:05 +00:00
Tom Lane
70508ba7ae Make btree index structure adjustments and WAL logging changes needed to
support btree compaction, as per proposal of a few days ago.  btree index
pages no longer store parent links, instead they have a level indicator
(counting up from zero for leaf pages).  The FixBTree recovery logic is
removed, and replaced by code that detects missing parent-level insertions
during WAL replay.  Also, generate appropriate WAL entries when updating
btree metapage and when building a btree index from scratch.  I believe
btree indexes are now completely WAL-legal for the first time.
initdb forced due to index and WAL changes.
2003-02-21 00:06:22 +00:00
Bruce Momjian
48ee6f4916 This trivial patch removes the usage of some old statistics code that no
longer works -- IncrHeapAccessStat() didn't actually *do* anything
anymore, so no reason to keep it around AFAICS. I also fixed a
grammatical error in a comment.


Neil Conway
2003-02-13 05:35:11 +00:00
Tom Lane
a4482f4c4c Fix coredump problem in plpgsql's RETURN NEXT. When a SELECT INTO
that's selecting into a RECORD variable returns zero rows, make it
assign an all-nulls row to the RECORD; this is consistent with what
happens when the SELECT INTO target is not a RECORD.  In support of
this, tweak the SPI code so that a valid tuple descriptor is returned
even when a SPI select returns no rows.
2003-01-21 22:06:12 +00:00
Peter Eisentraut
b65cd56240 Read-only transactions, as defined in SQL. 2003-01-10 22:03:30 +00:00
Tom Lane
cbca6c4896 Fix for bug #866. 7.3 contains new logic for avoiding redundant calls to
the index AM when we know we are fetching a unique row.  However, this
logic did not consider the possibility that it would be asked to fetch
backwards.  Also fix mark/restore to work correctly in this scenario.
2003-01-08 19:41:40 +00:00
Tom Lane
17ac74797a Put back error test for DECLARE CURSOR outside a transaction block ...
but do it correctly now.
2002-11-18 01:17:39 +00:00
Bruce Momjian
2986aa6a66 Add checkpoint_warning to warn of excessive checkpoints caused by too
few WAL files.
2002-11-15 02:44:57 +00:00
Bruce Momjian
63e9734542 Update xact.c comments for clarity. 2002-11-13 03:12:05 +00:00
Tom Lane
200b151615 Fix places that were using IsTransactionBlock() as an (inadequate) check
that they'd get to commit immediately on finishing.  There's now a
centralized routine PreventTransactionChain() that implements the
necessary tests.
2002-10-21 22:06:20 +00:00
Tom Lane
b2ab1e6bc9 Ensure that before truncating CLOG, we force a checkpoint even if no
recent WAL activity has occurred.  Without this, it's possible that a
later crash might leave tuples on disk with un-updated commit status
bits.
2002-09-26 22:58:34 +00:00
Tom Lane
c87469e64a Fix problems with loss of tuple commit status bits during WAL redo of
VACUUM FULL tuple moves.  Store full-width t_infomask in WAL, rather
than storing low 8 bits and expecting to be able to reconstruct upper
bits.  While at it, remove redundant t_oid field from WAL headers
(the OID, if present, is now recorded in the data portion of the tuple).
WAL version number bumped --- this does not force an initdb, you can
instead run pg_resetxlog after a clean shutdown of the old postmaster.
2002-09-26 22:46:29 +00:00
Bruce Momjian
e50f52a074 pgindent run. 2002-09-04 20:31:48 +00:00
Tom Lane
c7a165adc6 Code review for HeapTupleHeader changes. Add version number to page headers
(overlaying low byte of page size) and add HEAP_HASOID bit to t_infomask,
per earlier discussion.  Simplify scheme for overlaying fields in tuple
header (no need for cmax to live in more than one place).  Don't try to
clear infomask status bits in tqual.c --- not safe to do it there.  Don't
try to force output table of a SELECT INTO to have OIDs, either.  Get rid
of unnecessarily complex three-state scheme for TupleDesc.tdhasoids, which
has already caused one recent failure.  Improve documentation.
2002-09-02 01:05:06 +00:00
Tom Lane
26993b2918 AUTOCOMMIT mode is now an available backend GUC variable; setting it
to false provides more SQL-spec-compliant behavior than we had before.
I am not sure that setting it false is actually a good idea yet; there
is a lot of client-side code that will probably be broken by turning
autocommit off.  But it's a start.

Loosely based on a patch by David Van Wie.
2002-08-30 22:18:07 +00:00
Bruce Momjian
63653f7ffa Complete TODO item:
* Remove wal_files postgresql.conf option because WAL files are
	  now recycled
2002-08-30 16:50:50 +00:00
Tom Lane
58de480999 Clean up comments to be careful about the distinction between variable-
width types and varlena types, since with the introduction of CSTRING as
a more-or-less-real type, these concepts aren't identical.  I've tried to
use varlena consistently to denote datatypes with typlen = -1, ie, they
have a length word and are potentially TOASTable; while the term variable
width covers both varlena and cstring (and, perhaps, someday other types
with other rules for computing the actual width).  No code changes in this
commit except for renaming a couple macros.
2002-08-25 17:20:01 +00:00
Tom Lane
976246cc7e The cstring datatype can now be copied, passed around, etc. The typlen
value '-2' is used to indicate a variable-width type whose width is
computed as strlen(datum)+1.  Everything that looks at typlen is updated
except for array support, which Joe Conway is working on; at the moment
it wouldn't work to try to create an array of cstring.
2002-08-24 15:00:47 +00:00
Tom Lane
b663f3443b Add a bunch of pseudo-types to replace the behavior formerly associated
with OPAQUE, as per recent pghackers discussion.  I still want to do some
more work on the 'cstring' pseudo-type, but I'm going to commit the bulk
of the changes now before the tree starts shifting under me ...
2002-08-22 00:01:51 +00:00
Bruce Momjian
d04e9137c9 Reverse out XLogDir/-X write-ahead log handling, per discussion.
Original patch from Thomas.
2002-08-17 15:12:07 +00:00
Tom Lane
5df307c778 Restructure local-buffer handling per recent pghackers discussion.
The local buffer manager is no longer used for newly-created relations
(unless they are TEMP); a new non-TEMP relation goes through the shared
bufmgr and thus will participate normally in checkpoints.  But TEMP relations
use the local buffer manager throughout their lifespan.  Also, operations
in TEMP relations are not logged in WAL, thus improving performance.
Since it's no longer necessary to fsync relations as they move out of the
local buffers into shared buffers, quite a lot of smgr.c/md.c/fd.c code
is no longer needed and has been removed: there's no concept of a dirty
relation anymore in md.c/fd.c, and we never fsync anything but WAL.
Still TODO: improve local buffer management algorithms so that it would
be reasonable to increase NLocBuffer.
2002-08-06 02:36:35 +00:00
Thomas G. Lockhart
ac1a3dcf24 Fix compilation problem with assert checking enabled for recent xlog
location feature.
2002-08-05 01:24:16 +00:00
Thomas G. Lockhart
af704cdfb4 Implement WAL log location control using "-X" or PGXLOG. 2002-08-04 06:26:38 +00:00
Bruce Momjian
b0f5086e41 oid is needed, it is added at the end of the struct (after the null
bitmap, if present).

Per Tom Lane's suggestion the information whether a tuple has an oid
or not is carried in the tuple descriptor.  For debugging reasons
tdhasoid is of type char, not bool.  There are predefined values for
WITHOID, WITHOUTOID and UNDEFOID.

This patch has been generated against a cvs snapshot from last week
and I don't expect it to apply cleanly to current sources.  While I
post it here for public review, I'm working on a new version against a
current snapshot.  (There's been heavy activity recently; hope to
catch up some day ...)

This is a long patch;  if it is too hard to swallow, I can provide it
in smaller pieces:

Part 1:  Accessor macros
Part 2:  tdhasoid in TupDesc
Part 3:  Regression test
Part 4:  Parameter withoid to heap_addheader
Part 5:  Eliminate t_oid from HeapTupleHeader

Part 2 is the most hairy part because of changes in the executor and
even in the parser;  the other parts are straightforward.

Up to part 4 the patched postmaster stays binary compatible to
databases created with an unpatched version.  Part 5 is small (100
lines) and finally breaks compatibility.

Manfred Koizar
2002-07-20 05:16:59 +00:00
Tom Lane
7c6df91dda Second phase of committing Rod Taylor's pg_depend/pg_constraint patch.
pg_relcheck is gone; CHECK, UNIQUE, PRIMARY KEY, and FOREIGN KEY
constraints all have real live entries in pg_constraint.  pg_depend
exists, and RESTRICT/CASCADE options work on most kinds of DROP;
however, pg_depend is not yet very well populated with dependencies.
(Most of the ones that are present at this point just replace formerly
hardwired associations, such as the implicit drop of a relation's pg_type
entry when the relation is dropped.)  Need to add more logic to create
dependency entries, improve pg_dump to dump constraints in place of
indexes and triggers, and add some regression tests.
2002-07-12 18:43:19 +00:00
Bruce Momjian
20e83274bb Fix typo in xl_heaptid comment
Manfred Koizar
2002-07-08 01:52:23 +00:00
Bruce Momjian
33f1687879 There already was a macro PageGetItemId; this is now used in (almost)
all places, where pd_linp is accessed.  Also introduce new macros
SizeOfPageHeaderData and BTMaxItemSize. This is just source code
cosmetic, no behaviour changed.

Manfred Koizar
2002-07-02 05:48:44 +00:00
Bruce Momjian
97bfffe50e This patch, which is built upon the "HeapTupleHeader accessor macros"
patch from 2002-06-10, is supposed to reduce the heap tuple header size
by four bytes on most architectures.  Of course it changes the on-disk
tuple format and therefore requires initdb.

This overlays cmin/cmax/xmax fields into only two fields.

Manfred Koizar
2002-07-02 05:46:14 +00:00
Bruce Momjian
d84fe82230 Update copyright to 2002. 2002-06-20 20:29:54 +00:00
Bruce Momjian
3c35face41 This patch wraps all accesses to t_xmin, t_cmin, t_xmax, and t_cmax in
HeapTupleHeaderData in setter and getter macros called
HeapTupleHeaderGetXmin, HeapTupleHeaderSetXmin etc.

It also introduces a "virtual" field xvac by defining
HeapTupleHeaderGetXvac and HeapTupleHeaderSetXvac.  Xvac is used by
VACUUM, in fact it is stored in t_cmin.

Manfred Koizar
2002-06-15 19:54:24 +00:00
Tom Lane
3212cf9417 Distinguish between MaxHeapAttributeNumber and MaxTupleAttributeNumber,
where the latter is made slightly larger to allow for in-memory tuples
containing resjunk attributes.  Responds to today's complaint that one
cannot UPDATE a table containing the allegedly-legal maximum number of
columns.

Also, apply Manfred Koizar's recent patch to avoid extra alignment padding
when there is a null bitmap.  This saves bytes in some cases while not
creating any backward-compatibility problem AFAICS.
2002-05-27 19:53:33 +00:00
Tom Lane
4d567013cf Remove AMI_OVERRIDE tests from tqual.c routines; they aren't necessary
and just slow down normal operations (only fractionally, but a cycle saved
is a cycle earned).  Improve documentation of AMI_OVERRIDE behavior.
2002-05-25 20:00:12 +00:00
Tom Lane
3f4d488022 Mark index entries "killed" when they are no longer visible to any
transaction, so as to avoid returning them out of the index AM.  Saves
repeated heap_fetch operations on frequently-updated rows.  Also detect
queries on unique keys (equality to all columns of a unique index), and
don't bother continuing scan once we have found first match.

Killing is implemented in the btree and hash AMs, but not yet in rtree
or gist, because there isn't an equally convenient place to do it in
those AMs (the outer amgetnext routine can't do it without re-pinning
the index page).

Did some small cleanup on APIs of HeapTupleSatisfies, heap_fetch, and
index_insert to make this a little easier.
2002-05-24 18:57:57 +00:00
Tom Lane
959e61e917 Remove global variable scanCommandId in favor of storing a command ID
in snapshots, per my proposal of a few days ago.  Also, tweak heapam.c
routines (heap_insert, heap_update, heap_delete, heap_mark4update) to
be passed the command ID to use, instead of doing GetCurrentCommandID.
For catalog updates they'll still get passed current command ID, but
for updates generated from the main executor they'll get passed the
command ID saved in the snapshot the query is using.  This should fix
some corner cases associated with functions and triggers that advance
current command ID while an outer query is still in progress.
2002-05-21 22:05:55 +00:00
Tom Lane
44fbe20d62 Restructure indexscan API (index_beginscan, index_getnext) per
yesterday's proposal to pghackers.  Also remove unnecessary parameters
to heap_beginscan, heap_rescan.  I modified pg_proc.h to reflect the
new numbers of parameters for the AM interface routines, but did not
force an initdb because nothing actually looks at those fields.
2002-05-20 23:51:44 +00:00
Tom Lane
f0811a74b3 Merge the last few variable.c configuration variables into the generic
GUC support.  It's now possible to set datestyle, timezone, and
client_encoding from postgresql.conf and per-database or per-user
settings.  Also, implement rollback of SET commands that occur in a
transaction that later fails.  Create a SET LOCAL var = value syntax
that sets the variable only for the duration of the current transaction.
All per previous discussions in pghackers.
2002-05-17 01:19:19 +00:00
Tom Lane
838fe25a95 Create a new GUC variable search_path to control the namespace search
path.  The default behavior if no per-user schemas are created is that
all users share a 'public' namespace, thus providing behavior backwards
compatible with 7.2 and earlier releases.  Probably the semantics and
default setting will need to be fine-tuned, but this is a start.
2002-04-01 03:34:27 +00:00
Tom Lane
d5e99ab4d6 pg_type has a typnamespace column; system now supports creating types
in different namespaces.  Also, cleanup work on relation namespace
support: drop, alter, rename commands work for tables in non-default
namespaces.
2002-03-29 19:06:29 +00:00
Tom Lane
1dbf8aa7a8 pg_class has a relnamespace column. You can create and access tables
in schemas other than the system namespace; however, there's no search
path yet, and not all operations work yet on tables outside the system
namespace.
2002-03-26 19:17:02 +00:00
Tom Lane
01747692fe Repair two problems with WAL logging of sequence nextvalI() ops, as
per recent pghackers discussion: force a new WAL record at first nextval
after a checkpoint, and ensure that xlog is flushed to disk if a nextval
record is the only thing emitted by a transaction.
2002-03-15 19:20:36 +00:00
Tom Lane
c422b5ca6b Code review for improved-hashing patch. Fix some portability issues
(char != unsigned char, Datum != uint32); make use of new hash code in
dynahash hash tables and hash joins.
2002-03-09 17:35:37 +00:00
Bruce Momjian
7ab7467318 I've attached a patch which implements Bob Jenkin's hash function for
PostgreSQL. This hash function replaces the one used by hash indexes and
the catalog cache. Hash joins use a different, relatively poor-quality
hash function, but I'll fix that later.

As suggested by Tom Lane, this patch also changes the size of the fixed
hash table used by the catalog cache to be a power-of-2 (instead of a
prime: I chose 256 instead of 257). This allows the catcache to lookup
hash buckets using a simple bitmask. This should improve the performance
of the catalog cache slightly, since the previous method (modulo a
prime) was slow.

In my tests, this improves the performance of hash indexes by between 4%
and 8%; the performance when using btree indexes or seqscans is
basically unchanged.

Neil Conway <neilconway@rogers.com>
2002-03-06 20:49:46 +00:00
Bruce Momjian
03194432de I attach a version of my toast-slicing patch, against current CVS
(current as of a few hours ago.)

This patch:

1. Adds PG_GETARG_xxx_P_SLICE() macros and associated support routines.

2. Adds routines in src/backend/access/tuptoaster.c for fetching only
necessary chunks of a toasted value. (Modelled on latest changes to
assume chunks are returned in order).

3. Amends text_substr and bytea_substr to use new methods. It now
handles multibyte cases -and should still lead to a performance
improvement in the multibyte case where the substring is near the
beginning of the string.

4. Added new command: ALTER TABLE tabname ALTER COLUMN colname SET
STORAGE {PLAIN | EXTERNAL | EXTENDED | MAIN} to parser and documented in
alter-table.sgml. (NB I used ColId as the item type for the storage
mode string, rather than a new production - I hope this makes sense!).
All this does is sets attstorage for the specified column.

4. AlterTableAlterColumnStatistics is now AlterTableAlterColumnFlags and
handles both statistics and storage (it uses the subtype code to
distinguish). The previous version of my patch also re-arranged other
code in backend/commands/command.c but I have dropped that from this
patch.(I plan to return to it separately).

5. Documented new macros (and also the PG_GETARG_xxx_P_COPY macros) in
xfunc.sgml. ref/alter_table.sgml also contains documentation for ALTER
COLUMN SET STORAGE.

John Gray
2002-03-05 05:33:31 +00:00
Tom Lane
6779c55c22 Clean up BeginCommand and related routines. BeginCommand and EndCommand
are now both invoked once per received SQL command (raw parsetree) from
pg_exec_query_string.  BeginCommand is actually just an empty routine
at the moment --- all its former operations have been pushed into tuple
receiver setup routines in printtup.c.  This makes for a clean distinction
between BeginCommand/EndCommand (once per command) and the tuple receiver
setup/teardown routines (once per ExecutorRun call), whereas the old code
was quite ad hoc.  Along the way, clean up the calling conventions for
ExecutorRun a little bit.
2002-02-27 19:36:13 +00:00
Bruce Momjian
f5dff44736 I've attached a simple patch which should improve the performance of
hashname() and reduce the penalty incured when NAMEDATALEN is increased.
I posted this to -hackers a couple days ago, and there haven't been any
major complaints. It passes the regression tests. See -hackers for more
discussion, as well as the suggestion from Tom Lane on which this patch
is based.

Unless anyone sees any problems, please apply for 7.3.

Cheers,

Neil Conway
2002-02-25 04:06:52 +00:00
Tom Lane
7863404417 A bunch of changes aimed at reducing backend startup time...
Improve 'pg_internal.init' relcache entry preload mechanism so that it is
safe to use for all system catalogs, and arrange to preload a realistic
set of system-catalog entries instead of only the three nailed-in-cache
indexes that were formerly loaded this way.  Fix mechanism for deleting
out-of-date pg_internal.init files: this must be synchronized with transaction
commit, not just done at random times within transactions.  Drive it off
relcache invalidation mechanism so that no special-case tests are needed.

Cache additional information in relcache entries for indexes (their pg_index
tuples and index-operator OIDs) to eliminate repeated lookups.  Also cache
index opclass info at the per-opclass level to avoid repeated lookups during
relcache load.

Generalize 'systable scan' utilities originally developed by Hiroshi,
move them into genam.c, use in a number of places where there was formerly
ugly code for choosing either heap or index scan.  In particular this allows
simplification of the logic that prevents infinite recursion between syscache
and relcache during startup: we can easily switch to heapscans in relcache.c
when and where needed to avoid recursion, so IndexScanOK becomes simpler and
does not need any expensive initialization.

Eliminate useless opening of a heapscan data structure while doing an indexscan
(this saves an mdnblocks call and thus at least one kernel call).
2002-02-19 20:11:20 +00:00
Bruce Momjian
ea08e6cd55 New pgindent run with fixes suggested by Tom. Patch manually reviewed,
initdb/regression tests pass.
2001-11-05 17:46:40 +00:00
Tom Lane
7d05310828 Fix problem reported by Alex Korn: if a relation has been dropped and
recreated since the start of our transaction, our first reference to it
errored out because we'd try to reuse our old relcache entry for it.
Do this by accepting SI inval messages just before relcache search in
heap_openr, so that dead relcache entries will be flushed before we
search.  Also, break heap_open/openr into two pairs of routines,
relation_open(r) and heap_open(r).  The relation_open routines make
no tests on relkind and so can be used to open anything that has a
pg_class entry.  The heap_open routines are wrappers that add a relkind
test to preserve their established behavior.  Use the relation_open
routines in several places that had various kluge solutions for opening
rels that might be either heap or index rels.

Also, remove the old 'heap stats' code that's been superseded by Jan's
stats collector, and clean up some inconsistencies in error reporting
between the different types of ALTER TABLE.
2001-11-02 16:30:29 +00:00
Bruce Momjian
6783b2372e Another pgindent run. Fixes enum indenting, and improves #endif
spacing.  Also adds space for one-line comments.
2001-10-28 06:26:15 +00:00
Bruce Momjian
b81844b173 pgindent run on all C files. Java run to follow. initdb/regression
tests pass.
2001-10-25 05:50:21 +00:00
Thomas G. Lockhart
9310075a13 Accept an INTERVAL argument for SET TIME ZONE per SQL99.
Modified the parser and the SET handlers to use full Node structures
 rather than simply a character string argument.
Implement INTERVAL() YEAR TO MONTH (etc) syntax per SQL99.
 Does not yet accept the goofy string format that goes along with, but
 this should be fairly straight forward to fix now as a bug or later
 as a feature.
Implement precision for the INTERVAL() type.
 Use the typmod mechanism for both of INTERVAL features.
Fix the INTERVAL syntax in the parser:
 opt_interval was in the wrong place.
INTERVAL is now a reserved word, otherwise we get reduce/reduce errors.
Implement an explicit date_part() function for TIMETZ.
 Should fix coersion problem with INTERVAL reported by Peter E.
Fix up some error messages for date/time types.
 Use all caps for type names within message.
Fix recently introduced side-effect bug disabling 'epoch' as a recognized
 field for date_part() etc. Reported by Peter E. (??)
Bump catalog version number.
Rename "microseconds" current transaction time field
 from ...Msec to ...Usec. Duh!
date/time regression tests updated for reference platform, but a few
 changes will be necessary for others.
2001-10-18 17:30:21 +00:00
Tom Lane
85801a4dbd Rearrange fmgr.c and relcache so that it's possible to keep FmgrInfo
lookup info in the relcache for index access method support functions.
This makes a huge difference for dynamically loaded support functions,
and should save a few cycles even for built-in ones.  Also tweak dfmgr.c
so that load_external_function is called only once, not twice, when
doing fmgr_info for a dynamically loaded function.  All per performance
gripe from Teodor Sigaev, 5-Oct-01.
2001-10-06 23:21:45 +00:00
Tom Lane
499abb0c0f Implement new 'lightweight lock manager' that's intermediate between
existing lock manager and spinlocks: it understands exclusive vs shared
lock but has few other fancy features.  Replace most uses of spinlocks
with lightweight locks.  All remaining uses of spinlocks have very short
lock hold times (a few dozen instructions), so tweak spinlock backoff
code to work efficiently given this assumption.  All per my proposal on
pghackers 26-Sep-01.
2001-09-29 04:02:27 +00:00
Thomas G. Lockhart
6f58115ddd Measure the current transaction time to milliseconds.
Define a new function, GetCurrentTransactionStartTimeUsec() to get the time
 to this precision.
Allow now() and timestamp 'now' to use this higher precision result so
 we now have fractional seconds in this "constant".
Add timestamp without time zone type.
Move previous timestamp type to timestamp with time zone.
Accept another ISO variant for date/time values: yyyy-mm-ddThh:mm:ss
 (note the "T" separating the day from hours information).
Remove 'current' from date/time types; convert to 'now' in input.
Separate time and timetz regression tests.
Separate timestamp and timestamptz regression test.
2001-09-28 08:09:14 +00:00
Tom Lane
220ae48cca Suppress compiler warning. 2001-09-17 00:29:10 +00:00
Hiroshi Inoue
fc5ec424ab Apply 7.1.3 changes to the current tree also. 2001-09-08 16:15:28 +00:00
Tom Lane
bc7d37a525 Transaction IDs wrap around, per my proposal of 13-Aug-01. More
documentation to come, but the code is all here.  initdb forced.
2001-08-26 16:56:03 +00:00
Tom Lane
2589735da0 Replace implementation of pg_log as a relation accessed through the
buffer manager with 'pg_clog', a specialized access method modeled
on pg_xlog.  This simplifies startup (don't need to play games to
open pg_log; among other things, OverrideTransactionSystem goes away),
should improve performance a little, and opens the door to recycling
commit log space by removing no-longer-needed segments of the commit
log.  Actual recycling is not there yet, but I felt I should commit
this part separately since it'd still be useful if we chose not to
do transaction ID wraparound.
2001-08-25 18:52:43 +00:00
Tom Lane
7326e78c42 Ensure that all TransactionId comparisons are encapsulated in macros
(TransactionIdPrecedes, TransactionIdFollows, etc).  First step on the
way to transaction ID wrap solution ...
2001-08-23 23:06:38 +00:00
Tom Lane
a54075a6d6 Update GiST for new pg_opclass arrangement (finally a clean solution
for haskeytype).  Update GiST contrib modules too.  Add linear-time split
algorithm for R-tree GiST opclass.
From Oleg Bartunov and Teodor Sigaev.
2001-08-22 18:24:26 +00:00
Tom Lane
f933766ba7 Restructure pg_opclass, pg_amop, and pg_amproc per previous discussions in
pgsql-hackers.  pg_opclass now has a row for each opclass supported by each
index AM, not a row for each opclass name.  This allows pg_opclass to show
directly whether an AM supports an opclass, and furthermore makes it possible
to store additional information about an opclass that might be AM-dependent.
pg_opclass and pg_amop now store "lossy" and "haskeytype" information that we
previously expected the user to remember to provide in CREATE INDEX commands.
Lossiness is no longer an index-level property, but is associated with the
use of a particular operator in a particular index opclass.

Along the way, IndexSupportInitialize now uses the syscaches to retrieve
pg_amop and pg_amproc entries.  I find this reduces backend launch time by
about ten percent, at the cost of a couple more special cases in catcache.c's
IndexScanOK.

Initial work by Oleg Bartunov and Teodor Sigaev, further hacking by Tom Lane.

initdb forced.
2001-08-21 16:36:06 +00:00
Tom Lane
bf56f0759b Make OIDs optional, per discussions in pghackers. WITH OIDS is still the
default, but OIDS are removed from many system catalogs that don't need them.
Some interesting side effects: TOAST pointers are 20 bytes not 32 now;
pg_description has a three-column key instead of one.

Bugs fixed in passing: BINARY cursors work again; pg_class.relhaspkey
has some usefulness; pg_dump dumps comments on indexes, rules, and
triggers in a valid order.

initdb forced.
2001-08-10 18:57:42 +00:00
Bruce Momjian
13923be7c8 1. null-safe interface to GiST
(as proposed in http://fts.postgresql.org/db/mw/msg.html?mid=1028327)

2. support for 'pass-by-value' arguments - to test this
   we used special opclass for int4 with values in range [0-2^15]
   More testing will be done after resolving problem with
   index_formtuple and implementation of B-tree using GiST

3. small patch to contrib modules (seg,cube,rtree_gist,intarray) -
   mark functions as 'isstrict' where needed.

Oleg Bartunov
2001-08-10 14:34:28 +00:00
Tom Lane
7d4d5c00f0 Arrange to recycle old XLOG log segment files as new segment files,
rather than deleting them only to have to create more.  Steady state
is 2*CHECKPOINT_SEGMENTS + WAL_FILES + 1 segment files, which will
simply be renamed rather than constantly deleted and recreated.
To make this safe, added current XLOG file/offset number to page
header of XLOG pages, so that an un-overwritten page from an old
incarnation of a logfile can be reliably told from a valid page.
This change means that if you try to restart postmaster in a CVS-tip
database after installing the change, you'll get a complaint about
bad XLOG page magic number.  If you don't want to initdb, run
contrib/pg_resetxlog (and be sure you shut down the old postmaster
cleanly).
2001-07-19 02:12:35 +00:00
Tom Lane
c8076f09d2 Restructure index AM interface for index building and index tuple deletion,
per previous discussion on pghackers.  Most of the duplicate code in
different AMs' ambuild routines has been moved out to a common routine
in index.c; this means that all index types now do the right things about
inserting recently-dead tuples, etc.  (I also removed support for EXTEND
INDEX in the ambuild routines, since that's about to go away anyway, and
it cluttered the code a lot.)  The retail indextuple deletion routines have
been replaced by a "bulk delete" routine in which the indexscan is inside
the access method.  I haven't pushed this change as far as it should go yet,
but it should allow considerable simplification of the internal bookkeeping
for deletions.  Also, add flag columns to pg_am to eliminate various
hardcoded tests on AM OIDs, and remove unused pg_am columns.

Fix rtree and gist index types to not attempt to store NULLs; before this,
gist usually crashed, while rtree managed not to crash but computed wacko
bounding boxes for NULL entries (which might have had something to do with
the performance problems we've heard about occasionally).

Add AtEOXact routines to hash, rtree, and gist, all of which have static
state that needs to be reset after an error.  We discovered this need long
ago for btree, but missed the other guys.

Oh, one more thing: concurrent VACUUM is now the default.
2001-07-15 22:48:19 +00:00
Tom Lane
b9f3a929ee Create a new HeapTupleSatisfiesVacuum() routine in tqual.c that embodies the
validity checking rules for VACUUM.  Make some other rearrangements of the
VACUUM code to allow more code to be shared between full and lazy VACUUM.
Minor code cleanups and added comments for TransactionId manipulations.
2001-07-12 04:11:13 +00:00
Tom Lane
af5ced9cfd Further work on connecting the free space map (which is still just a
stub) into the rest of the system.  Adopt a cleaner approach to preventing
deadlock in concurrent heap_updates: allow RelationGetBufferForTuple to
select any page of the rel, and put the onus on it to lock both buffers
in a consistent order.  Remove no-longer-needed isExtend hack from
API of ReleaseAndReadBuffer.
2001-06-29 21:08:25 +00:00
Jan Wieck
8d80b0d980 Statistical system views (yet without the config stuff, but
it's hard to keep such massive changes in sync with the tree
so I need to get it in and work from there now).

Jan
2001-06-22 19:16:24 +00:00
Tom Lane
1d584f97b9 Clean up various to-do items associated with system indexes:
pg_database now has unique indexes on oid and on datname.
pg_shadow now has unique indexes on usename and on usesysid.
pg_am now has unique index on oid.
pg_opclass now has unique index on oid.
pg_amproc now has unique index on amid+amopclaid+amprocnum.
Remove pg_rewrite's unnecessary index on oid, delete unused RULEOID syscache.
Remove index on pg_listener and associated syscache for performance reasons
(caching rows that are certain to change before you need 'em again is
rather pointless).
Change pg_attrdef's nonunique index on adrelid into a unique index on
adrelid+adnum.

Fix various incorrect settings of pg_class.relisshared, make that the
primary reference point for whether a relation is shared or not.
IsSharedSystemRelationName() is now only consulted to initialize relisshared
during initial creation of tables and indexes.  In theory we might now
support shared user relations, though it's not clear how one would get
entries for them into pg_class &etc of multiple databases.

Fix recently reported bug that pg_attribute rows created for an index all have
the same OID.  (Proof that non-unique OID doesn't matter unless it's
actually used to do lookups ;-))

There's no need to treat pg_trigger, pg_attrdef, pg_relcheck as bootstrap
relations.  Convert them into plain system catalogs without hardwired
entries in pg_class and friends.

Unify global.bki and template1.bki into a single init script postgres.bki,
since the alleged distinction between them was misleading and pointless.
Not to mention that it didn't work for setting up indexes on shared
system relations.

Rationalize locking of pg_shadow, pg_group, pg_attrdef (no need to use
AccessExclusiveLock where ExclusiveLock or even RowExclusiveLock will do).
Also, hold locks until transaction commit where necessary.
2001-06-12 05:55:50 +00:00
Tom Lane
bdadc9bf1c Remove RelationGetBufferWithBuffer(), which is horribly confused about
appropriate pin-count manipulation, and instead use ReleaseAndReadBuffer.
Make use of the fact that the passed-in buffer (if there is one) must
be pinned to avoid grabbing the bufmgr spinlock when we are able to
return this same buffer.  Eliminate unnecessary 'previous tuple' and
'next tuple' fields of HeapScanDesc and IndexScanDesc, thereby removing
a whole lot of bookkeeping from heap_getnext() and related routines.
2001-06-09 18:16:59 +00:00
Tom Lane
0b370ea7c8 Clean up some minor problems exposed by further thought about Panon's bug
report on old-style functions invoked by RI triggers.  We had a number of
other places that were being sloppy about which memory context FmgrInfo
subsidiary data will be allocated in.  Turns out none of them actually
cause a problem in 7.1, but this is for arcane reasons such as the fact
that old-style triggers aren't supported anyway.  To avoid getting burnt
later, I've restructured the trigger support so that we don't keep trigger
FmgrInfo structs in relcache memory.  Some other related cleanups too:
it's not really necessary to call fmgr_info at all while setting up
the index support info in relcache entries, because those ScanKeyEntry
structs are never used to invoke the functions.  This should speed up
relcache initialization a tiny bit.
2001-06-01 02:41:36 +00:00
Tom Lane
3043810d97 Updates to make GIST work with multi-key indexes (from Oleg Bartunov
and Teodor Sigaev).  Declare key values as Datum where appropriate,
rather than char* (Tom Lane).
2001-05-31 18:16:55 +00:00
Tom Lane
f1d5d0905c Tweak StrategyEvaluation data structure to eliminate hardwired limit on
number of strategies supported by an index AM.  Add missing copyright
notices and CVS $Header$ markers to GIST source files.
2001-05-30 19:53:40 +00:00
Tom Lane
baf57ee07b Remove unused, redundant header files. 2001-05-30 19:39:50 +00:00
Bruce Momjian
f6923ff3ac Oops, only wanted python change in the last commit. Backing out. 2001-05-25 15:45:34 +00:00
Bruce Momjian
dffb673692 While changing Cygwin Python to build its core as a DLL (like Win32
Python) to support shared extension modules, I have learned that Guido
prefers the style of the attached patch to solve the above problem.
I feel that this solution is particularly appropriate in this case
because the following:

    PglargeType
    PgType
    PgQueryType

are already being handled in the way that I am proposing for PgSourceType.

Jason Tishler
2001-05-25 15:34:50 +00:00
Tom Lane
27336e4f7a Repair race condition introduced into heap_update() in 7.1 ---
PageGetFreeSpace() was being called while not holding the buffer lock, which
not only could yield a garbage answer, but even if it's the right answer there
might be less space available after we reacquire the buffer lock.

Also repair potential deadlock introduced by my recent performance improvement
in RelationGetBufferForTuple(): it was possible for two heap_updates to try to
lock two buffers in opposite orders.  The fix creates a global rule that
buffers of a single heap relation should be locked in decreasing block number
order.  Currently, this only applies to heap_update; VACUUM can get away with
ignoring the rule since it holds exclusive lock on the whole relation anyway.
However, if we try to implement a VACUUM that can run in parallel with other
transactions, VACUUM will also have to obey the lock order rule.
2001-05-16 22:35:12 +00:00
Bruce Momjian
1e7b79cebc Remove unused tables pg_variable, pg_inheritproc, pg_ipl tables. Initdb
forced.
2001-05-14 20:30:21 +00:00
Tom Lane
f905d65ee3 Rewrite of planner statistics-gathering code. ANALYZE is now available as
a separate statement (though it can still be invoked as part of VACUUM, too).
pg_statistic redesigned to be more flexible about what statistics are
stored.  ANALYZE now collects a list of several of the most common values,
not just one, plus a histogram (not just the min and max values).  Random
sampling is used to make the process reasonably fast even on very large
tables.  The number of values and histogram bins collected is now
user-settable via an ALTER TABLE command.

There is more still to do; the new stats are not being used everywhere
they could be in the planner.  But the remaining changes for this project
should be localized, and the behavior is already better than before.

A not-very-related change is that sorting now makes use of btree comparison
routines if it can find one, rather than invoking '<' twice.
2001-05-07 00:43:27 +00:00
Tom Lane
571dbe4606 Improve comments for xlog item size #defines. 2001-03-25 22:40:58 +00:00
Bruce Momjian
0686d49da0 Remove dashes in comments that don't need them, rewrap with pgindent. 2001-03-22 06:16:21 +00:00
Bruce Momjian
9e1552607a pgindent run. Make it all clean. 2001-03-22 04:01:46 +00:00
Tom Lane
af6e88a9cf Remove NEXTXID xlog record type to avoid three-way deadlock risk.
NEXTXID isn't really necessary, per previous discussion in pghackers,
but I mulishy insisted we should put it in anyway.  Mea culpa.
2001-03-18 20:18:59 +00:00
Tom Lane
9d645fd84c Support syncing WAL log to disk using either fsync(), fdatasync(),
O_SYNC, or O_DSYNC (as available on a given platform).  Add GUC parameter
to control sync method.
Also, add defense to XLogWrite to prevent it from going nuts if passed
a target write position that's past the end of the buffers so far filled
by XLogInsert.
2001-03-16 05:44:33 +00:00