Commit Graph

289 Commits

Author SHA1 Message Date
Bruce Momjian 9098ab9e32 Update copyrights in source tree to 2008. 2008-01-01 19:46:01 +00:00
Bruce Momjian fdf5a5efb7 pgindent run for 8.3. 2007-11-15 21:14:46 +00:00
Tom Lane c293ba9eff If an index depends on no columns of its table, give it a dependency on the
whole table instead, to ensure that it goes away when the table is dropped.
Per bug #3723 from Sam Mason.

Backpatch as far as 7.4; AFAICT 7.3 does not have the issue, because it doesn't
have general-purpose expression indexes and so there must be at least one
column referenced by an index.
2007-11-08 23:22:54 +00:00
Tom Lane 6daef2bca4 Remove hack in pg_tablespace_aclmask() that disallowed permissions
on pg_global even to superusers, and replace it with checks in various
other places to complain about invalid uses of pg_global.  This ends
up being a bit more code but it allows a more specific error message
to be given, and it un-breaks pg_tablespace_size() on pg_global.
Per discussion.
2007-10-12 18:55:12 +00:00
Tom Lane 282d2a03dd HOT updates. When we update a tuple without changing any of its indexed
columns, and the new version can be stored on the same heap page, we no longer
generate extra index entries for the new version.  Instead, index searches
follow the HOT-chain links to ensure they find the correct tuple version.

In addition, this patch introduces the ability to "prune" dead tuples on a
per-page basis, without having to do a complete VACUUM pass to recover space.
VACUUM is still needed to clean up dead index entries, however.

Pavan Deolasee, with help from a bunch of other people.
2007-09-20 17:56:33 +00:00
Tom Lane d526575f89 Make large sequential scans and VACUUMs work in a limited-size "ring" of
buffers, rather than blowing out the whole shared-buffer arena.  Aside from
avoiding cache spoliation, this fixes the problem that VACUUM formerly tended
to cause a WAL flush for every page it modified, because we had it hacked to
use only a single buffer.  Those flushes will now occur only once per
ring-ful.  The exact ring size, and the threshold for seqscans to switch into
the ring usage pattern, remain under debate; but the infrastructure seems
done.  The key bit of infrastructure is a new optional BufferAccessStrategy
object that can be passed to ReadBuffer operations; this replaces the former
StrategyHintVacuum API.

This patch also changes the buffer usage-count methodology a bit: we now
advance usage_count when first pinning a buffer, rather than when last
unpinning it.  To preserve the behavior that a buffer's lifetime starts to
decrease when it's released, the clock sweep code is modified to not decrement
usage_count of pinned buffers.

Work not done in this commit: teach GiST and GIN indexes to use the vacuum
BufferAccessStrategy for vacuum-driven fetches.

Original patch by Simon, reworked by Heikki and again by Tom.
2007-05-30 20:12:03 +00:00
Alvaro Herrera 90cbc63fd1 Have TRUNCATE advance the affected table's relfrozenxid to RecentXmin, to
avoid a later needless VACUUM for Xid-wraparound purposes.  We can do this
since the table is known to be left empty, so no Xid remains on it.

Per discussion.
2007-05-16 17:28:20 +00:00
Tom Lane fba8113c1b Teach CLUSTER to skip writing WAL if not needed (ie, not using archiving)
--- Simon.
Also, code review and cleanup for the previous COPY-no-WAL patches --- Tom.
2007-03-29 00:15:39 +00:00
Tom Lane e85a01df67 Clean up the representation of special snapshots by including a "method
pointer" in every Snapshot struct.  This allows removal of the case-by-case
tests in HeapTupleSatisfiesVisibility, which should make it a bit faster
(I didn't try any performance tests though).  More importantly, we are no
longer violating portable C practices by assuming that small integers are
distinct from all pointer values, and HeapTupleSatisfiesDirty no longer
has a non-reentrant API involving side-effects on a global variable.

There were a couple of places calling HeapTupleSatisfiesXXX routines
directly rather than through the HeapTupleSatisfiesVisibility macro.
Since these places had to be changed anyway, I chose to make them go
through the macro for uniformity.

Along the way I renamed HeapTupleSatisfiesSnapshot to HeapTupleSatisfiesMVCC
to emphasize that it's only used with MVCC-type snapshots.  I was sorely
tempted to rename HeapTupleSatisfiesVisibility to HeapTupleSatisfiesSnapshot,
but forebore for the moment to avoid confusion and reduce the likelihood that
this patch breaks some of the pending patches.  Might want to reconsider
doing that later.
2007-03-25 19:45:14 +00:00
Bruce Momjian 63c678d17b Fix for COPY-after-truncate feature.
Simon Riggs
2007-03-03 20:08:41 +00:00
Tom Lane 7bddca3450 Fix up foreign-key mechanism so that there is a sound semantic basis for the
equality checks it applies, instead of a random dependence on whatever
operators might be named "=".  The equality operators will now be selected
from the opfamily of the unique index that the FK constraint depends on to
enforce uniqueness of the referenced columns; therefore they are certain to be
consistent with that index's notion of equality.  Among other things this
should fix the problem noted awhile back that pg_dump may fail for foreign-key
constraints on user-defined types when the required operators aren't in the
search path.  This also means that the former warning condition about "foreign
key constraint will require costly sequential scans" is gone: if the
comparison condition isn't indexable then we'll reject the constraint
entirely. All per past discussions.

Along the way, make the RI triggers look into pg_constraint for their
information, instead of using pg_trigger.tgargs; and get rid of the always
error-prone fixed-size string buffers in ri_triggers.c in favor of building up
the RI queries in StringInfo buffers.

initdb forced due to columns added to pg_constraint and pg_trigger.
2007-02-14 01:58:58 +00:00
Bruce Momjian 8b4ff8b6a1 Wording cleanup for error messages. Also change can't -> cannot.
Standard English uses "may", "can", and "might" in different ways:

        may - permission, "You may borrow my rake."

        can - ability, "I can lift that log."

        might - possibility, "It might rain today."

Unfortunately, in conversational English, their use is often mixed, as
in, "You may use this variable to do X", when in fact, "can" is a better
choice.  Similarly, "It may crash" is better stated, "It might crash".
2007-02-01 19:10:30 +00:00
Bruce Momjian ef65f6f7a4 Prevent WAL logging when COPY is done in the same transation that
created it.

Simon Riggs
2007-01-25 02:17:26 +00:00
Tom Lane 4431758229 Support ORDER BY ... NULLS FIRST/LAST, and add ASC/DESC/NULLS FIRST/NULLS LAST
per-column options for btree indexes.  The planner's support for this is still
pretty rudimentary; it does not yet know how to plan mergejoins with
nondefault ordering options.  The documentation is pretty rudimentary, too.
I'll work on improving that stuff later.

Note incompatible change from prior behavior: ORDER BY ... USING will now be
rejected if the operator is not a less-than or greater-than member of some
btree opclass.  This prevents less-than-sane behavior if an operator that
doesn't actually define a proper sort ordering is selected.
2007-01-09 02:14:16 +00:00
Bruce Momjian 29dccf5fe0 Update CVS HEAD for 2007 copyright. Back branches are typically not
back-stamped for this.
2007-01-05 22:20:05 +00:00
Bruce Momjian f99a569a2e pgindent run for 8.2. 2006-10-04 00:30:14 +00:00
Tom Lane e093dcdd28 Add the ability to create indexes 'concurrently', that is, without
blocking concurrent writes to the table.  Greg Stark, with a little help
from Tom Lane.
2006-08-25 04:06:58 +00:00
Tom Lane 09d3670df3 Change the relation_open protocol so that we obtain lock on a relation
(table or index) before trying to open its relcache entry.  This fixes
race conditions in which someone else commits a change to the relation's
catalog entries while we are in process of doing relcache load.  Problems
of that ilk have been reported sporadically for years, but it was not
really practical to fix until recently --- for instance, the recent
addition of WAL-log support for in-place updates helped.

Along the way, remove pg_am.amconcurrent: all AMs are now expected to support
concurrent update.
2006-07-31 20:09:10 +00:00
Tom Lane 6e38e34d64 Change the bootstrap sequence so that toast tables for system catalogs are
created in the bootstrap phase proper, rather than added after-the-fact
by initdb.  This is cleaner than before because it allows us to retire the
undocumented ALTER TABLE ... CREATE TOAST TABLE command, but the real reason
I'm doing it is so that toast tables of shared catalogs will now have
predetermined OIDs.  This will allow a reasonably clean solution to the
problem of locking tables before we load their relcache entries, to appear
in a forthcoming patch.
2006-07-31 01:16:38 +00:00
Alvaro Herrera 92c2ecc130 Modify snapshot definition so that lazy vacuums are ignored by other
vacuums.  This allows a OLTP-like system with big tables to continue
regular vacuuming on small-but-frequently-updated tables while the
big tables are being vacuumed.

Original patch from Hannu Krossing, rewritten by Tom Lane and updated
by me.
2006-07-30 02:07:18 +00:00
Bruce Momjian a22d76d96a Allow include files to compile own their own.
Strip unused include files out unused include files, and add needed
includes to C files.

The next step is to remove unused include files in C files.
2006-07-13 16:49:20 +00:00
Tom Lane b7b78d24f7 Code review for FILLFACTOR patch. Change WITH grammar as per earlier
discussion (including making def_arg allow reserved words), add missed
opt_definition for UNIQUE case.  Put the reloptions support code in a less
random place (I chose to make a new file access/common/reloptions.c).
Eliminate header inclusion creep.  Make the index options functions safely
user-callable (seems like client apps might like to be able to test validity
of options before trying to make an index).  Reduce overhead for normal case
with no options by allowing rd_options to be NULL.  Fix some unmaintainably
klugy code, including getting rid of Natts_pg_class_fixed at long last.
Some stylistic cleanup too, and pay attention to keeping comments in sync
with code.

Documentation still needs work, though I did fix the omissions in
catalogs.sgml and indexam.sgml.
2006-07-03 22:45:41 +00:00
Bruce Momjian 277807bd9e Add FILLFACTOR to CREATE INDEX.
ITAGAKI Takahiro
2006-07-02 02:23:23 +00:00
Tom Lane 3fdeb189e9 Clean up code associated with updating pg_class statistics columns
(relpages/reltuples).  To do this, create formal support in heapam.c for
"overwrite" tuple updates (including xlog replay capability) and use that
instead of the ad-hoc overwrites we'd been using in VACUUM and CREATE INDEX.
Take the responsibility for updating stats during CREATE INDEX out of the
individual index AMs, and do it where it belongs, in catalog/index.c.  Aside
from being more modular, this avoids having to update the same tuple twice in
some paths through CREATE INDEX.  It's probably not measurably faster, but
for sure it's a lot cleaner than before.
2006-05-10 23:18:39 +00:00
Tom Lane a8b8f4db23 Clean up WAL/buffer interactions as per my recent proposal. Get rid of the
misleadingly-named WriteBuffer routine, and instead require routines that
change buffer pages to call MarkBufferDirty (which does exactly what it says).
We also require that they do so before calling XLogInsert; this takes care of
the synchronization requirement documented in SyncOneBuffer.  Note that
because bufmgr takes the buffer content lock (in shared mode) while writing
out any buffer, it doesn't matter whether MarkBufferDirty is executed before
the buffer content change is complete, so long as the content change is
completed before releasing exclusive lock on the buffer.  So it's OK to set
the dirtybit before we fill in the LSN.
This eliminates the former kluge of needing to set the dirtybit in LockBuffer.
Aside from making the code more transparent, we can also add some new
debugging assertions, in particular that the caller of MarkBufferDirty must
hold the buffer content lock, not merely a pin.
2006-03-31 23:32:07 +00:00
Tom Lane 4e7d10c7cd Comments in IndexBuildHeapScan describe the indexing of recently-dead
tuples as needed "to keep VACUUM from complaining", but actually there is
a more compelling reason to do it: failure to do so violates MVCC semantics.
This is because a pre-existing serializable transaction might try to use
the index after we finish (re)building it, and it might fail to find tuples
it should be able to see.  We got this mostly right, but not in the case
of partial indexes: the code mistakenly discarded recently-dead tuples for
partial indexes.  Fix that, and adjust the comments.
2006-03-24 23:02:17 +00:00
Bruce Momjian f2f5b05655 Update copyright for 2006. Update scripts. 2006-03-05 15:59:11 +00:00
Bruce Momjian 436a2956d8 Re-run pgindent, fixing a problem where comment lines after a blank
comment line where output as too long, and update typedefs for /lib
directory.  Also fix case where identifiers were used as variable names
in the backend, but as typedefs in ecpg (favor the backend for
indenting).

Backpatch to 8.1.X.
2005-11-22 18:17:34 +00:00
Bruce Momjian 1dc3498251 Standard pgindent run for 8.1. 2005-10-15 02:49:52 +00:00
Tom Lane f26b91761b Arrange for indexes and toast tables to inherit their ownership from
the parent table, even if the command that creates them is executed by
someone else (such as a superuser or a member of the owning role).
Per gripe from Michael Fuhr.
2005-08-26 03:08:15 +00:00
Tom Lane 721e53785d Solve the problem of OID collisions by probing for duplicate OIDs
whenever we generate a new OID.  This prevents occasional duplicate-OID
errors that can otherwise occur once the OID counter has wrapped around.
Duplicate relfilenode values are also checked for when creating new
physical files.  Per my recent proposal.
2005-08-12 01:36:05 +00:00
Tom Lane 3acca18d28 Fix ancient memory leak in index_create(): RelationInitIndexAccessInfo
was being called twice in normal operation, leading to a leak of one set
of relcache subsidiary info.  Per report from Jeff Gold.
2005-06-25 16:53:49 +00:00
Neil Conway 9de97c5531 Trivial code clarity improvement to UpdateStats(); no functional change. 2005-06-20 02:07:47 +00:00
Tom Lane ee3b71f6bc Split the shared-memory array of PGPROC pointers out of the sinval
communication structure, and make it its own module with its own lock.
This should reduce contention at least a little, and it definitely makes
the code seem cleaner.  Per my recent proposal.
2005-05-19 21:35:48 +00:00
Neil Conway 3140437495 This patch refactors away some duplicated code in the index AM build
methods: they all invoke UpdateStats() since they have computed the
number of heap tuples, so I created a function in catalog/index.c that
each AM now calls.
2005-05-11 06:24:55 +00:00
Tom Lane 278bd0cc22 For some reason access/tupmacs.h has been #including utils/memutils.h,
which is neither needed by nor related to that header.  Remove the bogus
inclusion and instead include the header in those C files that actually
need it.  Also fix unnecessary inclusions and bad inclusion order in
tsearch2 files.
2005-05-06 17:24:55 +00:00
Tom Lane bedb78d386 Implement sharable row-level locks, and use them for foreign key references
to eliminate unnecessary deadlocks.  This commit adds SELECT ... FOR SHARE
paralleling SELECT ... FOR UPDATE.  The implementation uses a new SLRU
data structure (managed much like pg_subtrans) to represent multiple-
transaction-ID sets.  When more than one transaction is holding a shared
lock on a particular row, we create a MultiXactId representing that set
of transactions and store its ID in the row's XMAX.  This scheme allows
an effectively unlimited number of row locks, just as we did before,
while not costing any extra overhead except when a shared lock actually
has to be shared.   Still TODO: use the regular lock manager to control
the grant order when multiple backends are waiting for a row lock.

Alvaro Herrera and Tom Lane.
2005-04-28 21:47:18 +00:00
Tom Lane 162bd08b3f Completion of project to use fixed OIDs for all system catalogs and
indexes.  Replace all heap_openr and index_openr calls by heap_open
and index_open.  Remove runtime lookups of catalog OID numbers in
various places.  Remove relcache's support for looking up system
catalogs by name.  Bulky but mostly very boring patch ...
2005-04-14 20:03:27 +00:00
Tom Lane 7c13781ee7 First phase of project to use fixed OIDs for all system catalogs and
indexes.  Extend the macros in include/catalog/*.h to carry the info
about hand-assigned OIDs, and adjust the genbki script and bootstrap
code to make the relations actually get those OIDs.  Remove the small
number of RelOid_pg_foo macros that we had in favor of a complete
set named like the catname.h and indexing.h macros.  Next phase will
get rid of internal use of names for looking up catalogs and indexes;
but this completes the changes forcing an initdb, so it looks like a
good place to commit.
Along the way, I made the shared relations (pg_database etc) not be
'bootstrap' relations any more, so as to reduce the number of hardwired
entries and simplify changing those relations in future.  I'm not
sure whether they ever really needed to be handled as bootstrap
relations, but it seems to work fine to not do so now.
2005-04-14 01:38:22 +00:00
Tom Lane 70c9763d48 Convert oidvector and int2vector into variable-length arrays. This
change saves a great deal of space in pg_proc and its primary index,
and it eliminates the former requirement that INDEX_MAX_KEYS and
FUNC_MAX_ARGS have the same value.  INDEX_MAX_KEYS is still embedded
in the on-disk representation (because it affects index tuple header
size), but FUNC_MAX_ARGS is not.  I believe it would now be possible
to increase FUNC_MAX_ARGS at little cost, but haven't experimented yet.
There are still a lot of vestigial references to FUNC_MAX_ARGS, which
I will clean up in a separate pass.  However, getting rid of it
altogether would require changing the FunctionCallInfoData struct,
and I'm not sure I want to buy into that.
2005-03-29 00:17:27 +00:00
Tom Lane ee4ddac137 Convert index-related tuple handling routines from char 'n'/' ' to bool
convention for isnull flags.  Also, remove the useless InsertIndexResult
return struct from index AM aminsert calls --- there is no reason for
the caller to know where in the index the tuple was inserted, and we
were wasting a palloc cycle per insert to deliver this uninteresting
value (plus nontrivial complexity in some AMs).
I forced initdb because of the change in the signature of the aminsert
routines, even though nothing really looks at those pg_proc entries...
2005-03-21 01:24:04 +00:00
Tom Lane 354049c709 Remove unnecessary calls of FlushRelationBuffers: there is no need
to write out data that we are about to tell the filesystem to drop.
smgr_internal_unlink already had a DropRelFileNodeBuffers call to
get rid of dead buffers without a write after it's no longer possible
to roll back the deleting transaction.  Adding a similar call in
smgrtruncate simplifies callers and makes the overall division of
labor clearer.  This patch removes the former behavior that VACUUM
would write all dirty buffers of a relation unconditionally.
2005-03-20 22:00:54 +00:00
Tom Lane f97aebd162 Revise TupleTableSlot code to avoid unnecessary construction and disassembly
of tuples when passing data up through multiple plan nodes.  A slot can now
hold either a normal "physical" HeapTuple, or a "virtual" tuple consisting
of Datum/isnull arrays.  Upper plan levels can usually just copy the Datum
arrays, avoiding heap_formtuple() and possible subsequent nocachegetattr()
calls to extract the data again.  This work extends Atsushi Ogawa's earlier
patch, which provided the key idea of adding Datum arrays to TupleTableSlots.
(I believe however that something like this was foreseen way back in Berkeley
days --- see the old comment on ExecProject.)  A test case involving many
levels of join of fairly wide tables (about 80 columns altogether) showed
about 3x overall speedup, though simple queries will probably not be
helped very much.

I have also duplicated some code in heaptuple.c in order to provide versions
of heap_formtuple and friends that use "bool" arrays to indicate null
attributes, instead of the old convention of "char" arrays containing either
'n' or ' '.  This provides a better match to the convention used by
ExecEvalExpr.  While I have not made a concerted effort to get rid of uses
of the old routines, I think they should be deprecated and eventually removed.
2005-03-16 21:38:10 +00:00
Tom Lane a52b4fb131 Adjust creation/destruction of TupleDesc data structure to reduce the
number of palloc calls.  This has a salutory impact on plpgsql operations
with record variables (which create and destroy tupdescs constantly)
and probably helps a bit in some other cases too.
2005-03-07 04:42:17 +00:00
Tom Lane 5d5087363d Replace the BufMgrLock with separate locks on the lookup hashtable and
the freelist, plus per-buffer spinlocks that protect access to individual
shared buffer headers.  This requires abandoning a global freelist (since
the freelist is a global contention point), which shoots down ARC and 2Q
as well as plain LRU management.  Adopt a clock sweep algorithm instead.
Preliminary results show substantial improvement in multi-backend situations.
2005-03-04 20:21:07 +00:00
Tom Lane 0ce4d56924 Phase 1 of fix for 'SMgrRelation hashtable corrupted' problem. This
is the minimum required fix.  I want to look next at taking advantage of
it by simplifying the message semantics in the shared inval message queue,
but that part can be held over for 8.1 if it turns out too ugly.
2005-01-10 20:02:24 +00:00
PostgreSQL Daemon 2ff501590b Tag appropriate files for rc3
Also performed an initial run through of upgrading our Copyright date to
extend to 2005 ... first run here was very simple ... change everything
where: grep 1996-2004 && the word 'Copyright' ... scanned through the
generated list with 'less' first, and after, to make sure that I only
picked up the right entries ...
2004-12-31 22:04:05 +00:00
Tom Lane 5374d097de Change planner to use the current true disk file size as its estimate of
a relation's number of blocks, rather than the possibly-obsolete value
in pg_class.relpages.  Scale the value in pg_class.reltuples correspondingly
to arrive at a hopefully more accurate number of rows.  When pg_class
contains 0/0, estimate a tuple width from the column datatypes and divide
that into current file size to estimate number of rows.  This improved
methodology allows us to jettison the ancient hacks that put bogus default
values into pg_class when a table is first created.  Also, per a suggestion
from Simon, make VACUUM (but not VACUUM FULL or ANALYZE) adjust the value
it puts into pg_class.reltuples to try to represent the mean tuple density
instead of the minimal density that actually prevails just after VACUUM.
These changes alter the plans selected for certain regression tests, so
update the expected files accordingly.  (I removed join_1.out because
it's not clear if it still applies; we can add back any variant versions
as they are shown to be needed.)
2004-12-01 19:00:56 +00:00
Tom Lane 9ffc8ed58b Repair possible failure to update hint bits back to disk, per
http://archives.postgresql.org/pgsql-hackers/2004-10/msg00464.php.
This fix is intended to be permanent: it moves the responsibility for
calling SetBufferCommitInfoNeedsSave() into the tqual.c routines,
eliminating the requirement for callers to test whether t_infomask changed.
Also, tighten validity checking on buffer IDs in bufmgr.c --- several
routines were paranoid about out-of-range shared buffer numbers but not
about out-of-range local ones, which seems a tad pointless.
2004-10-15 22:40:29 +00:00
Tom Lane b1f8a37aa7 Fallout from changing index locking rules: we can reduce the strength
of locking used by REINDEX.  REINDEX needs only ShareLock on the parent
table, same as CREATE INDEX, plus an exclusive lock on the specific index
being processed.
2004-10-01 17:11:50 +00:00