Commit Graph

462 Commits

Author SHA1 Message Date
Bruce Momjian 8ad3965a11 Improve xid wraparound message (the server isn't really shut down, just
not accepting queries).

         errmsg("database is not accepting queries to avoid
	 wraparound data loss in database \"%s\"",
         errhint("Stop the postmaster and use a standalone
	 backend to VACUUM database \"%s\".",
2005-08-22 16:59:47 +00:00
Tom Lane d0096a41fa Fix some inconsistent choices of datatypes in xlog.c. Make buffer
indexes all be int, rather than variously int, uint16 and uint32;
add some casts where necessary to support large buffer arrays.
2005-08-22 00:41:28 +00:00
Tom Lane f39f6b500f Seems that the childXids list would be better based on Oid lists than
integer lists.
2005-08-20 23:45:08 +00:00
Tom Lane 0007490e09 Convert the arithmetic for shared memory size calculation from 'int'
to 'Size' (that is, size_t), and install overflow detection checks in it.
This allows us to remove the former arbitrary restrictions on NBuffers
etc.  It won't make any difference in a 32-bit machine, but in a 64-bit
machine you could theoretically have terabytes of shared buffers.
(How efficiently we could manage 'em remains to be seen.)  Similarly,
num_temp_buffers, work_mem, and maintenance_work_mem can be set above
2Gb on a 64-bit machine.  Original patch from Koichi Suzuki, additional
work by moi.
2005-08-20 23:26:37 +00:00
Tatsuo Ishii ba2fc7eb4b Make GetMultiXactIdMembers() a public function. 2005-08-20 01:29:27 +00:00
Tom Lane f8d0a82bf9 Avoid an Assert failure if OuterUserId hasn't been set yet during
AbortTransaction.  This can happen if a backend's InitPostgres transaction
fails (eg, because the given username is invalid).  Per Alvaro.
2005-08-17 22:14:34 +00:00
Tom Lane 721e53785d Solve the problem of OID collisions by probing for duplicate OIDs
whenever we generate a new OID.  This prevents occasional duplicate-OID
errors that can otherwise occur once the OID counter has wrapped around.
Duplicate relfilenode values are also checked for when creating new
physical files.  Per my recent proposal.
2005-08-12 01:36:05 +00:00
Tom Lane d90c531188 Autovacuum loose end mop-up. Provide autovacuum-specific vacuum cost
delay and limit, both as global GUCs and as table-specific entries in
pg_autovacuum.  stats_reset_on_server_start is now OFF by default,
but a reset is forced if we did WAL replay.  XID-wrap vacuums do not
ANALYZE, but do FREEZE if it's a template database.  Alvaro Herrera
2005-08-11 21:11:50 +00:00
Tom Lane 4568e0f791 Modify AtEOXact_CatCache and AtEOXact_RelationCache to assume that the
ResourceOwner mechanism already released all reference counts for the
cache entries; therefore, we do not need to scan the catcache or relcache
at transaction end, unless we want to do it as a debugging crosscheck.
Do the crosscheck only in Assert mode.  This is the same logic we had
previously installed in AtEOXact_Buffers to avoid overhead with large
numbers of shared buffers.  I thought it'd be a good idea to do it here
too, in view of Kari Lavikka's recent report showing a real-world case
where AtEOXact_CatCache is taking a significant fraction of runtime.
2005-08-08 19:17:23 +00:00
Tom Lane 2a4fad1a0e Add NOWAIT option to SELECT FOR UPDATE/SHARE.
Original patch by Hans-Juergen Schoenig, revisions by Karel Zak
and Tom Lane.
2005-08-01 20:31:16 +00:00
Tom Lane d42cf5a42a Add per-user and per-database connection limit options.
This patch also includes preliminary update of pg_dumpall for roles.
Petr Jelinek, with review by Bruce Momjian and Tom Lane.
2005-07-31 17:19:22 +00:00
Bruce Momjian 5b0bfec414 Fix compile for no O_SYNC, but introduced with O_DIRECT. 2005-07-30 14:15:44 +00:00
Tom Lane 5d5f1a79e6 Clean up a number of autovacuum loose ends. Make the stats collector
track shared relations in a separate hashtable, so that operations done
from different databases are counted correctly.  Add proper support for
anti-XID-wraparound vacuuming, even in databases that are never connected
to and so have no stats entries.  Miscellaneous other bug fixes.
Alvaro Herrera, some additional fixes by Tom Lane.
2005-07-29 19:30:09 +00:00
Bruce Momjian c6b1724c67 Update O_DIRECT comment. 2005-07-29 03:25:53 +00:00
Bruce Momjian c34bb00581 Use O_DIRECT if available when using O_SYNC for wal_sync_method.
Also, write multiple WAL buffers out in one write() operation.

ITAGAKI Takahiro

---------------------------------------------------------------------------

> If we disable writeback-cache and use open_sync, the per-page writing
> behavior in WAL module will show up as bad result. O_DIRECT is similar
> to O_DSYNC (at least on linux), so that the benefit of it will disappear
> behind the slow disk revolution.
>
> In the current source, WAL is written as:
>     for (i = 0; i < N; i++) { write(&buffers[i], BLCKSZ); }
> Is this intentional? Can we rewrite it as follows?
>    write(&buffers[0], N * BLCKSZ);
>
> In order to achieve it, I wrote a 'gather-write' patch (xlog.gw.diff).
> Aside from this, I'll also send the fixed direct io patch (xlog.dio.diff).
> These two patches are independent, so they can be applied either or both.
>
>
> I tested them on my machine and the results as follows. It shows that
> direct-io and gather-write is the best choice when writeback-cache is off.
> Are these two patches worth trying if they are used together?
>
>
>             | writeback | fsync= | fdata | open_ | fsync_ | open_
> patch       | cache     |  false |  sync |  sync | direct | direct
> ------------+-----------+--------+-------+-------+--------+---------
> direct io   | off       |  124.2 | 105.7 |  48.3 |   48.3 |  48.2
> direct io   | on        |  129.1 | 112.3 | 114.1 |  142.9 | 144.5
> gather-write| off       |  124.3 | 108.7 | 105.4 |  (N/A) | (N/A)
> both        | off       |  131.5 | 115.5 | 114.4 |  145.4 | 145.2
>
> - 20runs * pgbench -s 100 -c 50 -t 200
>    - with tuning (wal_buffers=64, commit_delay=500, checkpoint_segments=8)
> - using 2 ATA disks:
>    - hda(reiserfs) includes system and wal.
>    - hdc(jfs) includes database files. writeback-cache is always on.
>
> ---
> ITAGAKI Takahiro
2005-07-29 03:22:33 +00:00
Tom Lane e5d6b91220 Add SET ROLE. This is a partial commit of Stephen Frost's recent patch;
I'm still working on the has_role function and information_schema changes.
2005-07-25 22:12:34 +00:00
Bruce Momjian 9af9d674c6 Remove unintended code addition. 2005-07-23 15:31:16 +00:00
Bruce Momjian 4098c8867d Macro alignment cleanup. 2005-07-23 15:29:47 +00:00
Tom Lane f2bf2d2dc5 Fix a couple of bogus comments, per Alvaro. 2005-07-13 22:46:09 +00:00
Tom Lane d7207cfc6b Even though I'd like to see full_page_writes go away before 8.1,
a minimum requirement is that it not completely break the system
meanwhile.  Put the test in the right place.
2005-07-08 04:07:26 +00:00
Bruce Momjian 326a7a0788 Add GUC full_page_writes to control writing full pages to WAL. 2005-07-05 23:18:10 +00:00
Tom Lane eb5949d190 Arrange for the postmaster (and standalone backends, initdb, etc) to
chdir into PGDATA and subsequently use relative paths instead of absolute
paths to access all files under PGDATA.  This seems to give a small
performance improvement, and it should make the system more robust
against naive DBAs doing things like moving a database directory that
has a live postmaster in it.  Per recent discussion.
2005-07-04 04:51:52 +00:00
Tom Lane 401de9c8be Improve the checkpoint signaling mechanism so that the bgwriter can tell
the difference between checkpoints forced due to WAL segment consumption
and checkpoints forced for other reasons (such as CREATE DATABASE).  Avoid
generating 'checkpoints are occurring too frequently' messages when the
checkpoint wasn't caused by WAL segment consumption.  Per gripe from
Chris K-L.
2005-06-30 00:00:52 +00:00
Tom Lane b5f7cff84f Clean up the rather historically encumbered interface to now() and
current time: provide a GetCurrentTimestamp() function that returns
current time in the form of a TimestampTz, instead of separate time_t
and microseconds fields.  This is what all the callers really want
anyway, and it eliminates low-level dependencies on AbsoluteTime,
which is a deprecated datatype that will have to disappear eventually.
2005-06-29 22:51:57 +00:00
Tom Lane 7762619e95 Replace pg_shadow and pg_group by new role-capable catalogs pg_authid
and pg_auth_members.  There are still many loose ends to finish in this
patch (no documentation, no regression tests, no pg_dump support for
instance).  But I'm going to commit it now anyway so that Alvaro can
make some progress on shared dependencies.  The catalog changes should
be pretty much done.
2005-06-28 05:09:14 +00:00
Tom Lane fc654583ab Need #include <time.h> on some platforms. 2005-06-19 22:34:56 +00:00
Tom Lane 3f749924f8 Simplify uses of readdir() by creating a function ReadDir() that
includes error checking and an appropriate ereport(ERROR) message.
This gets rid of rather tedious and error-prone manipulation of errno,
as well as a Windows-specific bug workaround, at more than a dozen
call sites.  After an idea in a recent patch by Heikki Linnakangas.
2005-06-19 21:34:03 +00:00
Tom Lane e26b0abda3 Arrange to fsync two-phase-commit state files only during checkpoints;
given reasonably short lifespans for prepared transactions, this should
mean that only a small minority of state files ever need to be fsynced
at all.  Per discussion with Heikki Linnakangas.
2005-06-19 20:00:39 +00:00
Tom Lane a8d1075f27 Add a time-of-preparation column to the pg_prepared_xacts view, per an
old suggestion by Oliver Jowett.  Also, add a transaction column to the
pg_locks view to show the xid of each transaction holding or awaiting
locks; this allows prepared transactions to be properly associated with
the locks they own.  There was already a column named 'transaction',
and I chose to rename it to 'transactionid' --- since this column is
new in the current devel cycle there should be no backwards compatibility
issue to worry about.
2005-06-18 19:33:42 +00:00
Tom Lane 66b098492e Dept. of second thoughts: regular COMMIT deletes deletable files before
releasing locks, so COMMIT PREPARED should too.
2005-06-18 05:21:09 +00:00
Tom Lane d0a89683a3 Two-phase commit. Original patch by Heikki Linnakangas, with additional
hacking by Alvaro Herrera and Tom Lane.
2005-06-17 22:32:51 +00:00
Bruce Momjian f4d907ca85 Remove old *.backup files when we do pg_stop_backup(). This
prevents a large number of *.backup files from existing in pg_xlog/
2005-06-15 01:36:08 +00:00
Teodor Sigaev 37c839365c WAL for GiST. It work for online backup and so on, but on
recovery after crash (power loss etc) it may say that it can't restore
index and index should be reindexed.

Some refactoring code.
2005-06-14 11:45:14 +00:00
Bruce Momjian 51746c4549 Free buffer allocated via malloc (process is short-lived, but fix it anyway). 2005-06-09 22:36:27 +00:00
Tom Lane f5b2f60bd1 Change WAL-logging scheme for multixacts to be more like regular
transaction IDs, rather than like subtrans; in particular, the information
now survives a database restart.  Per previous discussion, this is
essential for PITR log shipping and for 2PC.
2005-06-08 15:50:28 +00:00
Tom Lane ee7ac7b11e Modify XLogInsert API to make callers specify whether pages to be backed
up have the standard layout with unused space between pd_lower and pd_upper.
When this is set, XLogInsert will omit the unused space without bothering
to scan it to see if it's zero.  That saves time in XLogInsert, and also
allows reversion of my earlier patch to make PageRepairFragmentation et al
explicitly re-zero freed space.  Per suggestion by Heikki Linnakangas.
2005-06-06 20:22:58 +00:00
Tom Lane 4c8495a1f2 Remove the mostly-stubbed-out-anyway support routines for WAL UNDO.
That code is never going to be used in the foreseeable future, and
where it's more than a stub it's making the redo routines harder to
read.
2005-06-06 17:01:25 +00:00
Tom Lane 21fda22ec4 Change CRCs in WAL records from 64bit to 32bit for performance reasons.
Instead of a separate CRC on each backup block, include backup blocks
in their parent WAL record's CRC; this is important to ensure that the
backup block really goes with the WAL record, ie there was not a page
tear right at the start of the backup block.  Implement a simple form
of compression of backup blocks: drop any run of zeroes starting at
pd_lower, so as not to store the unused 'hole' that commonly exists in
PG heap and index pages.  Tweak PageRepairFragmentation and related
routines to ensure they keep the unused space zeroed, so that the above
compression method remains effective.  All per recent discussions.
2005-06-02 05:55:29 +00:00
Tom Lane a91fa39028 Add test to WAL replay to verify that xl_prev points back to the previous
WAL record; this is necessary to be sure we recognize stale WAL records
when a WAL page was only partially written during a system crash.
2005-05-31 19:10:28 +00:00
Tom Lane e92a88272e Modify hash_search() API to prevent future occurrences of the error
spotted by Qingqing Zhou.  The HASH_ENTER action now automatically
fails with elog(ERROR) on out-of-memory --- which incidentally lets
us eliminate duplicate error checks in quite a bunch of places.  If
you really need the old return-NULL-on-out-of-memory behavior, you
can ask for HASH_ENTER_NULL.  But there is now an Assert in that path
checking that you aren't hoping to get that behavior in a palloc-based
hash table.
Along the way, remove the old HASH_FIND_SAVE/HASH_REMOVE_SAVED actions,
which were not being used anywhere anymore, and were surely too ugly
and unsafe to want to see revived again.
2005-05-29 04:23:07 +00:00
Bruce Momjian 6dc7760ac3 Add support for wal_fsync_writethrough for Darwin, and restructure the
code to better handle writethrough.

Chris Campbell
2005-05-20 14:53:26 +00:00
Tom Lane ff0c143a3b Make a comment pgindent-proof, per suggestion from Alvaro. 2005-05-19 23:58:51 +00:00
Tom Lane ee3b71f6bc Split the shared-memory array of PGPROC pointers out of the sinval
communication structure, and make it its own module with its own lock.
This should reduce contention at least a little, and it definitely makes
the code seem cleaner.  Per my recent proposal.
2005-05-19 21:35:48 +00:00
Neil Conway c891e05f26 Cleanup GiST header files. Since GiST extensions are often written as
external projects, we should be careful about what parts of the GiST
API are considered implementation details, and which are part of the
public API. Therefore, I've moved internal-only declarations into
gist_private.h -- future backward-incompatible changes to gist.h should
be made with care, to avoid needlessly breaking external GiST extensions.

Also did some related header cleanup: remove some unnecessary #includes
from gist.h, and remove some unused definitions: isAttByVal(), _gistdump(),
and GISTNStrategies.
2005-05-17 03:34:18 +00:00
Bruce Momjian 35e1651508 Back out check for unreferenced files.
Heikki Linnakangas
2005-05-10 22:27:30 +00:00
Tom Lane 4cd4ed0cc2 Fix case in which a debug printout would print already-pfreed data. 2005-05-07 18:14:25 +00:00
Tom Lane 126eaef651 Clean up MultiXactIdExpand's API by separating out the case where we
are creating a new MultiXactId from two regular XIDs.  The original
coding was unnecessarily complicated and didn't save any code anyway.
2005-05-03 19:42:41 +00:00
Bruce Momjian 76668e6eb4 Check the file system on postmaster startup and report any unreferenced
files in the server log.

Heikki Linnakangas
2005-05-02 18:26:54 +00:00
Tom Lane bedb78d386 Implement sharable row-level locks, and use them for foreign key references
to eliminate unnecessary deadlocks.  This commit adds SELECT ... FOR SHARE
paralleling SELECT ... FOR UPDATE.  The implementation uses a new SLRU
data structure (managed much like pg_subtrans) to represent multiple-
transaction-ID sets.  When more than one transaction is holding a shared
lock on a particular row, we create a MultiXactId representing that set
of transactions and store its ID in the row's XMAX.  This scheme allows
an effectively unlimited number of row locks, just as we did before,
while not costing any extra overhead except when a shared lock actually
has to be shared.   Still TODO: use the regular lock manager to control
the grant order when multiple backends are waiting for a row lock.

Alvaro Herrera and Tom Lane.
2005-04-28 21:47:18 +00:00
Tom Lane 19d127548c Add comment about checkpoint panic behavior during shutdown, per
suggestion from Qingqing Zhou.
2005-04-23 18:49:54 +00:00