identify long-running transactions. Since we already need to record
the transaction-start time (e.g. for now()), we don't need any
additional system calls to report this information.
Catversion bumped, initdb required.
StartupXLOG and ShutdownXLOG no longer need to be critical sections, because
in all contexts where they are invoked, elog(ERROR) would be translated to
elog(FATAL) anyway. (One change in bgwriter.c is needed to make this true:
set ExitOnAnyError before trying to exit. This is a good fix anyway since
the existing code would have gone into an infinite loop on elog(ERROR) during
shutdown.) That avoids a misleading report of PANIC during semi-orderly
failures. Modify the postmaster to include the startup process in the set of
processes that get SIGTERM when a fast shutdown is requested, and also fix it
to not try to restart the bgwriter if the bgwriter fails while trying to write
the shutdown checkpoint. Net result is that "pg_ctl stop -m fast" does
something reasonable for a system in warm standby mode, and so should Unix
system shutdown (ie, universal SIGTERM). Per gripe from Stephen Harris and
some corner-case testing of my own.
remove page on next level linked from next inner page, ginScanToDelete()
wrongly sets parent page. Bug reveals when many item pointers from index
was deleted ( several hundred thousands).
Bug is discovered by hubert depesz lubaczewski <depesz@gmail.com>
Suppose, we need rc2 before release...
HeapTuple that is no longer allocated as a single palloc() block; if
used carelessly, this might result in a subsequent memory leak after
heap_freetuple().
AbortTransaction, which would lead to recursion and eventual PANIC exit
as illustrated in recent report from Jeff Davis. First, in xact.c create
a special dedicated memory context for AbortTransaction to run in. This
solves the problem as long as AbortTransaction doesn't need more than 32K
(or whatever other size we create the context with). But in corner cases
it might. Second, in trigger.c arrange to keep pending after-trigger event
records in separate contexts that can be freed near the beginning of
AbortTransaction, rather than having them persist until CleanupTransaction
as before. Third, in portalmem.c arrange to free executor state data
earlier as well. These two changes should result in backing off the
out-of-memory condition before AbortTransaction needs any significant
amount of memory, at least in typical cases such as memory overrun due
to too many trigger events or too big an executor hash table. And all
the same for subtransaction abort too, of course.
Windows), arrange for each postmaster child process to be its own process
group leader, and deliver signals SIGINT, SIGTERM, SIGQUIT to the whole
process group not only the direct child process. This provides saner behavior
for archive and recovery scripts; in particular, it's possible to shut down a
warm-standby recovery server using "pg_ctl stop -m immediate", since delivery
of SIGQUIT to the startup subprocess will result in killing the waiting
recovery_command. Also, this makes Query Cancel and statement_timeout apply
to scripts being run from backends via system(). (There is no support in the
core backend for that, but it's widely done using untrusted PLs.) Per gripe
from Stephen Harris and subsequent discussion.
preference for filling pages out-of-order tends to confuse the sanity checks
in md.c, as per report from Balazs Nagy in bug #2737. The fix is to ensure
that the smgr-level code always has the same idea of the logical EOF as the
hash index code does, by using ReadBuffer(P_NEW) where we are adding a single
page to the end of the index, and using smgrextend() to reserve a large batch
of pages when creating a new splitpoint. The patch is a bit ugly because it
avoids making any changes in md.c, which seems the most prudent approach for a
backpatchable beta-period fix. After 8.3 development opens, I'll take a look
at a cleaner but more invasive patch, in particular getting rid of the now
unnecessary hack to allow reading beyond EOF in mdread().
Backpatch as far as 7.4. The bug likely exists in 7.3 as well, but because
of the magnitude of the 7.3-to-7.4 changes in hash, the later-version patch
doesn't even begin to apply. Given the other known bugs in the 7.3-era hash
code, it does not seem worth trying to develop a separate patch for 7.3.
cases where we already hold the desired lock "indirectly", either via
membership in a MultiXact or because the lock was originally taken by a
different subtransaction of the current transaction. These cases must be
accounted for to avoid needless deadlocks and/or inappropriate replacement of
an exclusive lock with a shared lock. Per report from Clarence Gardner and
subsequent investigation.
30 seconds instead of retrying forever. Also modify xlog.c so that if
it fails to rename an old xlog segment up to a future slot, it will
unlink the segment instead. Per discussion of bug #2712, in which it
became apparent that Windows can handle unlinking a file that's being
held open, but not renaming it.
in PITR scenarios. We now WAL-log the replacement of old XIDs with
FrozenTransactionId, so that such replacement is guaranteed to propagate to
PITR slave databases. Also, rather than relying on hint-bit updates to be
preserved, pg_clog is not truncated until all instances of an XID are known to
have been replaced by FrozenTransactionId. Add new GUC variables and
pg_autovacuum columns to allow management of the freezing policy, so that
users can trade off the size of pg_clog against the amount of freezing work
done. Revise the already-existing code that forces autovacuum of tables
approaching the wraparound point to make it more bulletproof; also, revise the
autovacuum logic so that anti-wraparound vacuuming is done per-table rather
than per-database. initdb forced because of changes in pg_class, pg_database,
and pg_autovacuum catalogs. Heikki Linnakangas, Simon Riggs, and Tom Lane.
deletion code to avoid the case where an upper-level btree page remains "half
dead" for a significant period of time, and to block insertions into a key
range that is in process of being re-assigned to the right sibling of the
deleted page's parent. This prevents the scenario reported by Ed L. wherein
index keys could become out-of-order in the grandparent index level.
Since this is a moderately invasive fix, I'm applying it only to HEAD.
The bug exists back to 7.4, but the back branches will get a different patch.
that conflict with the OID that we want to use for the new database.
This avoids the risk of trying to remove files that maybe we shouldn't
remove. Per gripe from Jon Lapham and subsequent discussion of 27-Sep.
remaining functions, simplify pglz_compress's API to not require a useless
data copy when compression fails. Also add a check in pglz_decompress that
the expected amount of data was decompressed.
static variables. This avoids any risk of potential non-reentrancy,
and in particular offers a much cleaner workaround for the Intel compiler
bug that was affecting ginutil.c.
even when a single relation requires more than max_fsm_pages pages. Also,
make VACUUM emit a warning in this case, since it likely means that VACUUM
FULL or other drastic corrective measure is needed. Per reports from Jeff
Frost and others of unexpected changes in the claimed max_fsm_pages need.
we probably should make them work reliably for all arrays. Fix code
to handle NULLs and multidimensional arrays, move it into arrayfuncs.c.
GIN is still restricted to indexing arrays with no null elements, however.
agreed these symbols are less easily confused. I made new pg_operator
entries (with new OIDs) for the old names, so as to provide backward
compatibility while making it pretty easy to remove the old names in
some future release cycle. This commit only touches the core datatypes,
contrib will be fixed separately.
PGPROC array into snapshots, and use this information to avoid visits
to pg_subtrans in HeapTupleSatisfiesSnapshot. This appears to solve
the pg_subtrans-related context swap storm problem that's been reported
by several people for 8.1. While at it, modify GetSnapshotData to not
take an exclusive lock on ProcArrayLock, as closer analysis shows that
shared lock is always sufficient.
Itagaki Takahiro and Tom Lane
on the same index page; we can avoid data copying as well as buffer refcount
manipulations in this common case. Makes for a small but noticeable
improvement in mergejoin speed.
Heikki Linnakangas
of the transaction ID counter. Nothing is done with the epoch except to
store it in checkpoint records, but this provides a foundation with which
add-on code can pretend that XIDs never wrap around. This is a severely
trimmed and rewritten version of the xxid patch submitted by Marko Kreen.
Per discussion, the epoch counter seems the only part of xxid that really
needs to be in the core server.
the rel, it's easy to get rid of the narrow race-condition window that
used to exist in VACUUM and CLUSTER. Did some minor code-beautification
work in the same area, too.
than N seconds apart. This allows a simple, if not very high performance,
means of guaranteeing that a PITR archive is no more than N seconds behind
real time. Also make pg_current_xlog_location return the WAL Write pointer,
add pg_current_xlog_insert_location to return the Insert pointer, and fix
pg_xlogfile_name_offset to return its results as a two-element record instead
of a smashed-together string, as per recent discussion.
Simon Riggs
plpgsql support to come later. Along the way, convert execMain's
SELECT INTO support into a DestReceiver, in order to eliminate some ugly
special cases.
Jonah Harris and Tom Lane
operation every so often. This improves the usefulness of PITR log
shipping for hot standby: formerly, if the standby server crashed, it
was necessary to restart it from the last base backup and replay all
the WAL since then. Now it will only need to reread about the same
amount of WAL as the master server would. The behavior might also
come in handy during a long PITR replay sequence. Simon Riggs,
with some editorialization by Tom Lane.
to happen automatically during pg_stop_backup(). Add some functions for
interrogating the current xlog insertion point and for easily extracting
WAL filenames from the hex WAL locations displayed by pg_stop_backup
and friends. Simon Riggs with some editorialization by Tom Lane.
(table or index) before trying to open its relcache entry. This fixes
race conditions in which someone else commits a change to the relation's
catalog entries while we are in process of doing relcache load. Problems
of that ilk have been reported sporadically for years, but it was not
really practical to fix until recently --- for instance, the recent
addition of WAL-log support for in-place updates helped.
Along the way, remove pg_am.amconcurrent: all AMs are now expected to support
concurrent update.
vacuums. This allows a OLTP-like system with big tables to continue
regular vacuuming on small-but-frequently-updated tables while the
big tables are being vacuumed.
Original patch from Hannu Krossing, rewritten by Tom Lane and updated
by me.
When we are about to split an index page to do an insertion, first look
to see if any entries marked LP_DELETE exist on the page, and if so remove
them to try to make enough space for the desired insert. This should reduce
index bloat in heavily-updated tables, although of course you still need
VACUUM eventually to clean up the heap.
Junji Teramoto
recovery. In the first place, it doesn't work because slru's
latest_page_number isn't set up yet (this is why we've been hearing reports
of strange "apparent wraparound" log messages during crash recovery, but
only from people who'd managed to advance their next-mxact counters some
considerable distance from 0). In the second place, it seems a bit unwise
to be throwing away data during crash recovery anwyway. This latter
consideration convinces me to just disable truncation during recovery,
rather than computing latest_page_number and pushing ahead.
variable (this accounts for regression failures on PPC64, and in fact
won't work on any big-endian machine). Get rid of hardwired knowledge
about datum size rules; make it look just like datumCopy().
I'm going to insist on reversion of this entire patch unless pgrminclude
is upgraded to a less broken state, but in the meantime let's get contrib
passing regression again.