Fix TablespaceCreateDbspace() to be able to create a dummy directory
in place of a dropped tablespace's symlink. This eliminates the open
problem of a PANIC during WAL replay when a replayed action attempts
to touch a file in a since-deleted tablespace. It also makes for a
significant improvement in the usability of PITR replay.
a more tolerable limit on the number of subtransactions or deleted files
in COMMIT and ABORT records. Buy back the extra space by eliminating the
xl_xact_prev field, which isn't being used for anything and is rather
unlikely ever to be used for anything.
This does not force initdb, but you do need to do pg_resetxlog if you
want to upgrade an existing 8.0 installation without initdb.
relcache entries. Also, change TransactionIdIsCurrentTransactionId()
so that if consulted during transaction abort, it will not say that
the aborted xact is still current. (It would be better to ensure that
it's never called at all during abort, but I'm not sure we can easily
guarantee that.) In combination, these fix a crash we have seen
occasionally during parallel regression tests of 8.0.
for every command executed within a transaction. For long transactions
this was a significant memory leak. Instead, we can delete a portal's
or subtransaction's ResourceOwner immediately, if we physically transfer
the information about its locks up to the parent owner. This does not
fully solve the leak problem; we need to do something about counting
multiple acquisitions of the same lock in order to fix it. But it's a
necessary step along the way.
updates are no longer WAL-logged nor even fsync'd; we do not need to,
since after a crash no old pg_subtrans data is needed again. We truncate
pg_subtrans to RecentGlobalXmin at each checkpoint. slru.c's API is
refactored a little bit to separate out the necessary decisions.
RecentXmin (== MyProc->xmin). This ensures that it will be safe to
truncate pg_subtrans at RecentGlobalXmin, which should largely eliminate
any fear of bloat. Along the way, eliminate SubTransXidsHaveCommonAncestor,
which isn't really needed and could not give a trustworthy result anyway
under the lookback restriction.
In an unrelated but nearby change, #ifdef out GetUndoRecPtr, which has
been dead code since 2001 and seems unlikely to ever be resurrected.
value of 'start' could be past the end of the page, if the page was
split by some concurrent inserting process since we visited it. In
this situation the code could look at bogus entries and possibly find
a match (since after all those entries still contain what they had
before the split). This would lead to 'specified item offset is too large'
followed by 'PANIC: failed to add item to the page', as reported by Joe
Conway for scenarios involving heavy concurrent insertion activity.
of XLogInsert had the same sort of checkpoint interlock problem as
RecordTransactionCommit, and indeed I found some. Btree index build
and ALTER TABLE SET TABLESPACE write data outside the friendly confines
of the buffer manager, and therefore they have to take their own
responsibility for checkpoint interlock. The easiest solution seems to
be to force smgrimmedsync at the end of the index build or table copy,
even when the operation is being WAL-logged. This is sufficient since
the new index or table will be of interest to no one if we don't get
as far as committing the current transaction.
and doesn't process forward slashes in the same way as external
commands. Quoting the first argument to COPY does not convert forward
to backward slashes, but COPY does properly process quoted forward
slashes in the second argument.
Win32 COPY works with quoted forward slashes in the first argument only if the
current directory is the same as the directory of the first argument.
for transaction commits that occurred just before the checkpoint. This is
an EXTREMELY serious bug --- kudos to Satoshi Okada for creating a
reproducible test case to prove its existence.
slashes to backslashes #ifdef WIN32. This is to cope with the fact
that Windows seems exceedingly unfriendly to slashes in shell commands,
as per recent discussion.
and history files as per recent discussion. While at it, remove
pg_terminate_backend, since we have decided we do not have time during
this release cycle to address the reliability concerns it creates.
Split the 'Miscellaneous Functions' documentation section into
'System Information Functions' and 'System Administration Functions',
which hopefully will draw the eyes of those looking for such things.
number of active subtransaction XIDs in each backend's PGPROC entry,
and use this to avoid expensive probes into pg_subtrans during
TransactionIdIsInProgress. Extend EOXactCallback API to allow add-on
modules to get control at subxact start/end. (This is deliberately
not compatible with the former API, since any uses of that API probably
need manual review anyway.) Add basic reference documentation for
SAVEPOINT and related commands. Minor other cleanups to check off some
of the open issues for subtransactions.
Alvaro Herrera and Tom Lane.
There are still some things that need refinement; in particular I fear
that the recognized set of error condition names probably has little in
common with what Oracle recognizes. But it's a start.
password/group files. Also allow read-only subtransactions of a read-write
parent, but not vice versa. These are the reasonably noncontroversial
parts of Alvaro's recent mop-up patch, plus further work on large objects
to minimize use of the TopTransactionResourceOwner.
SAVEPOINT/RELEASE/ROLLBACK-TO syntax. (Alvaro)
Cause COMMIT of a failed transaction to report ROLLBACK instead of
COMMIT in its command tag. (Tom)
Fix a few loose ends in the nested-transactions stuff.
recovery_target_timeline --- otherwise there is no path from the backup
to the requested timeline. This check was foreseen in the original
discussion but I forgot to implement it.
recovered from archive is not corrupt. It's not much but it will catch
one common problem, viz out-of-disk-space.
Also, force a WAL recovery scan when recovery.conf is present, even if
pg_control shows a clean shutdown. This allows recovery with a tar backup
that was taken with the postmaster shut down, as per complaint from
Mark Kirkwood.
recovery more manageable. Also, undo recent change to add FILE_HEADER
and WASTED_SPACE records to XLOG; instead make the XLOG page header
variable-size with extra fields in the first page of an XLOG file.
This should fix the boundary-case bugs observed by Mark Kirkwood.
initdb forced due to change of XLOG representation.
keep track of portal-related resources separately from transaction-related
resources. This allows cursors to work in a somewhat sane fashion with
nested transactions. For now, cursor behavior is non-subtransactional,
that is a cursor's state does not roll back if you abort a subtransaction
that fetched from the cursor. We might want to change that later.
shift support code into heapam.c accordingly. This is in service of
soon-to-be-committed ALTER TABLE SET TABLESPACE code that will want to
use this same record type for both heaps and indexes.
Theoretically I should have forced initdb for this, but in practice there
is no change in xlog contents because CVS tip will never really emit this
record type anyhow...
performance front, but with feature freeze upon us I think it's time to
drive a stake in the ground and say that this will be in 7.5.
Alvaro Herrera, with some help from Tom Lane.
There are various things left to do: contrib dbsize and oid2name modules
need work, and so does the documentation. Also someone should think about
COMMENT ON TABLESPACE and maybe RENAME TABLESPACE. Also initlocation is
dead, it just doesn't know it yet.
Gavin Sherry and Tom Lane.
cidr type bit, the same as network_eq does. This is needed for hash joins
and hash aggregation to work correctly on these types. Per bug report
from Michael Fuhr, 2004-04-13.
Also, improve hash function for int8 as suggested by Greg Stark.
of a composite type to get that type's OID as their second parameter,
in place of typelem which is useless. The actual changes are mostly
centralized in getTypeInputInfo and siblings, but I had to fix a few
places that were fetching pg_type.typelem for themselves instead of
using the lsyscache.c routines. Also, I renamed all the related variables
from 'typelem' to 'typioparam' to discourage people from assuming that
they necessarily contain array element types.
1. Solve the problem of not having TOAST references hiding inside composite
values by establishing the rule that toasting only goes one level deep:
a tuple can contain toasted fields, but a composite-type datum that is
to be inserted into a tuple cannot. Enforcing this in heap_formtuple
is relatively cheap and it avoids a large increase in the cost of running
the tuptoaster during final storage of a row.
2. Fix some interesting problems in expansion of inherited queries that
reference whole-row variables. We never really did this correctly before,
but it's now relatively painless to solve by expanding the parent's
whole-row Var into a RowExpr() selecting the proper columns from the
child.
If you dike out the preventive check in CheckAttributeType(),
composite-type columns now seem to actually work. However, we surely
cannot ship them like this --- without I/O for composite types, you
can't get pg_dump to dump tables containing them. So a little more
work still to do.
loop over the fields instead of a loop around heap_getattr. This is
considerably faster (O(N) instead of O(N^2)) when there are nulls or
varlena fields, since those prevent use of attcacheoff. Replace loops
over heap_getattr with heap_deformtuple in situations where all or most
of the fields have to be fetched, such as printtup and tuptoaster.
Profiling done more than a year ago shows that this should be a nice
win for situations involving many-column tables.
place of time_t, as per prior discussion. The behavior does not change
on machines without a 64-bit-int type, but on machines with one, which
is most, we are rid of the bizarre boundary behavior at the edges of
the 32-bit-time_t range (1901 and 2038). The system will now treat
times over the full supported timestamp range as being in your local
time zone. It may seem a little bizarre to consider that times in
4000 BC are PST or EST, but this is surely at least as reasonable as
propagating Gregorian calendar rules back that far.
I did not modify the format of the zic timezone database files, which
means that for the moment the system will not know about daylight-savings
periods outside the range 1901-2038. Given the way the files are set up,
it's not a simple decision like 'widen to 64 bits'; we have to actually
think about the range of years that need to be supported. We should
probably inquire what the plans of the upstream zic people are before
making any decisions of our own.
locking conflict against concurrent CHECKPOINT that was discussed a few
weeks ago. Also, if not using WAL archiving (which is always true ATM
but won't be if PITR makes it into this release), there's no need to
WAL-log the index build process; it's sufficient to force-fsync the
completed index before commit. This seems to gain about a factor of 2
in my tests, which is consistent with writing half as much data. I did
not try it with WAL on a separate drive though --- probably the gain would
be a lot less in that scenario.
rather than an error code, and does elog(ERROR) not elog(WARNING)
when it detects a problem. All callers were simply elog(ERROR)'ing on
failure return anyway, and I find it hard to envision a caller that would
not, so we may as well simplify the callers and produce the more useful
error message directly.
explicitly fsync'ing every (non-temp) file we have written since the
last checkpoint. In the vast majority of cases, the burden of the
fsyncs should fall on the bgwriter process not on backends. (To this
end, we assume that an fsync issued by the bgwriter will force out
blocks written to the same file by other processes using other file
descriptors. Anyone have a problem with that?) This makes the world
safe for WIN32, which ain't even got sync(2), and really makes the world
safe for Unixen as well, because sync(2) never had the semantics we need:
it offers no way to wait for the requested I/O to finish.
Along the way, fix a bug I recently introduced in xlog recovery:
file truncation replay failed to clear bufmgr buffers for the dropped
blocks, which could result in 'PANIC: heap_delete_redo: no block'
later on in xlog replay.
than being random pieces of other files. Give bgwriter responsibility
for all checkpoint activity (other than a post-recovery checkpoint);
so this child process absorbs the functionality of the former transient
checkpoint and shutdown subprocesses. While at it, create an actual
include file for postmaster.c, which for some reason never had its own
file before.
about a third, make it work on non-Windows platforms again. (But perhaps
I broke the WIN32 code, since I have no way to test that.) Fold all the
paths that fork postmaster child processes to go through the single
routine SubPostmasterMain, which takes care of resurrecting the state that
would normally be inherited from the postmaster (including GUC variables).
Clean up some places where there's no particularly good reason for the
EXEC and non-EXEC cases to work differently. Take care of one or two
FIXMEs that remained in the code.
of ThisStartUpID and RedoRecPtr into new backends. It's a lot easier just
to make them all grab the values out of shared memory during startup.
This helps to decouple the postmaster from checkpoint execution, which I
need since I'm intending to let the bgwriter do it instead, and it also
fixes a bug in the Win32 port: ThisStartUpID wasn't getting propagated at
all AFAICS. (Doesn't give me a lot of faith in the amount of testing that
port has gotten.)
In the past, we used a 'Lispy' linked list implementation: a "list" was
merely a pointer to the head node of the list. The problem with that
design is that it makes lappend() and length() linear time. This patch
fixes that problem (and others) by maintaining a count of the list
length and a pointer to the tail node along with each head node pointer.
A "list" is now a pointer to a structure containing some meta-data
about the list; the head and tail pointers in that structure refer
to ListCell structures that maintain the actual linked list of nodes.
The function names of the list API have also been changed to, I hope,
be more logically consistent. By default, the old function names are
still available; they will be disabled-by-default once the rest of
the tree has been updated to use the new API names.
and should do now that we control our own destiny for timezone handling,
but this commit gets the bulk of the picayune diffs in place.
Magnus Hagander and Tom Lane.
costing us lots more to maintain than it was worth. On shared tables
it was of exactly zero benefit because we couldn't trust it to be
up to date. On temp tables it sometimes saved an lseek, but not often
enough to be worth getting excited about. And the real problem was that
we forced an lseek on every relcache flush in order to update the field.
So all in all it seems best to lose the complexity.
conversion of basic ASCII letters. Remove all uses of strcasecmp and
strncasecmp in favor of new functions pg_strcasecmp and pg_strncasecmp;
remove most but not all direct uses of toupper and tolower in favor of
pg_toupper and pg_tolower. These functions use the same notions of
case folding already developed for identifier case conversion. I left
the straight locale-based folding in place for situations where we are
just manipulating user data and not trying to match it to built-in
strings --- for example, the SQL upper() function is still locale
dependent. Perhaps this will prove not to be what's wanted, but at
the moment we can initdb and pass regression tests in Turkish locale.
the next are handled by ReleaseAndReadBuffer rather than separate
ReleaseBuffer and ReadBuffer calls. This cuts the number of acquisitions
of the BufMgrLock by a factor of 2 (possibly more, if an indexscan happens
to pull successive rows from the same heap page). Unfortunately this
doesn't seem enough to get us out of the recently discussed context-switch
storm problem, but it's surely worth doing anyway.
* removed a few redundant defines
* get_user_name safe under win32
* rationalized pipe read EOF for win32 (UPDATED PATCH USED)
* changed all backend instances of sleep() to pg_usleep
- except for the SLEEP_ON_ASSERT in assert.c, as it would exceed a
32-bit long [Note to patcher: If a SLEEP_ON_ASSERT of 2000 seconds is
acceptable, please replace with pg_usleep(2000000000L)]
I added a comment to that part of the code:
/*
* It would be nice to use pg_usleep() here, but only does 2000 sec
* or 33 minutes, which seems too short.
*/
sleep(1000000);
Claudio Natoli
results with tuples as ordinary varlena Datums. This commit does not
in itself do much for us, except eliminate the horrid memory leak
associated with evaluation of whole-row variables. However, it lays the
groundwork for allowing composite types as table columns, and perhaps
some other useful features as well. Per my proposal of a few days ago.
boxes. Change interface to user-defined GiST support methods union and
picksplit. Now instead of bytea struct it used special GistEntryVector
structure.
and FreeDir routines modeled on the existing AllocateFile/FreeFile.
Like the latter, these routines will avoid failing on EMFILE/ENFILE
conditions whenever possible, and will prevent leakage of directory
descriptors if an elog() occurs while one is open.
Also, reduce PANIC to ERROR in MoveOfflineLogs() --- this is not
critical code and there is no reason to force a DB restart on failure.
All per recent trouble report from Olivier Hubaut.
1) Now puts in exactly the same change as the current-cvs mingw code
does. (see
http://cvs.sourceforge.net/viewcvs.py/mingw/runtime/mingwex/dirent.c?r1=
1.3&r2=1.4, second part of the patch).
2) Updates both xlog.c and slru.c in backend/access/transam/
3) Also updates pg_resetxlog, which also uses readdir() and checks the
errno value after the loop.
Magnus Hagander
wit: Add a header record to each WAL segment file so that it can be reliably
identified. Avoid splitting WAL records across segment files (this is not
strictly necessary, but makes it simpler to incorporate the header records).
Make WAL entries for file creation, deletion, and truncation (as foreseen but
never implemented by Vadim). Also, add support for making XLOG_SEG_SIZE
configurable at compile time, similarly to BLCKSZ. Fix a couple bugs I
introduced in WAL replay during recent smgr API changes. initdb is forced
due to changes in pg_control contents.
subroutine in src/port/pgsleep.c. Remove platform dependencies from
miscadmin.h and put them in port.h where they belong. Extend recent
vacuum cost-based-delay patch to apply to VACUUM FULL, ANALYZE, and
non-btree index vacuuming.
By the way, where is the documentation for the cost-based-delay patch?
the relcache, and so the notion of 'blind write' is gone. This should
improve efficiency in bgwriter and background checkpoint processes.
Internal restructuring in md.c to remove the not-very-useful array of
MdfdVec objects --- might as well just use pointers.
Also remove the long-dead 'persistent main memory' storage manager (mm.c),
since it seems quite unlikely to ever get resurrected.
Make btree index creation and initial validation of foreign-key constraints
use maintenance_work_mem rather than work_mem as their memory limit.
Add some code to guc.c to allow these variables to be referenced by their
old names in SHOW and SET commands, for backwards compatibility.
whereToSendOutput instead because they are really inquiring about
the correct client communication protocol. Update some comments.
This is pointing towards supporting regular FE/BE client protocol
in a standalone backend, per discussion a month or so back.
complete ExtendCLOG() before advancing nextXid, so that if that routine
fails, the next incoming transaction will try it again. Per trouble
report from Christopher Kings-Lynne.
should not be too eager to reject paths involving unknown schemas, since
it can't really tell whether the schemas exist in the target database.
(Also, when reading pg_dumpall output, it could be that the schemas
don't exist yet, but eventually will.) ALTER USER SET has a similar issue.
So, reduce the normal ERROR to a NOTICE when checking search_path values
for these commands. Supporting this requires changing the API for GUC
assign_hook functions, which causes the patch to touch a lot of places,
but the changes are conceptually trivial.
tuptoaster.c --- fields that are compressed in-line are not a reason
to invoke the toaster. Along the way, add a couple more htup.h macros
to eliminate confusing negated tests, and get rid of the already
vestigial TUPLE_TOASTER_ACTIVE symbol.
pointer type when it is not necessary to do so.
For future reference, casting NULL to a pointer type is only necessary
when (a) invoking a function AND either (b) the function has no prototype
OR (c) the function is a varargs function.
to step more than one entry after descending the search tree to arrive at
the correct place to start the scan. This can improve the behavior
substantially when there are many entries equal to the chosen boundary
value. Per suggestion from Dmitry Tkach, 14-Jul-03.
some concurrent changes Jan was making to the bufmgr. Here's an
updated version of the patch -- it should apply cleanly to CVS
HEAD and passes the regression tests.
This patch makes the following changes:
- remove the UnlockAndReleaseBuffer() and UnlockAndWriteBuffer()
macros, and replace uses of them with calls to the appropriate
functions.
- remove a bunch of #ifdef BMTRACE code: it is ugly & broken
(i.e. it doesn't compile)
- make BufferReplace() return a bool, not an int
- cleanup some logic in bufmgr.c; should be functionality
equivalent to the previous code, just cleaner now
- remove the BM_PRIVATE flag as it is unused
- improve a few comments, etc.
pghackers proposal of 8-Nov. All the existing cross-type comparison
operators (int2/int4/int8 and float4/float8) have appropriate support.
The original proposal of storing the right-hand-side datatype as part of
the primary key for pg_amop and pg_amproc got modified a bit in the event;
it is easier to store zero as the 'default' case and only store a nonzero
when the operator is actually cross-type. Along the way, remove the
long-since-defunct bigbox_ops operator class.
Remove the 'strategy map' code, which was a large amount of mechanism
that no longer had any use except reverse-mapping from procedure OID to
strategy number. Passing the strategy number to the index AM in the
first place is simpler and faster.
This is a preliminary step in planned support for cross-datatype index
operations. I'm committing it now since the ScanKeyEntryInitialize()
API change touches quite a lot of files, and I want to commit those
changes before the tree drifts under me.
protocol, per report from Igor Shevchenko. NOTIFY thought it could
do its thing if transaction blockState is TBLOCK_DEFAULT, but in
reality it had better check the low-level transaction state is
TRANS_DEFAULT as well. Formerly it was not possible to wait for the
client in a state where the first is true and the second is not ...
but now we can have such a state. Minor cleanup in StartTransaction()
as well.
discussion on pgsql-hackers: in READ COMMITTED mode we just have to force
a QuerySnapshot update in the trigger, but in SERIALIZABLE mode we have
to run the scan under a current snapshot and then complain if any rows
would be updated/deleted that are not visible in the transaction snapshot.
invalid (has the wrong magic number) until the build is entirely
complete. This turns out to cost no additional writes in the normal
case, since we were rewriting the metapage at the end of the process
anyway. In normal scenarios there's no real gain in security, because
a failed index build would roll back the transaction leaving an unused
index file, but for rebuilding shared system indexes this seems to add
some useful protection.
post-abort cleanup hooks. I'm surprised that we have not needed this
already, but I need it now to fix a plpgsql problem, and the usefulness
for other dynamically loaded modules seems obvious.
now able to cope with assigning new relfilenode values to nailed-in-cache
indexes, so they can be reindexed using the fully crash-safe method. This
leaves only shared system indexes as special cases. Remove the 'index
deactivation' code, since it provides no useful protection in the shared-
index case. Require reindexing of shared indexes to be done in standalone
mode, but remove other restrictions on REINDEX. -P (IgnoreSystemIndexes)
now prevents using indexes for lookups, but does not disable index updates.
It is therefore safe to allow from PGOPTIONS. Upshot: reindexing system catalogs
can be done without a standalone backend for all cases except
shared catalogs.
really general fix might be difficult, I believe the only case where
AtCommit_Notify could see an uncommitted tuple is where the other guy
has just unlistened and not yet committed. The best solution seems to
be to just skip updating that tuple, on the assumption that the other
guy does not want to hear about the notification anyway. This is not
perfect --- if the other guy rolls back his unlisten instead of committing,
then he really should have gotten this notify. But to do that, we'd have
to wait to see if he commits or not, or make UNLISTEN hold exclusive lock
on pg_listener until commit. Either of these answers is deadlock-prone,
not to mention horrible for interactive performance. Do it this way
for now. (What happened to that project to do LISTEN/NOTIFY in memory
with no table, anyway?)
pghackers. This fixes the problem recently reported by Markus KrÌutner
(hash bucket split corrupts the state of scans being done concurrently),
and I believe it also fixes all the known problems with deadlocks in
hash index operations. Hash indexes are still not really ready for prime
time (since they aren't WAL-logged), but this is a step forward.
killed items; just skip to the next item immediately. Only check for
key equality when we reach a non-killed item or the end of the index
page. This saves key comparisons when there are lots of killed items,
as for example in a heavily-updated table that's not been vacuumed lately.
Seems to be a win for pgbench anyway.
layout; therefore, this change forces REINDEX of hash indexes (though
not a full initdb). Widen hashm_ntuples to double so that hash space
management doesn't get confused by more than 4G entries; enlarge the
allowed number of free-space-bitmap pages; replace the useless bshift
field with a useful bmshift field; eliminate 4 bytes of wasted space
in the per-page special area.
scheme. A pleasant side effect is that it is *much* faster when deleting
a large fraction of the indexed tuples, because of elimination of
redundant hash_step activity induced by hash_adjscans. Various other
continuing code cleanup.
yet). Fix a couple of bugs that would only appear if multiple bitmap pages
are used, including a buffer reference leak and incorrect computation of bit
indexes. Get rid of 'overflow address' concept, which accomplished nothing
except obfuscating the code and creating a risk of failure due to limited
range of offset field. Rename some misleadingly-named fields and routines,
and improve documentation.
explanation of the remarkably confusing page addressing scheme.
The file also includes my planned-but-not-yet-implemented revision
of the hash index locking scheme.
target columns in INSERT and UPDATE targetlists. Don't rely on resname
to be accurate in ruleutils, either. This fixes bug reported by
Donald Fraser, in which renaming a column referenced in a rule did not
work very well.
index pages: when _bt_getbuf asks the FSM for a free index page, it is
possible (and, in some cases, even moderately likely) that the answer
will be the same page that _bt_split is trying to split. _bt_getbuf
already knew that the returned page might not be free, but it wasn't
prepared for the possibility that even trying to lock the page could
be problematic. Fix by doing a conditional rather than unconditional
grab of the page lock.
bottom. Otherwise we fail to moveright when the root page was split while
we were "in flight" to it. This is not a significant problem when the root
is above the leaf level, but if the root was also a leaf (ie, a single-page
index just got split) we may return the wrong leaf page to the caller,
resulting in failure to find a key that is in fact present. Bug has existed
at least since 7.1, probably forever.
fixed incorrect initial setting of StartUpID. The logic in XLogWrite()
expects that Write->curridx is advanced to the next page as soon as
LogwrtResult points to the end of the current page, but StartupXLOG()
failed to make that happen when the old WAL ended exactly on a page
boundary. Per trouble report from Hannu Krosing.
specific hash functions used by hash indexes, rather than the old
not-datatype-aware ComputeHashFunc routine. This makes it safe to do
hash joining on several datatypes that previously couldn't use hashing.
The sets of datatypes that are hash indexable and hash joinable are now
exactly the same, whereas before each had some that weren't in the other.
least-recently-used strategy from clog.c into slru.c. It doesn't
change any visible behaviour and passes all regression tests plus a
TruncateCLOG test done manually.
Apart from refactoring I made a little change to SlruRecentlyUsed,
formerly ClogRecentlyUsed: It now skips incrementing lru_counts, if
slotno is already the LRU slot, thus saving a few CPU cycles. To make
this work, lru_counts are initialised to 1 in SimpleLruInit.
SimpleLru will be used by pg_subtrans (part of the nested transactions
project), so the main purpose of this patch is to avoid future code
duplication.
Manfred Koizar
advertised in RowDescription message. Depending on the physical tuple's
column count is not really correct, since according to heap_getattr()
conventions the tuple may be short some columns, which will automatically
get read as nulls. Problem has been latent since forever, but was only
exposed by recent change to skip a projection step in SELECT * FROM...
example from Rao Kumar. This is a very corner corner-case, requiring
a minimum of three closely-spaced database crashes and an unlucky
positioning of the second recovery's checkpoint record before you'd notice
any problem. But the consequences are dire enough that it's a must-fix.
only remnant of this failed experiment is that the server will take
SET AUTOCOMMIT TO ON. Still TODO: provide some client-side autocommit
logic in libpq.
the type OID and typmod of the underlying base type. Per discussions
a few weeks ago with Andreas Pflug and others. Note that this behavioral
change affects both old- and new-protocol clients.
dead xlog segments are not considered part of a critical section. It is
not necessary to force a database-wide panic if we get a failure in these
operations. Per recent trouble reports.
handle multiple 'formats' for data I/O. Restructure CommandDest and
DestReceiver stuff one more time (it's finally starting to look a bit
clean though). Code now matches latest 3.0 protocol document as far
as message formats go --- but there is no support for binary I/O yet.
DestReceiver pointers instead of just CommandDest values. The DestReceiver
is made at the point where the destination is selected, rather than
deep inside the executor. This cleans up the original kluge implementation
of tstoreReceiver.c, and makes it easy to support retrieving results
from utility statements inside portals. Thus, you can now do fun things
like Bind and Execute a FETCH or EXPLAIN command, and it'll all work
as expected (e.g., you can Describe the portal, or use Execute's count
parameter to suspend the output partway through). Implementation involves
stuffing the utility command's output into a Tuplestore, which would be
kind of annoying for huge output sets, but should be quite acceptable
for typical uses of utility commands.
the column by table OID and column number, if it's a simple column
reference. Along the way, get rid of reskey/reskeyop fields in Resdoms.
Turns out that representation was not convenient for either the planner
or the executor; we can make the planner deliver exactly what the
executor wants with no more effort.
initdb forced due to change in stored rule representation.
Both plannable queries and utility commands are now always executed
within Portals, which have been revamped so that they can handle the
load (they used to be good only for single SELECT queries). Restructure
code to push command-completion-tag selection logic out of postgres.c,
so that it won't have to be duplicated between simple and extended queries.
initdb forced due to addition of a field to Query nodes.
for tableID/columnID in RowDescription. (The latter isn't really
implemented yet though --- the backend always sends zeroes, and libpq
just throws away the data.)