* Touch the socket and lock file at least every hour, to
* ensure that they are not removed by overzealous /tmp-cleaning
* tasks. Set to 58 minutes so a cleaner never sees the
* file as an hour old.
currently does. This is now the default Win32 wal sync method because
we perfer o_datasync to fsync.
Also, change Win32 fsync to a new wal sync method called
fsync_writethrough because that is the behavior of _commit, which is
what is used for fsync on Win32.
Backpatch to 8.0.X.
ExclusiveLock rather than AccessExclusiveLock. This will allow concurrent
SELECT queries to proceed on the table. Per discussion with Andrew at
SuperNews.
explicit paths, so that the log can be replayed in a data directory
with a different absolute path than the original had. To avoid forcing
initdb in the 8.0 branch, continue to accept the old WAL log record
types; they will never again be generated however, and the code can be
dropped after the next forced initdb. Per report from Oleg Bartunov.
We still need to think about what it really means to WAL-log CREATE
TABLESPACE commands: we more or less have to put the absolute path
into those, but how to replay in a different context??
PageIndexTupleDelete() with a single pass of compactification ---
logic mostly lifted from PageRepairFragmentation. I noticed while
profiling that a VACUUM that's cleaning up a whole lot of deleted
tuples would spend as much as a third of its CPU time in
PageIndexTupleDelete; not too surprising considering the loop method
was roughly O(N^2) in the number of tuples involved.
up-to-speed logic; in particular this will cause it to quote names that
match keywords. Remove unnecessary multibyte cruft from quote_literal
(all backend-internal encodings are 8-bit-safe).
convention for isnull flags. Also, remove the useless InsertIndexResult
return struct from index AM aminsert calls --- there is no reason for
the caller to know where in the index the tuple was inserted, and we
were wasting a palloc cycle per insert to deliver this uninteresting
value (plus nontrivial complexity in some AMs).
I forced initdb because of the change in the signature of the aminsert
routines, even though nothing really looks at those pg_proc entries...
to write out data that we are about to tell the filesystem to drop.
smgr_internal_unlink already had a DropRelFileNodeBuffers call to
get rid of dead buffers without a write after it's no longer possible
to roll back the deleting transaction. Adding a similar call in
smgrtruncate simplifies callers and makes the overall division of
labor clearer. This patch removes the former behavior that VACUUM
would write all dirty buffers of a relation unconditionally.
is still alive. This improves our odds of not getting fooled by an
unrelated process when checking a stale lock file. Other checks already
in place, plus one newly added in checkDataDir(), ensure that we cannot
attempt to usurp the place of a postmaster belonging to a different userid,
so there is no need to error out. Add comments indicating the importance
of these other checks.
grouping_planner() to preprocess_targetlist(), according to a comment
in grouping_planner(). I think the refactoring makes sense, and moves
some extraneous details out of grouping_planner().
of tuples when passing data up through multiple plan nodes. A slot can now
hold either a normal "physical" HeapTuple, or a "virtual" tuple consisting
of Datum/isnull arrays. Upper plan levels can usually just copy the Datum
arrays, avoiding heap_formtuple() and possible subsequent nocachegetattr()
calls to extract the data again. This work extends Atsushi Ogawa's earlier
patch, which provided the key idea of adding Datum arrays to TupleTableSlots.
(I believe however that something like this was foreseen way back in Berkeley
days --- see the old comment on ExecProject.) A test case involving many
levels of join of fairly wide tables (about 80 columns altogether) showed
about 3x overall speedup, though simple queries will probably not be
helped very much.
I have also duplicated some code in heaptuple.c in order to provide versions
of heap_formtuple and friends that use "bool" arrays to indicate null
attributes, instead of the old convention of "char" arrays containing either
'n' or ' '. This provides a better match to the convention used by
ExecEvalExpr. While I have not made a concerted effort to get rid of uses
of the old routines, I think they should be deprecated and eventually removed.
locale is C.
Backpatch to 8.0.X because some operating systems were throwing errors
for such operations, rather than ignoring the locale when it was C.
a tuple are being accessed via ExecEvalVar and the attcacheoff shortcut
isn't usable (due to nulls and/or varlena columns). To do this, cache
Datums extracted from a tuple in the associated TupleTableSlot.
Also some code cleanup in and around the TupleTable handling.
Atsushi Ogawa with some kibitzing by Tom Lane.
whether or not it is a security definer. Changing a function's strictness
is required by SQL2003, and the other capabilities make sense. Also, allow
an optional RESTRICT noise word to be specified, for SQL conformance.
Some trivial regression tests added and the documentation has been
updated.
database's datallowconn and datfrozenxid to the current transaction ID
instead of copying the source database's values. This is OK because we
assume the source DB contains no normal transaction IDs whatsoever.
This keeps VACUUM from immediately starting to complain about unvacuumed
databases in the situation where we are more than 2 billion transactions
out from the XID stamp of template0. Per discussion with Milen Radev
(although his complaint turned out to be due to something else, but the
problem is real anyway).
can tell whether it is being used as an aggregate or not. This allows
such a function to avoid re-pallocing a pass-by-reference transition
value; normally it would be unsafe for a function to scribble on an input,
but in the aggregate case it's safe to reuse the old transition value.
Make int8inc() do this. This gets a useful improvement in the speed of
COUNT(*), at least on narrow tables (it seems to be swamped by I/O when
the table rows are wide). Per a discussion in early December with
Neil Conway. I also fixed int_aggregate.c to check this, thereby
turning it into something approaching a supportable technique instead
of being a crude hack.
elog if the former has trouble writing its file. Code review for
Magnus' patch to redirect stderr to syslog on Windows (Bruce's version
seems right, but did some minor prettification).
Backpatch both changes to 8.0 branch.
Document use of macros for pg_printf functions.
Bump major versions of all interfaces to handle movement of get_progname
from libpq to libpgport in 8.0, and probably other libpgport changes in 8.1.
Formerly, if such a clause contained no aggregate functions we mistakenly
treated it as equivalent to WHERE. Per spec it must cause the query to
be treated as a grouped query of a single group, the same as appearance
of aggregate functions would do. Also, the HAVING filter must execute
after aggregate function computation even if it itself contains no
aggregate functions.
before we can invoke fork() -- flush stdio buffers, save and restore the
profiling timer on Linux with LINUX_PROFILE, and handle BeOS stuff. This
patch moves that code into a single function, fork_process(), instead of
duplicating it at the various callsites of fork().
This patch doesn't address the EXEC_BACKEND case; there is room for
further cleanup there.
number of palloc calls. This has a salutory impact on plpgsql operations
with record variables (which create and destroy tupdescs constantly)
and probably helps a bit in some other cases too.
Too much space is allocated for tablespace file path, I guess the
directory name used to be "pg_tablespaces" instead of "pg_tblspc" at
some point.
Heikki Linnakangas
on-the-fly, and thereby avoid blowing out memory when the planner has
underestimated the hash table size. Hash join will now obey the
work_mem limit with some faithfulness. Per my recent proposal
(hash aggregate part isn't done yet though).
the freelist, plus per-buffer spinlocks that protect access to individual
shared buffer headers. This requires abandoning a global freelist (since
the freelist is a global contention point), which shoots down ARC and 2Q
as well as plain LRU management. Adopt a clock sweep algorithm instead.
Preliminary results show substantial improvement in multi-backend situations.
of AND and OR clauses. The key point here is that an OR on the
predicate side has to be treated gingerly: we may be able to prove
that the OR is implied even when no one of its components is implied.
For example (x OR y) implies (x OR y OR z) even though no one of x,
y, or z can be individually proven. This code handles both the
example shown recently by Sergey Koshcheyev and the one shown last
October by Dawid Kuroczko.
no held locks. This maintains the invariant that proclocks are present
only for procs that are holding or awaiting a lock; when this is not
true, LockRelease will fail. Per report from Stephen Clouse.
indexscans involving partial indexes. These would always be dominated
by a simple indexscan on such an index, so there's no point in considering
them. Fixes overoptimism in a patch I applied last October.
it was in 7.4, and add some comments explaining why it has to be this way.
I broke it for OR'd index predicates in a fit of code cleanup last summer.
Per example from Sergey Koshcheyev.
in favor of looking at the flat file copy of pg_database during backend
startup. This should finally eliminate the various corner cases in which
backend startup fails unexpectedly because it isn't able to distinguish
live and dead tuples in pg_database. Simplify locking on pg_database
to be similar to the rules used with pg_shadow and pg_group, and eliminate
FlushRelationBuffers operations that were used only to reduce the odds
of failure of GetRawDatabaseInfo.
initdb forced due to addition of a trigger to pg_database.
implement the md5() SQL-level function). The old code did the
following:
1. de-toast the datum
2. convert it to a cstring via textout()
3. get the length of the cstring via strlen()
Since we are treating the datum context as a blob of binary data,
the latter two steps are unnecessary. Once the data has been
detoasted, we can just use it as-is, and derive its length from
the varlena metadata.
This patch improves some run-of-the-mill md5() computations by
just under 10% in my limited tests, and passes the regression tests.
I also noticed that md5_text() wasn't checking the return value
of md5_hash(); encountering OOM at precisely the right moment
could result in returning a random md5 hash. This patch corrects
that. A better fix would be to make md5_hash() only return on
success (and/or allocate via palloc()), but since it's used in
the frontend as well I don't see an easy way to do that.
during flat-file writing. The only difference is that SnapshotSelf
would consider tuples of the 'current command' within the current
transaction as valid, where SnapshotNow wouldn't. We can eliminate
the need for this with one extra CommandCounterIncrement call before
we start reading the catalogs.
the AMI_OVERRIDE flag. The fact that TransactionLogFetch treats
BootstrapTransactionId as always committed is sufficient to make
bootstrap work, and getting rid of extra tests in heavily used code
paths seems like a win. The files produced by initdb are demonstrably
the same after this change.
file now identifies group members by usesysid not name; this avoids
needing to depend on SearchSysCache which we can't use during startup.
(The old representation was entirely broken anyway, since we did not
regenerate the file following RENAME USER.) It's only a 95% solution
because if the group membership list is big enough to be toasted out
of line, we cannot read it during startup. I think this will do for
the moment, until we have time to implement the planned pg_role
replacement for pg_group.
in GetNewTransactionId(). Since the limit value has to be computed
before we run any real transactions, this requires adding code to database
startup to scan pg_database and determine the oldest datfrozenxid.
This can conveniently be combined with the first stage of an attack on
the problem that the 'flat file' copies of pg_shadow and pg_group are
not properly updated during WAL recovery. The code I've added to
startup resides in a new file src/backend/utils/init/flatfiles.c, and
it is responsible for rewriting the flat files as well as initializing
the XID wraparound limit value. This will eventually allow us to get
rid of GetRawDatabaseInfo too, but we'll need an initdb so we can add
a trigger to pg_database.
that return tuples (such as EXPLAIN). Per gripe from Michael Fuhr.
Side effect: fix an old bug that unintentionally disabled backward scans
for all SPI-created cursors.
column with a default expression. In that situation, we need to rewrite
the heap relation. To evaluate the new default expression, we use
ExecEvalExpr(); however, this can allocate memory in the current memory
context, and ATRewriteTable() does not switch out of the active portal's
heap memory context. The end result is a rather large memory leak (on
the order of gigabytes for a reasonably sized table).
This patch changes ATRewriteTable() to switch to the per-tuple memory
context before beginning the per-tuple loop. It also removes an explicit
heap_freetuple() in the loop, since that is no longer needed.
In an unrelated change, I noticed the code was scanning through the
attributes of the new tuple descriptor for each tuple of the old table.
I changed this to use precomputation, which should slightly speed up
the loop.
Thanks to steve@deefs.net for reporting the leak.
there are corner cases involving dropping toasted columns in which the
previous coding would fail, too: the new version of the table might not
have any TOAST table, but we'd still propagate possibly-wide values of
dropped columns forward.
form of CASE (eg, CASE 0 WHEN 1 THEN ...) can be constant-folded as it
was in 7.4. Also, avoid constant-folding result expressions that are
certainly unreachable --- the former coding was a bit cavalier about this
and could generate unexpected results for all-constant CASE expressions.
Add regression test cases. Per report from Vlad Marchenko.
tests. Contributed by Koju Iijima, review from Neil Conway, Gavin Sherry
and Tom Lane.
Also, fix error in description of WITH CHECK OPTION clause in the CREATE
VIEW reference page: it should be "CASCADED", not "CASCADE".
estimate to less than the number of values estimated for any one grouping
Var, as suggested by Manfred. This is intuitively right, and what's
more it puts the plan choices in the subselect regression test back the
way they were before ...
clamp the estimated number of groups to table row count over 10, instead
of table row count; this reflects a heuristic that people probably won't
group over a near-unique set of columns, and the knowledge that we don't
currently have any way to estimate the correlation of the columns better
than guessing. This change creates a trivial plan change in one of the
regression tests.
look at the actual aggregate transition datatypes and the actual overhead
needed by nodeAgg.c, instead of using pessimistic round numbers.
Per a discussion with Michael Tiemann.
command. This is useful because we can allow truncation of tables
referenced by foreign keys, so long as the referencing table is
truncated in the same command.
Alvaro Herrera
to avoid problems when a cursor depends on objects created or changed in
the same subtransaction. We'd like to do better someday, but this seems
the only workable answer for 8.0.1.
(1) Keep a pin on the scan's current buffer and mark buffer. This
avoids the need to do a ReadBuffer() for each tuple produced by the
scan. Since ReadBuffer() is expensive, this is a significant win.
(2) Convert a ReleaseBuffer(); ReadBuffer() pair into
ReleaseAndReadBuffer(). Surely not a huge win, but it saves a lock
acquire/release...
(3) Remove a bunch of duplicated code in rtget.c; make rtnext() handle
both the "initial result" and "subsequent result" cases.
(4) Add support for index tuple killing
(5) Remove rtscancache(): it is dead code, for the same reason that
gistscancache() is dead code (an index scan ought not be invoked with
NoMovementScanDirection).
The end result is about a 10% improvement in rtree index scan perf,
according to contrib/rtree_gist/bench.
got it wrong when the JOIN was in an outer query level. Per example from
Laurie Burrow. Also fix same issue in markTargetListOrigin. I think the
latter is only a latent bug since we currently don't apply markTargetListOrigin
except at the outer level ... but should do it right anyway.
CASE 'a' WHEN 'a' THEN 1 ELSE 2 END. This worked in 7.4 and before
but had been broken due to premature freezing of the type of the test
expression. Per gripe from GÄbor SzÃcs.
so that we can get the size of a shared inval message back down to what it
was in 7.4 (and simplify the logic too). Phase 2 of fixing the
'SMgrRelation hashtable corrupted' problem.
is the minimum required fix. I want to look next at taking advantage of
it by simplifying the message semantics in the shared inval message queue,
but that part can be held over for 8.1 if it turns out too ugly.
releases, a nonzero 'c' argument meant that the input string could be
terminated by either that character or \0. Recent refactoring broke
that, causing the thing to scan for 'c' only. This went undetected
because no part of the main code actually passes nonzero 'c'. However
it broke tsearch2 and possibly other user-written code that assumed
the old definition. Per report from Tom Hebbron.
discussion on pgsql-hackers-win32 list. Documentation still needs to
be tweaked --- I'm not sure how to refer to the APPDATA folder in
user documentation.
share lock on a buffer being written out before releasing BufMgrLock in
the BufferAlloc code path; if we do it later we might block on someone
who's re-pinned the buffer. I believe this is only an issue for BufferAlloc
and not the other places that call FlushBuffer. BufferSync must continue
to do it the old way since it may well be trying to write buffers that
other backends have pinned; but it should not be holding any conflicting
locks. FlushRelationBuffers is okay since it's got exclusive lock at the
relation level.
Also performed an initial run through of upgrading our Copyright date to
extend to 2005 ... first run here was very simple ... change everything
where: grep 1996-2004 && the word 'Copyright' ... scanned through the
generated list with 'less' first, and after, to make sure that I only
picked up the right entries ...
to shared memory as soon as possible, ie, right after read_backend_variables.
The effective difference from the original code is that this happens
before instead of after read_nondefault_variables(), which loads GUC
information and is apparently capable of expanding the backend's memory
allocation more than you'd think it should. This should fix the
failure-to-attach-to-shared-memory reports we've been seeing on Windows.
Also clean up a few bits of unnecessarily grotty EXEC_BACKEND code.
that is, files are sought in the same directory as the referencing file.
Also allow absolute paths in @file constructs. Improve documentation
to actually say what is allowed in an included file.