This can be overriden by using NOREPLICATION on the CREATE ROLE
statement, but by default they will have it, making it backwards
compatible and "less surprising" (given that superusers normally
override all checks).
Add new function pg_sequence_parameters that returns a sequence's start,
minimum, maximum, increment, and cycle values, and use that in the view.
(bug #5662; design suggestion by Tom Lane)
Also slightly adjust the view's column order and permissions after review of
SQL standard.
Foreign tables are a core component of SQL/MED. This commit does
not provide a working SQL/MED infrastructure, because foreign tables
cannot yet be queried. Support for foreign table scans will need to
be added in a future patch. However, this patch creates the necessary
system catalog structure, syntax support, and support for ancillary
operations such as COMMENT and SECURITY LABEL.
Shigeru Hanada, heavily revised by Robert Haas
This is advantageous first because it allows us to hash the smaller table
regardless of the outer-join type, and second because hash join can be more
flexible than merge join in dealing with arbitrary join quals in a FULL
join. For merge join all the join quals have to be mergejoinable, but hash
join will work so long as there's at least one hashjoinable qual --- the
others can be any condition. (This is true essentially because we don't
keep per-inner-tuple match flags in merge join, while hash join can do so.)
To do this, we need a has-it-been-matched flag for each tuple in the
hashtable, not just one for the current outer tuple. The key idea that
makes this practical is that we can store the match flag in the tuple's
infomask, since there are lots of bits there that are of no interest for a
MinimalTuple. So we aren't increasing the size of the hashtable at all for
the feature.
To write this without turning the hash code into even more of a pile of
spaghetti than it already was, I rewrote ExecHashJoin in a state-machine
style, similar to ExecMergeJoin. Other than that decision, it was pretty
straightforward.
Instead, declare a public wrapper of the sole function using it for
external callers, so that they don't have to always pass a NULL
argument.
Author: Kevin Grittner
The contents of an unlogged table are WAL-logged; thus, they are not
available on standby servers and are truncated whenever the database
system enters recovery. Indexes on unlogged tables are also unlogged.
Unlogged GiST indexes are not currently supported.
This privilege is required to do Streaming Replication, instead of
superuser, making it possible to set up a SR slave that doesn't
have write permissions on the master.
Superuser privileges do NOT override this check, so in order to
use the default superuser account for replication it must be
explicitly granted the REPLICATION permissions. This is backwards
incompatible change, in the interest of higher default security.
The "date" type supports a wider range of dates than int64 timestamps do.
However, there is pre-int64-timestamp code in the planner that assumes that
all date values can be converted to timestamp with impunity. Fortunately,
what we really need out of the conversion is always a double (float8)
value; so even when the date is out of timestamp's range it's possible to
produce a sane answer. All we need is a code path that doesn't try to
force the result into int64. Per trouble report from David Rericha.
Back-patch to all supported versions. Although this is surely a corner
case, there's not much point in advertising a date range wider than
timestamp's if we will choke on such values in unexpected places.
This is to avoid use of the C++ keywords "bitand" and "bitor" in
the header file utils/varbit.h. Note the functions' SQL-level
names are not changed, only their C-level names.
In passing, make some comments in varbit.c conform to project-standard
layout.
"private" is a keyword in C++, so this breaks the poorly-enforced policy
that header files should be include-able in C++ code. Per report from
Craig Ringer and some investigation with cpluspluscheck.
cleanup stage to finish incomplete inserts or splits anymore. There was two
reasons for the cleanup step:
1. When a new tuple was inserted to a leaf page, the downlink in the parent
needed to be updated to contain (ie. to be consistent with) the new key.
Updating the parent in turn might require recursively updating the parent of
the parent. We now handle that by updating the parent while traversing down
the tree, so that when we insert the leaf tuple, all the parents are already
consistent with the new key, and the tree is consistent at every step.
2. When a page is split, we need to insert the downlink for the new right
page(s), and update the downlink for the original page to not include keys
that moved to the right page(s). We now handle that by setting a new flag,
F_FOLLOW_RIGHT, on the non-rightmost pages in the split. When that flag is
set, scans always follow the rightlink, regardless of the NSN mechanism used
to detect concurrent page splits. That way the tree is consistent right after
split, even though the downlink is still missing. This is very similar to the
way B-tree splits are handled. When the downlink is inserted in the parent,
the flag is cleared. To keep the insertion algorithm simple, when an
insertion sees an incomplete split, indicated by the F_FOLLOW_RIGHT flag, it
finishes the split before doing anything else.
These changes allow removing the whole "invalid tuple" mechanism, but I
retained the scan code to still follow invalid tuples correctly. While we
don't create any such tuples anymore, we want to handle them gracefully in
case you pg_upgrade a GiST index that has them. If we encounter any on an
insert, though, we just throw an error saying that you need to REINDEX.
The issue that got me into doing this is that if you did a checkpoint while
an insert or split was in progress, and the checkpoint finishes quickly so
that there is no WAL record related to the insert between RedoRecPtr and the
checkpoint record, recovery from that checkpoint would not know to finish
the incomplete insert. IOW, we have the same issue we solved with the
rm_safe_restartpoint mechanism during normal operation too. It's highly
unlikely to happen in practice, and this fix is far too large to backpatch,
so we're just going to live with in previous versions, but this refactoring
fixes it going forward.
With this patch, you don't get the annoying
'index "FOO" needs VACUUM or REINDEX to finish crash recovery' notices
anymore if you crash at an unfortunate moment.
It appears that this will be faster for all but the shortest strings;
at least one some platforms, memcmp() can use word-at-a-time comparisons.
Noah Misch, somewhat pared down.
On MacOS X, and apparently also on other BSD-derived systems, attaching
a debugger causes getppid() to return the pid of the debugging process
rather than the actual parent PID. As a result, debugging the
autovacuum launcher, startup process, or WAL sender on such systems
causes it to exit, because the previous coding of PostmasterIsAlive()
detects postmaster death by testing whether getppid() == PostmasterPid.
Work around that behavior by checking the return value of getppid()
more carefully. If it's PostmasterPid, the postmaster must be alive;
if it's 1, assume the postmaster is dead. If it's any other value,
assume we've been debugged and fall through to the less-reliable
kill() test.
Review by Tom Lane.
This case can arise if a transaction has written data, but only to
temporary tables. Loss of the commit record in case of a crash won't
matter, because the temporary tables will be lost anyway.
Reviewed by Heikki Linnakangas and Simon Riggs.
Since we're not multithreaded it only provides marginally useful
information, and it does require a newer version of the Platform SDK
than we target. We may want to reconsider this in the future along
with a fix for MinGW.
eval_const_expressions() can replace CaseTestExprs with constants when
the surrounding CASE's test expression is a constant. This confuses
ruleutils.c's heuristic for deparsing simple-form CASEs, leading to
Assert failures or "unexpected CASE WHEN clause" errors. I had put in
a hack solution for that years ago (see commit
514ce7a331 of 2006-10-01), but bug #5794
from Peter Speck shows that that solution failed to cover all cases.
Fortunately, there's a much better way, which came to me upon reflecting
that Peter's "CASE TRUE WHEN" seemed pretty redundant: we can "simplify"
the simple-form CASE to the general form of CASE, by simply omitting the
constant test expression from the rebuilt CASE construct. This is
intuitively valid because there is no need for the executor to evaluate
the test expression at runtime; it will never be referenced, because any
CaseTestExprs that would have referenced it are now replaced by constants.
This won't save a whole lot of cycles, since evaluating a Const is pretty
cheap, but a cycle saved is a cycle earned. In any case it beats kluging
ruleutils.c still further. So this patch improves const-simplification
and reverts the previous change in ruleutils.c.
Back-patch to all supported branches. The bug exists in 8.1 too, but it's
out of warranty.
After parsing a parenthesized subexpression, we must pop all pending
ANDs and NOTs off the stack, just like the case for a simple operand.
Per bug #5793.
Also fix clones of this routine in contrib/intarray and contrib/ltree,
where input of types query_int and ltxtquery had the same problem.
Back-patch to all supported versions.
Add support for collecting "minidump" style crash dumps on
Windows, by setting up an exception handling filter. Crash
dumps will be generated in PGDATA/crashdumps if the directory
is created (the existance of the directory is used as on/off
switch for the generation of the dumps).
Craig Ringer and Magnus Hagander
This prevents the word "waiting" from briefly disappearing from the ps
status line when ResolveRecoveryConflictWithVirtualXIDs begins a new
iteration of the outer loop.
Along the way, remove some useless pgstat_report_waiting() calls;
the startup process doesn't appear in pg_stat_activity.
Fujii Masao
These comments were not updated when we added the EXEC_BACKEND
mechanism for Windows, even though it rendered them inaccurate.
Also unify two unnecessarily-separate #ifdef __alpha code blocks.
We don't actually need optreset, because we can easily fix the code to
ensure that it's cleanly restartable after having completed a scan over the
argv array; which is the only case we need to restart in. Getting rid of
it avoids a class of interactions with the system libraries and allows
reversion of my change of yesterday in postmaster.c and postgres.c.
Back-patch to 8.4. Before that the getopt code was a bit different anyway.
The mingw people don't appear to care about compatibility with non-GNU
versions of getopt, so force use of our own copy of getopt on Windows.
Also, ensure that we make use of optreset when using our own copy.
Per report from Andrew Dunstan. Back-patch to all versions supported
on Windows.
This commit replaces pg_class.relistemp with pg_class.relpersistence;
and also modifies the RangeVar node type to carry relpersistence rather
than istemp. It also removes removes rd_istemp from RelationData and
instead performs the correct computation based on relpersistence.
For clarity, we add three new macros: RelationNeedsWAL(),
RelationUsesLocalBuffers(), and RelationUsesTempNamespace(), so that we
can clarify the purpose of each check that previous depended on
rd_istemp.
This is intended as infrastructure for the upcoming unlogged tables
patch, as well as for future possible work on global temporary tables.
We were failing to zero out some pg_stat_database counters that have
been added since the initial pgstats coding. This is a bug, but not
back-patching the fix since changing this behavior in a minor release
seems a cure worse than the disease.
Report and patch by Tomas Vondra.
Purely cosmetic patch to make our coding standards more consistent ---
we were doing symbolic some places and octal other places. This patch
fixes all C-coded uses of mkdir, chmod, and umask. There might be some
other calls I missed. Inconsistency noted while researching tablespace
directory permissions issue.
The original coding in tuplestore_trim() was only meant to work efficiently
in cases where each trim call deleted most of the tuples in the store.
Which, in fact, was the pattern of the original usage with a Material node
supporting mark/restore operations underneath a MergeJoin. However,
WindowAgg now uses tuplestores and it has considerably less friendly
trimming behavior. In particular it can attempt to trim one tuple at a
time off a large tuplestore. tuplestore_trim() had O(N^2) runtime in this
situation because of repeatedly shifting its tuple pointer array. Fix by
avoiding shifting the array until a reasonably large number of tuples have
been deleted. This can waste some pointer space, but we do still reclaim
the tuples themselves, so the percentage wastage should be pretty small.
Per Jie Li's report of slow percent_rank() evaluation. cume_dist() and
ntile() would certainly be affected as well, along with any other window
function that has a moving frame start and requires reading substantially
ahead of the current row.
Back-patch to 8.4, where window functions were introduced. There's no
need to tweak it before that.
Hot Standby conflicts only with tuples that were visible at
some point. So ignore tuples from aborted transactions or for
tuples updated/deleted during the inserting transaction when
generating the conflict transaction ids.
Following detailed analysis and test case by Noah Misch.
Original report covered btree delete records, correctly observed
by Heikki Linnakangas that this applies to other cases also.
Fix covers all sources of cleanup records via common code.
Recent versions of the Linux system header files cause xlogdefs.h to
believe that open_datasync should be the default sync method, whereas
formerly fdatasync was the default on Linux. open_datasync is a bad
choice, first because it doesn't actually outperform fdatasync (in fact
the reverse), and second because we try to use O_DIRECT with it, causing
failures on certain filesystems (e.g., ext4 with data=journal option).
This part of the patch is largely per a proposal from Marti Raudsepp.
More extensive changes are likely to follow in HEAD, but this is as much
change as we want to back-patch.
Also clean up confusing code and incorrect documentation surrounding the
fsync_writethrough option. Those changes shouldn't result in any actual
behavioral change, but I chose to back-patch them anyway to keep the
branches looking similar in this area.
In 9.0 and HEAD, also do some copy-editing on the WAL Reliability
documentation section.
Back-patch to all supported branches, since any of them might get used
on modern Linux versions.
First, avoid scanning the whole ProcArray once we know there
are at least commit_siblings active; second, skip the check
altogether if commit_siblings = 0.
Greg Smith
an old transaction running in the master, and a lot of transactions have
started and finished since, and a WAL-record is written in the gap between
the creating the running-xacts snapshot and WAL-logging it, recovery will fail
with "too many KnownAssignedXids" error. This bug was reported by
Joachim Wieland on Nov 19th.
In the same scenario, when fewer transactions have started so that all the
xids fit in KnownAssignedXids despite the first bug, a more serious bug
arises. We incorrectly initialize the clog code with the oldest still running
transaction, and when we see the WAL record belonging to a transaction with
an XID larger than one that committed already before the checkpoint we're
recovering from, we zero the clog page containing the already committed
transaction, leading to data loss.
In hindsight, trying to track xids in the known-assigned-xids array before
seeing the running-xacts record was too complicated. To fix that, hold
XidGenLock while the running-xacts snapshot is taken and WAL-logged. That
ensures that no transaction can begin or end in that gap, so that in recvoery
we know that the snapshot contains all transactions running at that point in
WAL.
There are some code paths, such as SPI_execute(), where we invoke
copyObject() on raw parse trees before doing parse analysis on them. Since
the bison grammar is capable of building heavily nested parsetrees while
itself using only minimal stack depth, this means that copyObject() can be
the front-line function that hits stack overflow before anything else does.
Accordingly, it had better have a check_stack_depth() call. I did a bit of
performance testing and found that this slows down copyObject() by only a
few percent, so the hit ought to be negligible in the context of complete
processing of a query.
Per off-list report from Toshihide Katayama. Back-patch to all supported
branches.
This doesn't involve any user-visible change in behavior, but will be
useful when the COPY routines are exposed to allow their use by Foreign
Data Wrapper routines, which will be able to use these routines to read
irregular CSV files, for example.
Avoid eating quite so much memory for large inheritance trees, by
reclaiming the space used by temporary copies of the original parsetree and
range table, as well as the workspace needed during planning. The cost is
needing to copy the finished plan trees out of the child memory context.
Although this looks like it ought to slow things down, my testing shows
it actually is faster, apparently because fewer interactions with malloc()
are needed and/or we can do the work within a more readily cacheable amount
of memory. That result might be platform-dependent, but I'll take it.
Per a gripe from John Papandriopoulos, in which it was pointed out that the
memory consumption actually grew as O(N^2) for sufficiently many child
tables, since we were creating N copies of the N-element range table.
1. Complain, rather than silently doing nothing, if an "invalid" tuple
is found on a leaf page. Per off-list discussion with Heikki.
2. Fix oversight in code that removes a GISTSearchItem from the search
queue: we have to reset lastHeap if this was the last heap item in the
parent GISTSearchTreeItem. Otherwise subsequent additions will do the
wrong thing. This was probably masked in early testing because in typical
cases the parent item would now be completely empty and would be deleted on
next call. You'd need a queued non-leaf page at exactly the same distance
as a heap tuple to expose the bug.
This commit represents a rather heavily editorialized version of
Teodor's builtin_knngist_itself-0.8.2 and builtin_knngist_proc-0.8.1
patches. I redid the opclass API to add a separate Distance method
instead of turning the Consistent method into an illogical mess,
fixed some bit-rot in the rbtree interfaces, and generally worked over
the code style and comments.
There's still no non-code documentation to speak of, but I'll work on
that separately. Some contrib-module changes are also yet to come
(right now, point <-> point is the only KNN-ified operator).
Teodor Sigaev and Tom Lane
This eliminates some crufty, special-purpose code and, as a non-trivial
side benefit, allows recovery.conf parameters to be unquoted.
Dimitri Fontaine, with review and cleanup by Alvaro Herrera, Itagaki
Takahiro, and me.
This is a heavily revised version of builtin_knngist_core-0.9. The
ordering operators are no longer mixed in with actual quals, which would
have confused not only humans but significant parts of the planner.
Instead, ordering operators are carried separately throughout planning and
execution.
Since the API for ambeginscan and amrescan functions had to be changed
anyway, this commit takes the opportunity to rationalize that a bit.
RelationGetIndexScan no longer forces a premature index_rescan call;
instead, callers of index_beginscan must call index_rescan too. Aside from
making the AM-side initialization logic a bit less peculiar, this has the
advantage that we do not make a useless extra am_rescan call when there are
runtime key values. AMs formerly could not assume that the key values
passed to amrescan were actually valid; now they can.
Teodor Sigaev and Tom Lane
There were corner cases in which the planner would attempt to inline such
a function, which would result in a failure at runtime due to loss of
information about exactly what the result record type is. Fix by disabling
inlining when the function's recorded result type is RECORD. There might
be some sub-cases where inlining could still be allowed, but this is a
simple and backpatchable fix, so leave refinements for another day.
Per bug #5777 from Nate Carson.
Back-patch to all supported branches. 8.1 happens to avoid a core-dump
here, but it still does the wrong thing.
Formerly we looked up the operators associated with each index (caching
them in relcache) and then the planner looked up the btree opfamily
containing such operators in order to build the btree-centric pathkey
representation that describes the index's sort order. This is quite
pointless for btree indexes: we might as well just use the index's opfamily
information directly. That saves syscache lookup cycles during planning,
and furthermore allows us to eliminate the relcache's caching of operators
altogether, which may help in reducing backend startup time.
I added code to plancat.c to perform the same type of double lookup
on-the-fly if it's ever faced with a non-btree amcanorder index AM.
If such a thing actually becomes interesting for production, we should
replace that logic with some more-direct method for identifying the
corresponding btree opfamily; but it's not worth spending effort on now.
There is considerably more to do pursuant to my recent proposal to get rid
of sort-operator-based representations of sort orderings, but this patch
grabs some of the low-hanging fruit. I'll look at the remainder of that
work after the current commitfest.
removing an infrequently occurring race condition in Hot Standby.
An xid must be assigned before a lock appears in shared memory,
rather than immediately after, else GetRunningTransactionLocks()
may see InvalidTransactionId, causing assertion failures during
lock processing on standby.
Bug report and diagnosis by Fujii Masao, fix by me.
This adds support for changing the schema of a conversion, operator,
operator class, operator family, text search configuration, text search
dictionary, text search parser, or text search template.
Dimitri Fontaine, with assorted corrections and other kibitzing.
After a SQL object is created, we provide an opportunity for security
or logging plugins to get control; for example, a security label provider
could use this to assign an initial security label to newly created
objects. The basic infrastructure is (hopefully) reusable for other types
of events that might require similar treatment.
KaiGai Kohei, with minor adjustments.
Forcibly releasing all leftover buffer pins should be unnecessary now
that we have a robust ResourceOwner mechanism, and it significantly
increases the cost of process shutdown. Instead, in an assert-enabled
build, assert that no pins are held; in a non-assert-enabled build, do
nothing.
supplied, also print the IP address. This allows IPv4 and IPv6 failures
to be distinguished. Also useful when a hostname resolves to multiple
IP addresses.
Also, remove use of inet_ntoa() and use our own inet_net_ntop() in all
places, including in libpq, because it is thread-safe.
This commit adds columns amoppurpose and amopsortfamily to pg_amop, and
column amcanorderbyop to pg_am. For the moment all the entries in
amcanorderbyop are "false", since the underlying support isn't there yet.
Also, extend the CREATE OPERATOR CLASS/ALTER OPERATOR FAMILY commands with
[ FOR SEARCH | FOR ORDER BY sort_operator_family ] clauses to allow the new
columns of pg_amop to be populated, and create pg_dump support for dumping
that information.
I also added some documentation, although it's perhaps a bit premature
given that the feature doesn't do anything useful yet.
Teodor Sigaev, Robert Haas, Tom Lane
Any flavor of ALTER <whatever> .. SET SCHEMA fails if (1) the object
is already in the new schema, (2) either the old or new schema is
a temp schema, or (3) either the old or new schema is the TOAST schema.
Extraced from a patch by Dimitri Fontaine, with additional hacking by me.
Currently, three conversion format specifiers are supported: %s for a
string, %L for an SQL literal, and %I for an SQL identifier. The latter
two are deliberately designed not to overlap with what sprintf() already
supports, in case we want to add more of sprintf()'s functionality here
later.
Patch by Pavel Stehule, heavily revised by me. Reviewed by Jeff Janes
and, in earlier versions, by Itagaki Takahiro and Tom Lane.
We no longer need the terminating zero entry in opfamily[], so get rid of
it. Also replace assorted ad-hoc looping logic with simple for and foreach
constructs. This code is now noticeably more readable than it was an hour
ago; credit to Robert for seeing that it could be simplified.
Avoid depending on LL notation, which is likely to not work in pre-C99
compilers; don't pointlessly use INT32_MIN/INT64_MIN in code that has
the numerical value hard-wired into it anyway; remove some gratuitous
style inconsistencies between pg_ltoa and pg_lltoa; fix int2 test case
so it actually tests int2.
This eliminates the need for inefficient implementions of this
functionality in both contrib/dblink and contrib/tablefunc, so remove
them. The upcoming patch implementing an in-core format() function
will also require this functionality.
In passing, add some regression tests.
Use INT_MIN rather than INT32_MIN as we do elsewhere in the code, and
try to work around nonexistence of INT64_MIN if necessary. Adjust the
new regression tests to something hopefully saner, per observation by
Tom Lane.
When using default autovacuum_vac_cost_limit, autovac_balance_cost relied
on VacuumCostLimit to contain the correct global value ... but after the
first time through in a particular worker process, it didn't, because we'd
trashed it in previous iterations. Depending on the state of other autovac
workers, this could result in a steady reduction of the effective
cost_limit setting as a particular worker processed more and more tables,
causing it to go slower and slower. Spotted by Simon Poole (bug #5759).
Fix by saving and restoring the GUC variables in the loop in do_autovacuum.
In passing, improve a few comments.
Back-patch to 8.3 ... the cost rebalancing code has been buggy since it was
put in.
A hand-coded implementation turns out to be much faster than calling
printf(). In passing, add a few more regresion tests.
Andres Freund, with assorted, mostly cosmetic changes.
As per the ancient comment for set_rel_width, it really wasn't much good
for relations that aren't plain tables: it would never find any stats and
would always fall back on datatype-based estimates, which are often pretty
silly. Fix that by copying up width estimates from the subquery planning
process.
At some point we might want to do this for CTEs too, but that would be a
significantly more invasive patch because the sub-PlannerInfo is no longer
accessible by the time it's needed. I refrained from doing anything about
that, partly for fear of breaking the unmerged CTE-related patches.
In passing, also generate less bogus width estimates for whole-row Vars.
Per a gripe from Jon Nelson.
If we have Limit->Result->Sort, the Result might be projecting a tlist
that contains a set-returning function. If so, it's possible for the
SRF to sometimes return zero rows, which means we could need to fetch
more than N rows from the Sort in order to satisfy LIMIT N.
So top-N sorting cannot be used in this scenario.
Fix things so that top-N sorting can be used in child Sort nodes of a
MergeAppend node, when there is a LIMIT and no intervening joins or
grouping. Actually doing this on the executor side isn't too bad,
but it's a bit messier to get the planner to cost it properly.
Per gripe from Robert Haas.
In passing, fix an oversight in the original top-N-sorting patch:
query_planner should not assume that a LIMIT can be used to make an
explicit sort cheaper when there will be grouping or aggregation in
between. Possibly this should be back-patched, but I'm not sure the
mistake is serious enough to be a real problem in practice.
In the previous coding, we simply issued ALTER SEQUENCE RESTART commands,
which do not roll back on error. This meant that an error between
truncating and committing left the sequences out of sync with the table
contents, with potentially bad consequences as were noted in a Warning on
the TRUNCATE man page.
To fix, create a new storage file (relfilenode) for a sequence that is to
be reset due to RESTART IDENTITY. If the transaction aborts, we'll
automatically revert to the old storage file. This acts just like a
rewriting ALTER TABLE operation. A penalty is that we have to take
exclusive lock on the sequence, but since we've already got exclusive lock
on its owning table, that seems unlikely to be much of a problem.
The interaction of this with usual nontransactional behaviors of sequence
operations is a bit weird, but it's hard to see what would be completely
consistent. Our choice is to discard cached-but-unissued sequence values
both when the RESTART is executed, and at rollback if any; but to not touch
the currval() state either time.
In passing, move the sequence reset operations to happen before not after
any AFTER TRUNCATE triggers are fired. The previous ordering was not
logically sensible, but was forced by the need to minimize inconsistency
if the triggers caused an error. Transactional rollback is a much better
solution to that.
Patch by Steve Singer, rather heavily adjusted by me.
The handle to the shared memory segment containing startup
parameters was sent as 32-bit even on 64-bit systems. Since
HANDLEs appear to be allocated sequentially this shouldn't
be a problem until we reach 2^32 open handles in the postmaster,
but a 64-bit value should be sent across as 64-bit, and not
zero out the top 32 bits.
Noted by Tom Lane.
temporary indexes are not WAL-logged. We used a constant LSN for temporary
indexes, on the assumption that we don't need to worry about concurrent page
splits in temporary indexes because they're only visible to the current
session. But that assumption is wrong, it's possible to insert rows and
split pages in the same session, while a scan is in progress. For example,
by opening a cursor and fetching some rows, and INSERTing new rows before
fetching some more.
Fix by generating fake increasing LSNs, used in place of real LSNs in
temporary GiST indexes.