Describe would claim that no tuples will be returned. Only affects
SELECTs added to non-SELECT base queries by rewrite rules. If you
want to see the output of such a select, you gotta use 'simple Query'
protocol.
DestReceiver pointers instead of just CommandDest values. The DestReceiver
is made at the point where the destination is selected, rather than
deep inside the executor. This cleans up the original kluge implementation
of tstoreReceiver.c, and makes it easy to support retrieving results
from utility statements inside portals. Thus, you can now do fun things
like Bind and Execute a FETCH or EXPLAIN command, and it'll all work
as expected (e.g., you can Describe the portal, or use Execute's count
parameter to suspend the output partway through). Implementation involves
stuffing the utility command's output into a Tuplestore, which would be
kind of annoying for huge output sets, but should be quite acceptable
for typical uses of utility commands.
the column by table OID and column number, if it's a simple column
reference. Along the way, get rid of reskey/reskeyop fields in Resdoms.
Turns out that representation was not convenient for either the planner
or the executor; we can make the planner deliver exactly what the
executor wants with no more effort.
initdb forced due to change in stored rule representation.
which does the same thing. Perhaps at one time there was a reason to
allow plan nodes to store their result types in different places, but
AFAICT that's been unnecessary for a good while.
avoids 'input buffer overflow' failure on long literals, improves
performance, gives the right answer for line position in functions
containing multiline literals, suppresses annoying compiler warnings,
and generally is so much better I wonder why we didn't do it before.
Per recent discussion on pgsql-general, this is appropriate for spec
compliance, and has the nice side-effect of easing porting from old
pg_dump files that exhibit the 59.999=>60.000 roundoff problem.
implementation limits, do not issue an ERROR; instead issue a NOTICE and use
the max supported value. Per pgsql-general discussion of 28-Apr, this is
needed to allow easy porting from pre-7.3 releases where the limits were
higher.
Unrelated change in same area: accept GLOBAL TEMP/TEMPORARY as a synonym
for TEMPORARY, as per pgsql-hackers discussion of 15-Apr. We previously
rejected it, but that was based on a misreading of the spec --- SQL92's
GLOBAL temp tables are really closer to what we have than their LOCAL ones.
per report from Olivier Prenant. Also fix off-by-one space calculation
in ReadToc; this woould not have hurt us until we had more than 100
dependencies for a single object, but wrong is wrong.
(agollapudi@demandsolutions.com).
Also applied the RefCursor support patch by Nic Ferrier. This patch allows
you too return a get a result set from a function that returns a refcursor.
For example:
call.registerOutParameter(1, Types.OTHER);
call.execute();
ResultSet rs = (ResultSet) call.getObject(1);
Modified Files:
jdbc/org/postgresql/core/BaseStatement.java
jdbc/org/postgresql/jdbc1/AbstractJdbc1ResultSet.java
jdbc/org/postgresql/jdbc1/AbstractJdbc1Statement.java
jdbc/org/postgresql/jdbc1/Jdbc1CallableStatement.java
jdbc/org/postgresql/jdbc1/Jdbc1PreparedStatement.java
jdbc/org/postgresql/jdbc1/Jdbc1Statement.java
jdbc/org/postgresql/jdbc2/AbstractJdbc2ResultSet.java
jdbc/org/postgresql/jdbc2/Jdbc2CallableStatement.java
jdbc/org/postgresql/jdbc2/Jdbc2PreparedStatement.java
jdbc/org/postgresql/jdbc2/Jdbc2Statement.java
jdbc/org/postgresql/jdbc3/Jdbc3CallableStatement.java
jdbc/org/postgresql/jdbc3/Jdbc3PreparedStatement.java
jdbc/org/postgresql/jdbc3/Jdbc3Statement.java
Added Files:
jdbc/org/postgresql/PGRefCursorResultSet.java
jdbc/org/postgresql/jdbc1/Jdbc1RefCursorResultSet.java
jdbc/org/postgresql/jdbc2/Jdbc2RefCursorResultSet.java
jdbc/org/postgresql/jdbc3/Jdbc3RefCursorResultSet.java
jdbc/org/postgresql/test/jdbc2/RefCursorTest.java
Both plannable queries and utility commands are now always executed
within Portals, which have been revamped so that they can handle the
load (they used to be good only for single SELECT queries). Restructure
code to push command-completion-tag selection logic out of postgres.c,
so that it won't have to be duplicated between simple and extended queries.
initdb forced due to addition of a field to Query nodes.
Without this fix, CVS tip dumps core when running the regression tests
with geqo_threshold = 2. I would think that a similar patch might be
needed in 7.3, but cannot duplicate the failure in that branch --- so
for now, leave well enough alone.
of extended query features in new FE/BE protocol. TransactionCommandContext
is gone (PortalContext replaces it for some purposes), and QueryContext
has taken on a new meaning (MessageContext plays its old role).
that the types of untyped string-literal constants are deduced (ie,
when coerce_type is applied to 'em, that's what the type must be).
Remove the ancient hack of storing the input Param-types array as a
global variable, and put the info into ParseState instead. This touches
a lot of files because of adjustment of routine parameter lists, but
it's really not a large patch. Note: PREPARE statement still insists on
exact specification of parameter types, but that could easily be relaxed
now, if we wanted to do so.
I had inadvertently omitted it while rearranging things to support
length-counted incoming messages. Also, change the parser's API back
to accepting a 'char *' query string instead of 'StringInfo', as the
latter wasn't buying us anything except overhead. (I think when I put
it in I had some notion of making the parser API 8-bit-clean, but
seeing that flex depends on null-terminated input, that's not really
ever gonna happen.)
in all cases, leaked TopMemoryContext memory in others. Make the
interaction between SetClientEncoding and InitializeClientEncoding
cleaner and better documented. I suspect these changes should be
back-patched into 7.3, but will wait on Tatsuo's verification.
for tableID/columnID in RowDescription. (The latter isn't really
implemented yet though --- the backend always sends zeroes, and libpq
just throws away the data.)
initial values and runtime changes in selected parameters. This gets
rid of the need for an initial 'select pg_client_encoding()' query in
libpq, bringing us back to one message transmitted in each direction
for a standard connection startup. To allow server version to be sent
using the same GUC mechanism that handles other parameters, invent the
concept of a never-settable GUC parameter: you can 'show server_version'
but it's not settable by any GUC input source. Create 'lc_collate' and
'lc_ctype' never-settable parameters so that people can find out these
settings without need for pg_controldata. (These side ideas were all
discussed some time ago in pgsql-hackers, but not yet implemented.)
into a UNION that has some type coercions applied to the component
queries, so long as the qual itself does not reference any columns that
have such coercions. Per example from Jonathan Bartlett 24-Apr-03.
rewritten and the protocol is changed, but most elog calls are still
elog calls. Also, we need to contemplate mechanisms for controlling
all this functionality --- eg, how much stuff should appear in the
postmaster log? And what API should libpq expose for it?
have length words. COPY OUT reimplemented per new protocol: it doesn't
need \. anymore, thank goodness. COPY BINARY to/from frontend works,
at least as far as the backend is concerned --- libpq's PQgetline API
is not up to snuff, and will have to be replaced with something that is
null-safe. libpq uses message length words for performance improvement
(no cycles wasted rescanning long messages), but not yet for error
recovery.
Output \r\n termination on Win32.
Disallow literal carriage return as a data value,
backslash-carriage-return and \r still allowed.
Doc changes already committed.
with variable-width fields. No more truncation of long user names.
Also, libpq can now send its environment-variable-driven SET commands
as part of the startup packet, saving round trips to server.
fixing an order by problem for index metadata results.
Also includes removing some unused code as well as a fix to the toString
method on statement.
Modified Files:
jdbc/org/postgresql/jdbc1/AbstractJdbc1DatabaseMetaData.java
jdbc/org/postgresql/jdbc1/AbstractJdbc1Statement.java
account for NULLs; in hindsight this is obvious since the code for
the MCV-lists case would reduce to this when there are zero entries
in both lists. Per example from Alec Mitchell.
expressions, ARRAY(sub-SELECT) expressions, some array functions.
Polymorphic functions using ANYARRAY/ANYELEMENT argument and return
types. Some regression tests in place, documentation is lacking.
Joe Conway, with some kibitzing from Tom Lane.
on UPDATE.
This get's rid of the long standing annoyance that updating a row
that has foreign keys locks all the referenced rows even if the
foreign key values do not change.
The trick is to actually do a check identical to NO ACTION after an
eventually done UPDATE in the SET DEFAULT case. Since a SET DEFAULT
operation should have moved referencing rows to a new "home", a following
NO ACTION check can only fail if the column defaults of the referencing
table resulted in the key we actually deleted. Thanks to Stephan.
Jan
find out about it is to read the documentation that tells you how
dangerous it is. Add default_transaction_read_only to documentation;
seems to have been overlooked in patch that added read-only transactions.
Clean up check_guc comparison script, which has been suffering bit rot.
an unneeded memory context and some variables that are not used anymore.
It's pretty trivial and the regression tests pass fine. There's no
change in functionality, only deletion of unused code. I left an empty
function because maybe I'll need it for nested transactions.
Alvaro Herrera
harmless on signed-char machines but would lead to core dump in the
deadlock detection code if char is unsigned. Amazingly, this bug has
been here since 7.1 and yet wasn't reported till now. Thanks to Robert
Bruccoleri for providing the opportunity to track it down.
typing error in src/backend/libpq/be-secure.c ???
Long Description
In src/backend/libpq/be-secure.c: secure_write
on SSL_ERROR_WANT_WRITE call secure_read instead
secure_write again. May be is this a typing error?
Sergey N. Yatskevich (syatskevich@n21lab.gosniias.msk.ru)
page when it's read in, per pghackers discussion around 17-Feb. Add a
GUC variable zero_damaged_pages that causes the response to be a WARNING
followed by zeroing the page, rather than the normal ERROR; this is per
Hiroshi's suggestion that there needs to be a way to get at the data
in the rest of the table.
include output that vary depending on the python build one is
running. Basically, the order of keys in a dictionary is
non-deterministic, and that part of the test fails for me regularly.
I rewrote the test to work around this problem, and include a patch
file with that change and the change to the expected otuput as well.
Mike Meyer
Example:
test=# \d test
Table "public.test"
Column | Type | Modifiers
--------+---------+-----------
a | integer | not null
Indexes:
"test_pkey" PRIMARY KEY btree (a)
Check Constraints:
"$2" CHECK (a > 1)
Foreign Key Constraints:
"$1" FOREIGN KEY (a) REFERENCES parent(b)
Rules:
myrule AS ON INSERT TO test DO INSTEAD NOTHING
Triggers:
"asdf asdf" AFTER INSERT OR DELETE ON test FOR EACH STATEMENT EXECUTE
PROCEDURE update_pg_pwd_and_pg_group(),
mytrigger AFTER INSERT OR DELETE ON test FOR EACH ROW EXECUTE PROCEDURE
update_pg_pwd_and_pg_group()
I have minimised the double quoting of identifiers as much as I could
easily, and I will submit another patch when I have time to work on it that
will use a 'fmtId' function to determine it exactly.
I think it's a significant improvement in legibility...
Obviously the table example above is slightly degenerate in that not many
tables in production have heaps of (non-constraint) triggers and rules.
Christopher Kings-Lynne
(materialization into a tuple store) discussed on pgsql-hackers earlier.
I've updated the documentation and the regression tests.
Notes on the implementation:
- I needed to change the tuple store API slightly -- it assumes that it
won't be used to hold data across transaction boundaries, so the temp
files that it uses for on-disk storage are automatically reclaimed at
end-of-transaction. I added a flag to tuplestore_begin_heap() to control
this behavior. Is changing the tuple store API in this fashion OK?
- in order to store executor results in a tuple store, I added a new
CommandDest. This works well for the most part, with one exception: the
current DestFunction API doesn't provide enough information to allow the
Executor to store results into an arbitrary tuple store (where the
particular tuple store to use is chosen by the call site of
ExecutorRun). To workaround this, I've temporarily hacked up a solution
that works, but is not ideal: since the receiveTuple DestFunction is
passed the portal name, we can use that to lookup the Portal data
structure for the cursor and then use that to get at the tuple store the
Portal is using. This unnecessarily ties the Portal code with the
tupleReceiver code, but it works...
The proper fix for this is probably to change the DestFunction API --
Tom suggested passing the full QueryDesc to the receiveTuple function.
In that case, callers of ExecutorRun could "subclass" QueryDesc to add
any additional fields that their particular CommandDest needed to get
access to. This approach would work, but I'd like to think about it for
a little bit longer before deciding which route to go. In the mean time,
the code works fine, so I don't think a fix is urgent.
- (semi-related) I added a NO SCROLL keyword to DECLARE CURSOR, and
adjusted the behavior of SCROLL in accordance with the discussion on
-hackers.
- (unrelated) Cleaned up some SGML markup in sql.sgml, copy.sgml
Neil Conway
The first cleans up a couple of minor errors and ommissions
and adds tab completion support to more slash commands, e.g.
\dv.
The second is an attempt to add tab completion for schemas
and fully qualified relation names (e.g. public.mytable ).
I think this covers the TODO-item:
"Allow psql to do table completion for SELECT * FROM schema_part and table
completion for SELECT * FROM schema_name."
This happens via union selects querying:
- relation_name in current search path;
- schema_name;
- schema.relation_name
matching the current input string.
E.g:
SELECT p[TAB]
will produce a list of all appropriate relation names in the current search
path which begin with 'p', and also all schema names which begin with 'p';
\d pub[TAB]
will produce any relation names in the current search path and also
any schema names beginning with 'pub';
\d public.[TAB]
will produce a list of all relations in the schema 'public';
\d public.my[TAB]
produces all relation names beginning with 'my' in schema 'public'.
It seems to work for me; comments, suggestions, particularly regarding
the coding and queries, are very welcome.
Note that tables, indexes, views and sequences relations in the
'pg_catalog' namespace are excluded even though they are in
the current search path. I found not doing this produced annoying behaviour
when expanding names beginning with 'p'. People who work with system
tables a lot may not like this though; I can look for another solution
if necessary.
Ian Barwick
x[42] := whatever;
The facility is pretty primitive because it doesn't do array slicing and
it has the same semantics as array update in SQL (array must already
be non-null, etc). But it's a start.
full privs), also updated the regression test for this case.
Modified Files:
jdbc/org/postgresql/jdbc1/AbstractJdbc1DatabaseMetaData.java
jdbc/org/postgresql/test/jdbc2/DatabaseMetaDataTest.java
NULL key pointer, indicating that the existing scan key should be reused.
This behavior isn't used yet but will be needed for my planned fix to
the keys_are_unique code.
them as arrays of the internal datatype. This requires treating the
stavalues columns as 'anyarray' rather than 'text[]', which is not 100%
kosher but seems to work fine for the purposes we need for pg_statistic.
Perhaps in the future 'anyarray' will be allowed more generally.
refers to a non-DISTINCT output column of a DISTINCT ON subquery, or
if it refers to a function-returning-set, we cannot push it down.
But the old implementation refused to push down *any* quals if the
subquery had any such 'dangerous' outputs. Now we just look at the
output columns actually referenced by each qual expression. More code
than before, but probably no slower since we don't make unnecessary checks.
some of the algorithms for higher functions. I see about a factor of ten
speedup on the 'numeric' regression test, but it's unlikely that that test
is representative of real-world applications.
initdb forced due to change of on-disk representation for NUMERIC.
default datestyle. This is not portable between installations.
This patch sets DATESTYLE to ISO at the start of a pg_dump, so that the
dates written into the dump will be restorable onto any database,
regardless of how its default datestyle is set.
Oliver Elphick
Add ALTER SEQUENCE to modify min/max/increment/cache/cycle values
Also updated create sequence docs to mention NO MINVALUE, & NO MAXVALUE.
New Files:
doc/src/sgml/ref/alter_sequence.sgml
src/test/regress/expected/sequence.out
src/test/regress/sql/sequence.sql
ALTER SEQUENCE is NOT transactional. It behaves similarly to setval().
It matches the proposed SQL200N spec, as well as Oracle in most ways --
Oracle lacks RESTART WITH for some strange reason.
--
Rod Taylor <rbt@rbt.ca>
now, my changes seem to work. Some possible minor bugs got squished
on the way but I can't be sure without more feedback from people who
really put the code to the test.
The new patch mostly simplifies variable handling and reduces code
duplication. Changes in the command parser eliminate some redundant
variables (boolean state + depth counter), replaces some
"else if" constructs with switches, and so on. It is meant to be
applied together with my previous patch, although I hope they don't
conflict; I went back to the CVS version for this one.
One more thing I thought should perhaps be changed: an IGNOREEOF
value of n will ignore only n-1 EOFs. I didn't want to touch this
for fear of breaking existing applications, but it does seem a tad
illogical.
Jeroen T. Vermeulen
changes to the SQL to retrieve attributes for older versions of Postgres is
probably wise. Also, please make sure that I have mapped the storage types
to the correct storage names, as this is relatively poorly documented.
I think that this patch might need to be considered for back-porting to
7.3.3 since at the moment, people will be losing valuable information after
upgrades.
Will dump:
CREATE TABLE test (
a text,
b text,
c text,
d text
);
ALTER TABLE ONLY test ALTER COLUMN a SET STATISTICS 55;
ALTER TABLE ONLY test ALTER COLUMN a SET STORAGE PLAIN;
ALTER TABLE ONLY test ALTER COLUMN b SET STATISTICS 1000;
ALTER TABLE ONLY test ALTER COLUMN c SET STORAGE EXTERNAL;
ALTER TABLE ONLY test ALTER COLUMN d SET STORAGE MAIN;
Christopher Kings-Lynne
what is capable using integer-datatime timestamps. It does attempt
to exercise the maximum allowable timestamp range.
Also is a small error check when converting a timestamp from external
to internal format that prevents out of range timestamps from being
entered.
Files patched:
Index: src/backend/utils/adt/timestamp.c
Added range check to prevent out of range timestamps
from being used.
Index: src/test/regress/sql/horology.sql
Index: src/test/regress/expected/horology-no-DST-before-1970.out
Index: src/test/regress/expected/horology-solaris-1947.out
Limited range of timestamps being checked to
Jan 1, 4713 BC to Dec 31, 294276
In creating this patch, I have seen some definite problems with integer
timestamps and how they react when used near their limits. For example,
the following statement gives the correct result:
SELECT timestamp without time zone 'Jan 1, 4713 BC'
+ interval '109203489 days' AS "Dec 31, 294276";
However, this statement which is the logical inverse of the above
gives incorrect results:
SELECT timestamp without time zone '12/31/294276'
- timestamp without time zone 'Jan 1, 4713 BC' AS "109203489 Days";
John Cochran
7.3.2). It removes some code duplication and #ifdeffing, and some
unstructured ugliness such as tacky breaks and an unneeded continue.
Breaks up a large function into smaller functions and reduces required
nesting levels, and kills a variable or two.
Jeroen T. Vermeulen
patch fix it -- but this patch doesn't contains tests or docs fixes. I
will send it later.
Fixed outputs:
select to_char(x, '9999.999') as x,
to_char(x, 'S9999.999') as s,
to_char(x, 'SG9999.999') as sg,
to_char(x, 'MI9999.999') as mi,
to_char(x, 'PL9999.999') as pl,
to_char(x, 'PLMI9999.999') as plmi,
to_char(x, '9999.999SG') as sg2,
to_char(x, '9999.999PL') as pl2,
to_char(x, '9999.999MI') as mi2 from num;
Karel Zak
> >
> > - Add check in pg_dump to see if the value returned is the max /min
> > values and replace with NO MAXVALUE, NO MINVALUE.
> >
> > - Change START and INCREMENT to use START WITH and INCREMENT BY syntax.
> > This makes it a touch easier to port to other databases with sequences
> > (Oracle). PostgreSQL supports both syntaxes already.
>
> + char bufm[100],
> + bufx[100];
>
> This seems to be an arbitary size. Why not set it to the actual maximum
> length?
>
> Also:
>
> + snprintf(bufm, 100, INT64_FORMAT, SEQ_MINVALUE);
> + snprintf(bufx, 100, INT64_FORMAT, SEQ_MAXVALUE);
>
> sizeof(bufm), sizeof(bufx) is probably the more
> maintenance-friendly/standard way to do it.
I changed the code to use sizeof - but will wait for a response from
Peter before changing the size. It's consistent throughout the sequence
code to be 100 for this purpose.
Rod Taylor <rbt@rbt.ca>
Envrironment and Files section, explained exactly what -w
does)
This is a patch which allows pg_ctl to make an intelligent
guess as to the proper port when running 'psql -l' to
determine if the database has started up (the -w flag).
The environment variable PGPORT is used. If that is not found,
it checks if a specific port has been set inside the postgresql.conf
file. If it is has not, it uses the port that Postgres was
compiled with.
Greg Sabino Mullane greg@turnstep.com
> weird behavior across fork boundaries; (b) the additional memory space
> that has to be duplicated into child processes will cost something per
> child launch, even if the child never uses it. But these are only
> arguments that it might not *always* be a prudent thing to do, not that
> we shouldn't give the DBA the tool to do it if he wants. So fire away.
Here is a patch for the above, including a documentation update. It
creates a new GUC variable "preload_libraries", that accepts a list in
the form:
preload_libraries = '$libdir/mylib1:initfunc,$libdir/mylib2'
If ":initfunc" is omitted or not found, no initialization function is
executed, but the library is still preloaded. If "$libdir/mylib" isn't
found, the postmaster refuses to start.
In my testing with PL/R, it reduces the first call to a PL/R function
(after connecting) from almost 2 seconds, down to about 8 ms.
Joe Conway
> like that patch still needs some work...
Yeah. I'm really, really, *really* sorry for submitting it in the state
it was in. I shouldn't have done that just before moving to another
country. I found the problem last night, but couldn't get to a Net
connection until now.
The problem is in src/bin/psql/common.c, around line 250-335 somewhere
depending on the version. The 2nd and 3rd clauses of the "while" loop
condition:
(rstatus == PGRES_COPY_IN) &&
(rstatus == PGRES_COPY_OUT))
should of course be:
(rstatus != PGRES_COPY_IN) &&
(rstatus != PGRES_COPY_OUT))
Jeroen T. Vermeulen
This bug has been latent since 7.0 or maybe even further back, but it
was only exposed when parse_clause.c stopped suppressing duplicate
items (see its rev 1.96 of 18-Aug-02).
division and modulo functions, to avoid problems on OS X (which fails to
trap 0 divide at all) and Windows (which traps it in some bizarre
nonstandard fashion). Standardize on 'division by zero' as the one true
spelling of this error message. Add regression tests as suggested by
Neil Conway.
utility statement (DeclareCursorStmt) with a SELECT query dangling from
it, rather than a SELECT query with a few unusual fields in it. Add
code to determine whether a planned query can safely be run backwards.
If DECLARE CURSOR specifies SCROLL, ensure that the plan can be run
backwards by adding a Materialize plan node if it can't. Without SCROLL,
you get an error if you try to fetch backwards from a cursor that can't
handle it. (There is still some discussion about what the exact
behavior should be, but this is necessary infrastructure in any case.)
Along the way, make EXPLAIN DECLARE CURSOR work.
entire contents of the subplan into the tuplestore before we can return
any tuples. Instead, the tuplestore holds what we've already read, and
we fetch additional rows from the subplan as needed. Random access to
the previously-read rows works with the tuplestore, and doesn't affect
the state of the partially-read subplan. This is a step towards fixing
the problems with cursors over complex queries --- we don't want to
stick in Materialize nodes if they'll prevent quick startup for a cursor.
Applied patch from Rich Cullingford to fix a NPE in the absolute() method of result set.
Applied patch from Tarjei Skorgenes to fix a NPE when logging is enabled.
Modified Files:
jdbc/org/postgresql/core/BaseResultSet.java
jdbc/org/postgresql/jdbc1/AbstractJdbc1ResultSet.java
jdbc/org/postgresql/jdbc2/Array.java
jdbc/org/postgresql/util/PSQLException.java
cleaning out temp namespaces. We don't really want the server log to be
cluttered with 'Drop cascades to table foo' every time someone uses a
temp table...
problems in applications that may have a large number of files open,
such that libpq's socket number exceeds the range supported by fd_set.
From Chris Brown.
at database shutdown, and then load it again at database startup. This
preserves our hard-won knowledge of free space across restarts (given
an orderly shutdown, that is).
DELETE of an inheritance tree references another inherited relation.
This bug has been latent since 7.1; I'm still not quite sure why 7.1 and
7.2 don't manifest it (at least, they don't crash on a simple test case).
Adjustable threshold is gone in favor of keeping track of total requested
page storage and doling out proportional fractions to each relation
(with a minimum amount per relation, and some quantization of the results
to avoid thrashing with small changes in page counts). Provide special-
case code for indexes so as not to waste space storing useless page
free space counts. Restructure internal data storage to be a flat array
instead of list-of-chunks; this may cost a little more work in data
copying when reorganizing, but allows binary search to be used during
lookup_fsm_page_entry().
the join, per recent discussion on pgsql-sql. Not clear that this will
come up often in real queries, but it's not any more expensive to do it
right, so we may as well do it right.