Improve our workaround for 'TeX capacity exceeded' in building PDF files.

In commit a5ec86a7c7 I wrote a quick hack
that reduced the number of TeX string pool entries created while converting
our documentation to PDF form.  That held the fort for a while, but as of
HEAD we're back up against the same limitation.  It turns out that the
original coding of \FlowObjectSetup actually results in *three* string pool
entries being generated for every "flow object" (that is, potential
cross-reference target) in the documentation, and my previous hack only got
rid of one of them.  With a little more care, we can reduce the string
count to one per flow object plus one per actually-cross-referenced flow
object (about 115000 + 5000 as of current HEAD); that should work until
the documentation volume roughly doubles from where it is today.
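
For readers unfamiliar with why a mere existence test costs a string pool
entry, here is a minimal plain-TeX sketch.  It is only an illustration, not
code from jadetex: \demoLabel and the label value AEN12345 are hypothetical
names made up for this example.  In classic TeX, building a name with
\csname ... \endcsname enters that name into the hash table (with meaning
\relax) if it was not already defined, which is why a test of this form
costs about one entry per flow object even when the object is never
referenced.  Using the figures quoted above, three entries per flow object
would be roughly 3 * 115000 = 345000, against about 115000 + 5000 = 120000
now.

```tex
% Plain TeX sketch; \demoLabel and AEN12345 are hypothetical, for illustration only.
\def\demoLabel{AEN12345}
% Forming the name with \csname makes TeX create an entry for "p@AEN12345"
% (meaning \relax) if no such control sequence exists yet.
\expandafter\ifx\csname p@\demoLabel\endcsname\relax
  % Nothing has referenced this label: skip the anchor and page label.
  \message{p@\demoLabel\space is unreferenced}
\else
  % Some reference defined p@LABEL: emit the anchor and page label.
  \message{p@\demoLabel\space is referenced}
\fi
\bye
```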

As a not-incidental side benefit, this change also causes pdfjadetex to
stop emitting unreferenced hyperlink anchors (bookmarks) into the PDF file.
It had been making one willy-nilly for every flow object; now it's just one
per actually-cross-referenced object.  This results in close to a 2X
savings in PDF file size.  We will still want to run the output through
"jpdftweak" to get it to be compressed; but we no longer need removal of
unreferenced bookmarks, so we might be able to find a quicker tool for
that step.

Although the failure only affects HEAD and US-format output at the moment,
9.5 cannot be more than a few pages short of failing likewise, so it
will inevitably fail after a few rounds of minor-version release notes.
I don't have a lot of faith that we'll never hit the limit in the older
branches; and anyway it would be nice to get rid of jpdftweak across the
board.  Therefore, back-patch to all supported branches.
Tom Lane 2015-11-10 15:59:59 -05:00
parent 5c90a2ffdd
commit 944b41fc00
1 changed files with 69 additions and 10 deletions

@ -1,14 +1,37 @@
% doc/src/sgml/jadetex.cfg
%
% This file redefines FlowObjectSetup to eliminate one of the two control
% sequences it normally creates, thereby substantially reducing string usage
% and permitting the complete Postgres documentation to be built without
% overflowing a hard-to-expand TeX limit. The only known penalty is an
% increased number of TeX warnings about ignoring duplicate definitions.
% This file redefines \FlowObjectSetup and some related macros to greatly
% reduce the number of control sequence names created, and also to avoid
% creation of many useless hyperlink anchors (bookmarks) in PDF files.
%
% Curiously, we only see the failure when building PDF output --- plain PS
% output does not come anywhere close to overflowing the string table.
% There may be another solution hidden in that observation.
% The original coding of \FlowObjectSetup defined a control sequence x@LABEL
% for pretty nearly every flow object in the file, whether that object was
% cross-referenced or not. Worse yet, it created a hyperlink anchor for
% every such object, which not only bloated the output PDF with useless
% anchors but consumed an additional control sequence name per anchor.
% This results in overrunning TeX's limited-size string pool.
%
% To fix, extend \PageLabel's already-existing mechanism whereby a p@LABEL
% control sequence is filled in only for labels that are referenced by at
% least one \Pageref call. We now also fill in p@LABEL for labels that are
% referenced by a \Link. Then, we can drop x@LABEL entirely, and use p@LABEL
% to control emission of both a hyperlink anchor and a page-number label.
% Now, both of those things are emitted for all and only the flow objects
% that have either a hyperlink reference or a page-number reference.
% We consume about one control sequence name per flow object plus one per
% referenced object, which is a lot better than three per flow object.
%
% (With a more invasive patch, we could track the need for an anchor and a
% page-number label separately, but that would probably require two control
% sequences for every flow object. Besides, many objects that have one kind
% of reference will have the other one too; that's certainly true for objects
% referenced in either the TOC or the index, for example.)
%
%
% In addition to checking p@LABEL not x@LABEL, this version of \FlowObjectSetup
% is fixed to clear \Label and \Element whether or not it emits an anchor
% and page label. Failure to do that seems to explain some pre-existing bugs
% in which certain SGML constructs weren't correctly cross-referenced.
%
\def\FlowObjectSetup#1{%
\ifDoFOBSet
@ -16,6 +39,8 @@
\ifx\Label\@empty\let\Label\Element\fi
\fi
\ifx\Label\@empty\else
\expandafter\ifx\csname p@\Label\endcsname\relax
\else
\bgroup
\ifNestedLink
\else
@ -23,8 +48,42 @@
\PageLabel{\Label}%
\fi
\egroup
\let\Label\@empty
\let\Element\@empty
\fi
\let\Label\@empty
\let\Element\@empty
\fi
\fi
}
%
% Adjust \PageLabel so that the p@NAME control sequence acquires a correct
% value immediately; this seems to be needed to avoid scenarios wherein
% additional TeX runs are needed to reach a stable state of the .aux file.
%
\def\PageLabel#1{%
\@bsphack
\expandafter\ifx\csname p@#1\endcsname\relax
\else
\protected@write\@auxout{}%
{\string\pagelabel{#1}{\thepage}}%
% Ensure the p@NAME control sequence acquires correct value immediately
\expandafter\xdef\csname p@#1\endcsname{\thepage}%
\fi
\@esphack}
%
% In \Link, add code to emit an aux-file entry if the p@NAME sequence isn't
% defined. Much as in \@Setref, this ensures we'll process the referenced
% item correctly on the next TeX run.
%
\def\Link#1{%
\begingroup
\SetupICs{#1}%
\ifx\Label\@empty\let\Label\Element\fi
% \typeout{Made a Link at \the\inputlineno, to \Label}%
\hyper@linkstart{\LinkType}{\Label}%
\NestedLinktrue
% If p@NAME control sequence isn't defined, emit dummy def to aux file
% so it will get defined properly on next run, much as in \@Setref
\expandafter\ifx\csname p@\Label\endcsname\relax
\immediate\write\@mainaux{\string\pagelabel{\Label}{qqq}}%
\fi
}
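
The interplay between \Link, the .aux file, and \PageLabel across successive
TeX runs can be sketched in a small standalone LaTeX file.  This is a
simplified illustration under stated assumptions, not the jadetex code:
\demopagelabel, \demolink, and \demotarget are hypothetical stand-ins for
\pagelabel, \Link, and the anchor-emitting part of \FlowObjectSetup, and the
label "sec-a" is made up.  On the first run the target emits nothing and the
link writes a dummy entry; on the second run the entry read back from the
.aux file makes p@sec-a defined, so the target writes its real page number.

```latex
\documentclass{article}
\makeatletter
% Hypothetical handler for entries read back from the .aux file.
\newcommand\demopagelabel[2]{\global\@namedef{p@#1}{#2}}
% Hypothetical link: if p@NAME is undefined, request it for the next run.
\newcommand\demolink[1]{%
  \expandafter\ifx\csname p@#1\endcsname\relax
    \immediate\write\@mainaux{\string\demopagelabel{#1}{qqq}}%
  \fi
  [link to #1]}
% Hypothetical target: act only when something has referenced the label.
\newcommand\demotarget[1]{%
  \expandafter\ifx\csname p@#1\endcsname\relax
  \else
    \protected@write\@auxout{}{\string\demopagelabel{#1}{\thepage}}%
    % Give p@NAME its correct value immediately, as in the patched \PageLabel.
    \expandafter\xdef\csname p@#1\endcsname{\thepage}%
  \fi}
\makeatother
\begin{document}
\demotarget{sec-a}Some text. \demolink{sec-a}
\end{document}
```

Running this file two or three times and inspecting the .aux file after each
run shows the dummy {qqq} entry being replaced by a real page number.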