postgresql/src/tools/pgindent
Peter Geoghegan 5bf748b86b Enhance nbtree ScalarArrayOp execution.
Commit 9e8da0f7 taught nbtree to handle ScalarArrayOpExpr quals
natively.  This works by pushing down the full context (the array keys)
to the nbtree index AM, enabling it to execute multiple primitive index
scans that the planner treats as one continuous index scan/index path.
This earlier enhancement enabled nbtree ScalarArrayOp index-only scans.
It also allowed scans with ScalarArrayOp quals to return ordered results
(with some notable restrictions, described further down).

Take this general approach a lot further: teach nbtree SAOP index scans
to decide how to execute ScalarArrayOp scans (when and where to start
the next primitive index scan) based on physical index characteristics.
This can be far more efficient.  All SAOP scans will now reliably avoid
duplicative leaf page accesses (just like any other nbtree index scan).
SAOP scans whose array keys are naturally clustered together now require
far fewer index descents, since we'll reliably avoid starting a new
primitive scan just to get to a later offset from the same leaf page.

The scan's arrays now advance using binary searches for the array
element that best matches the next tuple's attribute value.  Required
scan key arrays (i.e. arrays from scan keys that can terminate the scan)
ratchet forward in lockstep with the index scan.  Non-required arrays
(i.e. arrays from scan keys that can only exclude non-matching tuples)
"advance" without the process ever rolling over to a higher-order array.

Naturally, only required SAOP scan keys trigger skipping over leaf pages
(non-required arrays cannot safely end or start primitive index scans).
Consequently, even index scans of a composite index with a high-order
inequality scan key (which we'll mark required) and a low-order SAOP
scan key (which we won't mark required) now avoid repeating leaf page
accesses -- that benefit isn't limited to simpler equality-only cases.
In general, all nbtree index scans now output tuples as if they were one
continuous index scan -- even scans that mix a high-order inequality
with lower-order SAOP equalities reliably output tuples in index order.
This allows us to remove a couple of special cases that were applied
when building index paths with SAOP clauses during planning.

Bugfix commit 807a40c5 taught the planner to avoid generating unsafe
path keys: path keys on a multicolumn index path, with a SAOP clause on
any attribute beyond the first/most significant attribute.  These cases
are now all safe, so we go back to generating path keys without regard
for the presence of SAOP clauses (just like with any other clause type).
Affected queries can now exploit scan output order in all the usual ways
(e.g., certain "ORDER BY ... LIMIT n" queries can now terminate early).

Also undo changes from follow-up bugfix commit a4523c5a, which taught
the planner to produce alternative index paths, with path keys, but
without low-order SAOP index quals (filter quals were used instead).
We'll no longer generate these alternative paths, since they can no
longer offer any meaningful advantages over standard index qual paths.
Affected queries thereby avoid all of the disadvantages that come from
using filter quals within index scan nodes.  They can avoid extra heap
page accesses from using filter quals to exclude non-matching tuples
(index quals will never have that problem).  They can also skip over
irrelevant sections of the index in more cases (though only when nbtree
determines that starting another primitive scan actually makes sense).

There is a theoretical risk that removing restrictions on SAOP index
paths from the planner will break compatibility with amcanorder-based
index AMs maintained as extensions.  Such an index AM could have the
same limitations around ordered SAOP scans as nbtree had up until now.
Adding a pro forma incompatibility item about the issue to the Postgres
17 release notes seems like a good idea.

Author: Peter Geoghegan <pg@bowt.ie>
Author: Matthias van de Meent <boekewurm+postgres@gmail.com>
Reviewed-By: Heikki Linnakangas <hlinnaka@iki.fi>
Reviewed-By: Matthias van de Meent <boekewurm+postgres@gmail.com>
Reviewed-By: Tomas Vondra <tomas.vondra@enterprisedb.com>
Discussion: https://postgr.es/m/CAH2-Wz=ksvN_sjcnD1+Bt-WtifRA5ok48aDYnq3pkKhxgMQpcw@mail.gmail.com
2024-04-06 11:47:10 -04:00
..
README Change example in pgindent README on "/*-----" comments. 2023-07-05 10:02:15 +03:00
exclude_file_patterns Exclude files generated by generate-wait_event_types.pl from pgindent 2023-12-31 18:06:56 +09:00
perltidyrc Make agreed-on updates in perltidy options. 2023-05-19 16:43:57 -04:00
pgindent Activate perlcritic InputOutput::RequireCheckedSyscalls and fix resulting warnings 2024-03-19 07:09:31 +01:00
pgindent.man Rename pgindent options 2023-12-20 22:37:57 +00:00
pgperltidy Allow and require passing files on command line of pgperltidy 2023-06-21 16:20:26 +02:00
typedefs.list Enhance nbtree ScalarArrayOp execution. 2024-04-06 11:47:10 -04:00

README

pgindent'ing the PostgreSQL source tree
=======================================

We run this process at least once in each development cycle,
to maintain uniform layout style in our C and Perl code.

You might find this blog post interesting:
http://adpgtech.blogspot.com/2015/05/running-pgindent-on-non-core-code-or.html


PREREQUISITES:

1) Install pg_bsd_indent in your PATH.  Its source code is in the
   sibling directory src/tools/pg_bsd_indent; see the directions
   in that directory's README file.

2) Install perltidy.  Please be sure it is version 20230309 (older and newer
   versions make different formatting choices, and we want consistency).
   You can get the correct version from
   https://cpan.metacpan.org/authors/id/S/SH/SHANCOCK/
   To install, follow the usual install process for a Perl module
   ("man perlmodinstall" explains it).  Or, if you have cpan installed,
   this should work:
   cpan SHANCOCK/Perl-Tidy-20230309.tar.gz
   Or if you have cpanm installed, you can just use:
   cpanm https://cpan.metacpan.org/authors/id/S/SH/SHANCOCK/Perl-Tidy-20230309.tar.gz

DOING THE INDENT RUN:

1) Change directory to the top of the source tree.

2) Download the latest typedef file from the buildfarm:

	wget -O src/tools/pgindent/typedefs.list https://buildfarm.postgresql.org/cgi-bin/typedefs.pl

   (See https://buildfarm.postgresql.org/cgi-bin/typedefs.pl?show_list for a full
   list of typedef files, if you want to indent some back branch.)

3) Run pgindent on the C files:

	src/tools/pgindent/pgindent .

   If any files generate errors, restore their original versions with
   "git checkout", and see below for cleanup ideas.

4) Indent the Perl code using perltidy:

	src/tools/pgindent/pgperltidy .

   If you want to use some perltidy version that's not in your PATH,
   first set the PERLTIDY environment variable to point to it.

5) Reformat the bootstrap catalog data files:

	./configure     # "make" will not work in an unconfigured tree
	cd src/include/catalog
	make reformat-dat-files
	cd ../../..

VALIDATION:

1) Check for any newly-created files using "git status"; there shouldn't
   be any.  (pgindent leaves *.BAK files behind if it has trouble, while
   perltidy leaves *.LOG files behind.)

2) Do a full test build:

	make -s clean
	make -s all	# look for unexpected warnings, and errors of course
	make check-world

   Your configure switches should include at least --enable-tap-tests
   or else much of the Perl code won't get exercised.
   The ecpg regression tests may well fail due to pgindent's updates of
   header files that get copied into ecpg output; if so, adjust the
   expected-files to match.

3) If you have the patience, it's worth eyeballing the "git diff" output
   for any egregiously ugly changes.  See below for cleanup ideas.


When you're done, "git commit" everything including the typedefs.list file
you used.

4) Add the newly created commits to the .git-blame-ignore-revs file so
   that "git blame" ignores the commits (for anybody that has opted-in
   to using the ignore file).  Follow the instructions that appear at
   the top of the .git-blame-ignore-revs file.

Another "git commit" will be required for your ignore file changes.

---------------------------------------------------------------------------

Cleaning up in case of failure or ugly output
---------------------------------------------

If you don't like the results for any particular file, "git checkout"
that file to undo the changes, patch the file as needed, then repeat
the indent process.

pgindent will reflow any comment block that's not at the left margin.
If this messes up manual formatting that ought to be preserved, protect
the comment block with some dashes:

	/*----------
	 * Text here will not be touched by pgindent.
	 */

Odd spacing around typedef names might indicate an incomplete typedefs list.

pgindent will mangle both declaration and definition of a C function whose
name matches a typedef.  Currently the best workaround is to choose
non-conflicting names.

pgindent can get confused by #if sequences that look correct to the compiler
but have mismatched braces/parentheses when considered as a whole.  Usually
that looks pretty unreadable to humans too, so best practice is to rearrange
the #if tests to avoid it.

Sometimes, if pgindent or perltidy produces odd-looking output, it's because
of minor bugs like extra commas.  Don't hesitate to clean that up while
you're at it.

---------------------------------------------------------------------------

BSD indent
----------

We have standardized on FreeBSD's indent, and renamed it pg_bsd_indent.
pg_bsd_indent does differ slightly from FreeBSD's version, mostly in
being more easily portable to non-BSD platforms.  Find it in the
sibling directory src/tools/pg_bsd_indent.

GNU indent, version 2.2.6, has several problems, and is not recommended.
These bugs become pretty major when you are doing >500k lines of code.
If you don't believe me, take a directory and make a copy.  Run pgindent
on the copy using GNU indent, and do a diff -r. You will see what I
mean. GNU indent does some things better, but mangles too.  For details,
see:

	http://archives.postgresql.org/pgsql-hackers/2003-10/msg00374.php
	http://archives.postgresql.org/pgsql-hackers/2011-04/msg01436.php

---------------------------------------------------------------------------

Which files are processed
-------------------------

The pgindent run processes (nearly) all PostgreSQL *.c and *.h files,
but we currently exclude *.y and *.l files, as well as *.c and *.h files
derived from *.y and *.l files.  Additional exceptions are listed
in exclude_file_patterns; see the notes therein for rationale.

Note that we do not exclude ecpg's header files from the run.  Some of them
get copied verbatim into ecpg's output, meaning that ecpg's expected files
may need to be updated to match.

The perltidy run processes all *.pl and *.pm files, plus a few
executable Perl scripts that are not named that way.  See the "find"
rules in pgperltidy for details.