postgresql/src/backend/snowball
Peter Eisentraut 721856ff24 Remove distprep
A PostgreSQL release tarball contains a number of prebuilt files, in
particular files produced by bison, flex, perl, as well as html and
man documentation.  We have done this consistent with established
practice at the time to not require these tools for building from a
tarball.  Some of these tools were hard to get, or get the right
version of, from time to time, and shipping the prebuilt output was a
convenience to users.

Now this has at least two problems:

First, we have to make the build system(s) work in two modes: Building
from a git checkout and building from a tarball.  This is pretty
complicated, but it works so far for autoconf/make.  It does not
currently work for meson; you can currently only build with meson from
a git checkout.  Making meson builds work from a tarball seems very
difficult or impossible.  One particular problem is that since meson
requires a separate build directory, we cannot make the build update
files like gram.h in the source tree.  So if you build from a
tarball and update gram.y, you will have a gram.h in the source tree
and one in the build tree, but the way things work is that the
compiler will always use the one in the source tree.  So you cannot,
for example, make any gram.y changes when building from a tarball.
This seems impossible to fix in a non-horrible way.

Second, there is increased interest nowadays in precisely tracking the
origin of software.  We can reasonably track contributions into the
git tree, and users can reasonably track the path from a tarball to
packages and downloads and installs.  But what happens between the git
tree and the tarball is obscure and in some cases non-reproducible.

The solution for both of these issues is to get rid of the step that
adds prebuilt files to the tarball.  The tarball now only contains
what is in the git tree (*).  Getting the additional build
dependencies is no longer a problem nowadays, and the complications to
keep these dual build modes working are significant.  And of course we
want to get the meson build system working universally.

This commit removes the make distprep target altogether.  The make
dist target continues to do its job; it just doesn't call distprep
anymore.

(*) - The tarball also contains the INSTALL file that is built at make
dist time, but not by distprep.  This is unchanged for now.

The make maintainer-clean target, whose job it is to remove the
prebuilt files in addition to what make distclean does, is now just an
alias to make distclean.  (In practice, it is probably obsolete given
that git clean is available.)

The following programs are now hard build requirements in configure
(they were already required by meson.build):

- bison
- flex
- perl

Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://www.postgresql.org/message-id/flat/e07408d9-e5f2-d9fd-5672-f53354e9305e@eisentraut.org
2023-11-06 15:18:04 +01:00
Name                  Last commit                                                            Date
libstemmer            Update snowball                                                        2021-12-07 07:04:05 +01:00
stopwords             Sync our Snowball stemmer dictionaries with current upstream.          2018-09-24 17:29:38 -04:00
.gitignore            Convert cvsignore to gitignore, and add .gitignore for build targets.  2010-09-22 12:57:04 +02:00
Makefile              Remove distprep                                                        2023-11-06 15:18:04 +01:00
README                Update snowball                                                        2021-12-07 07:04:05 +01:00
dict_snowball.c       Update copyright for 2023                                              2023-01-02 15:00:37 -05:00
meson.build           Update copyright for 2023                                              2023-01-02 15:00:37 -05:00
snowball.sql.in       Update copyright for 2023                                              2023-01-02 15:00:37 -05:00
snowball_create.pl    Pre-beta mechanical code beautification.                               2023-05-19 17:24:48 -04:00
snowball_func.sql.in  Update copyright for 2023                                              2023-01-02 15:00:37 -05:00

README

src/backend/snowball/README

Snowball-Based Stemming
=======================

This module uses the word stemming code developed by the Snowball project,
http://snowballstem.org (formerly http://snowball.tartarus.org)
which is released by them under a BSD-style license.

The Snowball project does not often make formal releases; it's best
to pull from their git repository

    git clone https://github.com/snowballstem/snowball.git

and then building the derived files is as simple as

    cd snowball
    make

At least on Linux, no platform-specific adjustment is needed.

Postgres' files under src/backend/snowball/libstemmer/ and
src/include/snowball/libstemmer/ are taken directly from the Snowball
files, with only some minor adjustments of file inclusions.  Note
that most of these files are in fact derived files, not original source.
The original sources are in the Snowball language, and are built using
the Snowball-to-C compiler that is also part of the Snowball project.
We choose to include the derived files in the PostgreSQL distribution
because most installations will not have the Snowball compiler available.

We are currently synced with the Snowball git commit
48a67a2831005f49c48ec29a5837640e23e54e6b (tag v2.2.0)
of 2021-11-10.

To update the PostgreSQL sources from a new Snowball version:

0. If you haven't already done so, run "make -C snowball".

1. Copy the *.c files in snowball/src_c/ to src/backend/snowball/libstemmer
with replacement of "../runtime/header.h" by "header.h", for example

    for f in .../snowball/src_c/*.c
    do
        # quote "$f" so that paths containing spaces are handled correctly
        sed 's|\.\./runtime/header\.h|header.h|' "$f" >"libstemmer/$(basename "$f")"
    done

Do not copy stemmers that are listed in libstemmer/modules.txt as
nonstandard, such as "german2" or "lovins".

2. Copy the *.c files in snowball/runtime/ to
src/backend/snowball/libstemmer, and edit them to remove direct inclusions
of system headers such as <stdio.h> --- they should only include "header.h".
(This removal avoids portability problems on some platforms where <stdio.h>
is sensitive to largefile compilation options.)

3. Copy the *.h files in snowball/src_c/ and snowball/runtime/
to src/include/snowball/libstemmer.  At this writing the header files
do not require any changes.

4. Check whether any stemmer modules have been added or removed.  If so, edit
the OBJS list in Makefile, the list of #include's in dict_snowball.c, and the
stemmer_modules[] table in dict_snowball.c, as well as the list in the
documentation in textsearch.sgml.  You might also need to change
the LANGUAGES list in Makefile and tsearch_config_languages in initdb.c.

5. The various stopword files in stopwords/ must be downloaded
individually from pages on the snowballstem.org website.
Make sure these files are stored in UTF-8 encoding.
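
The mechanical parts of steps 1, 2, 4 and 5 above can be partly scripted.
The following is only a sketch under stated assumptions, not a tool that
ships with PostgreSQL or Snowball.  The function names (copy_stemmers,
find_system_includes, diff_stemmer_lists, find_non_utf8) are hypothetical.
The list of stemmers to skip is assumed to be chosen by hand after reading
libstemmer/modules.txt, and the generated stemmer files are assumed to
follow the stem_*.c naming pattern.  The edits to Makefile, dict_snowball.c,
textsearch.sgml and initdb.c still have to be made by hand.

```shell
#!/bin/sh
# Hypothetical helper functions; none of these names exist upstream.

# Step 1: copy generated stemmers, rewriting the header include.
# $1 = snowball/src_c directory, $2 = target libstemmer directory,
# $3 = space-separated stemmer names to skip (assumed to be hand-picked).
copy_stemmers() {
    src=$1 dst=$2 skip=$3
    for f in "$src"/*.c
    do
        [ -e "$f" ] || continue
        base=$(basename "$f")
        skipped=no
        for s in $skip
        do
            case $base in
                *_"$s".c) skipped=yes ;;
            esac
        done
        if [ "$skipped" = no ]
        then
            sed 's|\.\./runtime/header\.h|header.h|' "$f" >"$dst/$base"
        fi
    done
}

# Step 2: list *.c files in $1 that still include a system header directly.
find_system_includes() {
    grep -l '^[[:space:]]*#[[:space:]]*include[[:space:]]*<' "$1"/*.c
}

# Step 4: report stem_*.c files present in only one of two directories.
# "- name" means only in $1 (old), "+ name" means only in $2 (new).
diff_stemmer_lists() {
    old=$(cd "$1" && ls stem_*.c 2>/dev/null | sort)
    new=$(cd "$2" && ls stem_*.c 2>/dev/null | sort)
    for f in $old
    do
        printf '%s\n' "$new" | grep -qxF "$f" || echo "- $f"
    done
    for f in $new
    do
        printf '%s\n' "$old" | grep -qxF "$f" || echo "+ $f"
    done
}

# Step 5: list regular files in $1 that are not valid UTF-8.
find_non_utf8() {
    for f in "$1"/*
    do
        [ -f "$f" ] || continue
        iconv -f UTF-8 -t UTF-8 "$f" >/dev/null 2>&1 || echo "$f"
    done
}
```

After sourcing such a file, a call might look like

    copy_stemmers snowball/src_c src/backend/snowball/libstemmer "german2 lovins"

Keeping each step as a separate function means the output of the checks
can be reviewed before anything is committed.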