60 Commits

Author SHA1 Message Date
Peter Eisentraut 3d59da9ccd unaccent: Make generate_unaccent_rules.py Python 3 compatible
Python 2 is still supported.

Author: Hugh Ranalli <hugh@whtc.ca>
Discussion: https://www.postgresql.org/message-id/CAAhbUMNyZ+PhNr_mQ=G161K0-hvbq13Tz2is9M3WK+yX9cQOCw@mail.gmail.com
2019-01-04 11:12:31 +01:00
Bruce Momjian 97c39498e5 Update copyright for 2019
Backpatch-through: certain files through 9.4
2019-01-02 12:44:25 -05:00
Peter Eisentraut b6f3649bba Convert unaccent tests to UTF-8
This makes it easier to add new tests that are specific to Unicode
features.  The files were previously in KOI8-R.

Discussion: https://www.postgresql.org/message-id/8506.1545111362@sss.pgh.pa.us
2019-01-02 18:36:05 +01:00
Andres Freund 578b229718 Remove WITH OIDS support, change oid catalog column visibility.
Previously tables declared WITH OIDS, including a significant fraction
of the catalog tables, stored the oid column not as a normal column,
but as part of the tuple header.

This special column was not shown by default, which was somewhat odd,
as it's often (consider e.g. pg_class.oid) one of the more important
parts of a row.  Neither pg_dump nor COPY included the contents of the
oid column by default.

The fact that the oid column was not an ordinary column necessitated a
significant amount of special case code to support oid columns. That
was already painful for the existing code, but upcoming work aiming to make
table storage pluggable would have required expanding and duplicating
that "specialness" significantly.

WITH OIDS has been deprecated since 2005 (commit ff02d0a05280e0).
Remove it.

Removing includes:
- CREATE TABLE and ALTER TABLE syntax for declaring the table to be
  WITH OIDS has been removed (WITH (oids[ = true]) will error out)
- pg_dump does not support dumping tables declared WITH OIDS and will
  issue a warning when dumping one (and ignore the oid column).
- restoring a pg_dump archive with pg_restore will warn when
  restoring a table with oid contents (and ignore the oid column)
- COPY will refuse to load a binary dump that includes oids.
- pg_upgrade will error out when encountering tables declared WITH
  OIDS, they have to be altered to remove the oid column first.
- Functionality to access the oid of the last inserted row (like
  plpgsql's RESULT_OID, spi's SPI_lastoid, ...) has been removed.

The syntax for declaring a table WITHOUT OIDS (or WITH (oids = false)
for CREATE TABLE) is still supported. While that requires a bit of
support code, it seems unnecessary to break applications / dumps that
do not use oids, and are explicit about not using them.

The biggest user of WITH OID columns was postgres' catalog. This
commit changes all 'magic' oid columns to be columns that are normally
declared and stored. To reduce unnecessary query breakage all the
newly added columns are still named 'oid', even if a table's column
naming scheme would indicate 'reloid' or such.  This obviously
requires adapting a lot of code, mostly replacing oid access via
HeapTupleGetOid() with access to the underlying Form_pg_*->oid column.

The bootstrap process now assigns oids for all oid columns in
genbki.pl that do not have an explicit value (starting at the largest
oid previously used), only oids assigned later by oids will be above
FirstBootstrapObjectId. As the oid column now is a normal column the
special bootstrap syntax for oids has been removed.

Oids are not automatically assigned during insertion anymore, all
backend code explicitly assigns oids with GetNewOidWithIndex(). For
the rare case that insertions into the catalog via SQL are called for
the new pg_nextoid() function can be used (which only works on catalog
tables).

The fact that oid columns on system tables are now normal columns
means that they will be included in the set of columns expanded
by * (i.e. SELECT * FROM pg_class will now include the table's oid,
previously it did not). It'd not technically be hard to hide oid
column by default, but that'd mean confusing behavior would either
have to be carried forward forever, or it'd cause breakage down the
line.

While it's not unlikely that further adjustments are needed, the
scope/invasiveness of the patch makes it worthwhile to merge this
now. It's painful to maintain externally, too complicated to commit
after the code freeze, and a dependency of a number of other
patches.

Catversion bump, for obvious reasons.

Author: Andres Freund, with contributions by John Naylor
Discussion: https://postgr.es/m/20180930034810.ywp2c7awz7opzcfr@alap3.anarazel.de
2018-11-20 16:00:17 -08:00
Tom Lane a5322ca10f Make contrib/unaccent's unaccent() function work when not in search path.
Since the fixes for CVE-2018-1058, we've advised people to schema-qualify
function references in order to fix failures in code that executes under
a minimal search_path setting.  However, that's insufficient to make the
single-argument form of unaccent() work, because it looks up the "unaccent"
text search dictionary using the search path.

The most expedient answer seems to be to remove the search_path dependency
by making it look in the same schema that the unaccent() function itself
is declared in.  This will definitely work for the normal usage of this
function with the unaccent dictionary provided by the extension.
It's barely possible that there are people who were relying on the
search-path-dependent behavior to select other dictionaries with the same
name; but if there are any such people at all, they can still get that
behavior by writing unaccent('unaccent', ...), or possibly
unaccent('unaccent'::text::regdictionary, ...) if the lookup has to be
postponed to runtime.

Per complaint from Gunnlaugur Thor Briem.  Back-patch to all supported
branches.

Discussion: https://postgr.es/m/CAPs+M8LCex6d=DeneofdsoJVijaG59m9V0ggbb3pOH7hZO4+cQ@mail.gmail.com
2018-09-06 10:49:45 -04:00
Thomas Munro 5e8d670c31 Add Greek characters to unaccent.rules.
Author: Tasos Maschalidis
Reviewed-by: Michael Paquier, Tom Lane
Discussion: https://postgr.es/m/153495048900.1368.11566580687623014380%40wrigleys.postgresql.org
Discussion: https://postgr.es/m/VI1PR01MB38537EBD529FE5EE3FE9A5FEB5370%40VI1PR01MB3853.eurprd01.prod.exchangelabs.com
2018-09-02 07:12:24 +12:00
Tom Lane fb8697b31a Avoid unnecessary use of pg_strcasecmp for already-downcased identifiers.
We have a lot of code in which option names, which from the user's
viewpoint are logically keywords, are passed through the grammar as plain
identifiers, and then matched to string literals during command execution.
This approach avoids making words into lexer keywords unnecessarily.  Some
places matched these strings using plain strcmp, some using pg_strcasecmp.
But the latter should be unnecessary since identifiers would have been
downcased on their way through the parser.  Aside from any efficiency
concerns (probably not a big factor), the lack of consistency in this area
creates a hazard of subtle bugs due to different places coming to different
conclusions about whether two option names are the same or different.
Hence, standardize on using strcmp() to match any option names that are
expected to have been fed through the parser.

This does create a user-visible behavioral change, which is that while
formerly all of these would work:
	alter table foo set (fillfactor = 50);
	alter table foo set (FillFactor = 50);
	alter table foo set ("fillfactor" = 50);
	alter table foo set ("FillFactor" = 50);
now the last case will fail because that double-quoted identifier is
different from the others.  However, none of our documentation says that
you can use a quoted identifier in such contexts at all, and we should
discourage doing so since it would break if we ever decide to parse such
constructs as true lexer keywords rather than poor man's substitutes.
So this shouldn't create a significant compatibility issue for users.

Daniel Gustafsson, reviewed by Michael Paquier, small changes by me

Discussion: https://postgr.es/m/29405B24-564E-476B-98C0-677A29805B84@yesql.se
2018-01-26 18:25:14 -05:00
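
The behavioral change described in this commit can be modeled with a small sketch. This is a hypothetical Python model, not PostgreSQL code: `fold_identifier` and `matches_option` are illustrative names. It assumes only what the message states, namely that unquoted identifiers are downcased by the parser, double-quoted identifiers keep their case, and option names are then matched with an exact comparison (the analogue of strcmp).

```python
def fold_identifier(token: str) -> str:
    """Hypothetical model of identifier folding in the parser.

    Unquoted identifiers are downcased; double-quoted identifiers keep
    their case exactly (the surrounding quotes are stripped).
    """
    if len(token) >= 2 and token[0] == '"' and token[-1] == '"':
        return token[1:-1].replace('""', '"')
    return token.lower()


def matches_option(token: str, option_name: str) -> bool:
    # Exact comparison, like strcmp(): no case folding at match time.
    return fold_identifier(token) == option_name


if __name__ == "__main__":
    for tok in ['fillfactor', 'FillFactor', '"fillfactor"', '"FillFactor"']:
        print(tok, matches_option(tok, "fillfactor"))
```

Under this model the first three spellings match and the quoted mixed-case spelling does not, which mirrors the four ALTER TABLE examples in the message.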
Bruce Momjian 9d4649ca49 Update copyright for 2018
Backpatch-through: certain files through 9.3
2018-01-02 23:30:12 -05:00
Peter Eisentraut 0e1539ba0d Add some const decorations to prototypes
Reviewed-by: Fabien COELHO <coelho@cri.ensmp.fr>
2017-11-10 13:38:57 -05:00
Tom Lane ec0a69e49b Extend the default rules file for contrib/unaccent with Vietnamese letters.
Improve generate_unaccent_rules.py to handle composed characters whose base
is another composed character rather than a plain letter.  The net effect
of this is to add a bunch of multi-accented Vietnamese characters to
unaccent.rules.

Original complaint from Kha Nguyen, diagnosis of the script's shortcoming
by Thomas Munro.

Dang Minh Huong and Michael Paquier

Discussion: https://postgr.es/m/CALo3sF6EC8cy1F2JUz=GRf5h4LMUJTaG3qpdoiLrNbWEXL-tRg@mail.gmail.com
2017-08-16 16:51:56 -04:00
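
The idea of a composed character whose base is itself a composed character can be illustrated with a minimal sketch. It is not the logic of generate_unaccent_rules.py; it assumes Python's standard `unicodedata` module as the data source, and `base_letter` is a hypothetical helper name. It follows canonical decompositions repeatedly until a character with no decomposition is reached.

```python
import unicodedata


def base_letter(ch: str) -> str:
    """Follow canonical decompositions until a plain character remains."""
    decomp = unicodedata.decomposition(ch)
    # An empty string means no decomposition; a leading "<" marks a
    # compatibility decomposition, which this sketch does not follow.
    if not decomp or decomp.startswith("<"):
        return ch
    # The first code point of the decomposition is the base; it may itself
    # be a composed character, hence the recursion.
    first = chr(int(decomp.split()[0], 16))
    return base_letter(first)


if __name__ == "__main__":
    # U+1EBF decomposes to U+00EA plus an acute accent, and U+00EA in turn
    # decomposes to "e" plus a circumflex.
    print(base_letter("\u1ebf"))
```

A single decomposition step would stop at U+00EA for the Vietnamese letter U+1EBF; the recursive step is what reaches the plain letter.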
Tom Lane 382ceffdf7 Phase 3 of pgindent updates.
Don't move parenthesized lines to the left, even if that means they
flow past the right margin.

By default, BSD indent lines up statement continuation lines that are
within parentheses so that they start just to the right of the preceding
left parenthesis.  However, traditionally, if that resulted in the
continuation line extending to the right of the desired right margin,
then indent would push it left just far enough to not overrun the margin,
if it could do so without making the continuation line start to the left of
the current statement indent.  That makes for a weird mix of indentations
unless one has been completely rigid about never violating the 80-column
limit.

This behavior has been pretty universally panned by Postgres developers.
Hence, disable it with indent's new -lpl switch, so that parenthesized
lines are always lined up with the preceding left paren.

This patch is much less interesting than the first round of indent
changes, but also bulkier, so I thought it best to separate the effects.

Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org
Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
2017-06-21 15:35:54 -04:00
Noah Misch 3a0d473192 Use wrappers of PG_DETOAST_DATUM_PACKED() more.
This makes almost all core code follow the policy introduced in the
previous commit.  Specific decisions:

- Text search support functions with char* and length arguments, such as
  prsstart and lexize, may receive unaligned strings.  I doubt
  maintainers of non-core text search code will notice.

- Use plain VARDATA() on values detoasted or synthesized earlier in the
  same function.  Use VARDATA_ANY() on varlenas sourced outside the
  function, even if they happen to always have four-byte headers.  As an
  exception, retain the universal practice of using VARDATA() on return
  values of SendFunctionCall().

- Retain PG_GETARG_BYTEA_P() in pageinspect.  (Page images are too large
  for a one-byte header, so this misses no optimization.)  Sites that do
  not call get_page_from_raw() typically need the four-byte alignment.

- For now, do not change btree_gist.  Its use of four-byte headers in
  memory is partly entangled with storage of 4-byte headers inside
  GBT_VARKEY, on disk.

- For now, do not change gtrgm_consistent() or gtrgm_distance().  They
  incorporate the varlena header into a cache, and there are multiple
  credible implementation strategies to consider.
2017-03-12 19:35:34 -04:00
Peter Eisentraut f21a563d25 Move some things from builtins.h to new header files
This avoids that builtins.h has to include additional header files.
2017-01-20 20:29:53 -05:00
Bruce Momjian 1d25779284 Update copyright via script for 2017
2017-01-03 13:48:53 -05:00
Robert Haas 202ac08c08 Update unaccent extension for parallel query.
All functions provided by this extension are PARALLEL SAFE.

Andreas Karlsson
2016-06-14 14:55:49 -04:00
Teodor Sigaev ce91b9209f fix typo in comment
2016-03-16 17:18:14 +03:00
Teodor Sigaev 9a206d063c Improve script generating unaccent rules
The script now uses the standard Unicode transliterator Latin-ASCII.

Author: Leonard Benedetti
2016-03-16 16:47:03 +03:00
Bruce Momjian ee94300446 Update copyright for 2016
Backpatch certain files through 9.1
2016-01-02 13:33:40 -05:00
Teodor Sigaev 1bbd52cb9a Make unaccent handle all diacritics known to Unicode, and expand ligatures correctly
Add Python script for building unaccent.rules from Unicode data. Don't
backpatch because unaccent changes may require tsvector/index
rebuild.

Thomas Munro <thomas.munro@enterprisedb.com>
2015-09-04 12:51:53 +03:00
Bruce Momjian 4baaf863ec Update copyright for 2015
Backpatch certain files through 9.0
2015-01-06 11:43:47 -05:00
Andres Freund d153b80161 Fix typos in some error messages thrown by extension scripts when fed to psql.
Some of the many error messages introduced in 458857cc missed 'FROM
unpackaged'. Also e016b724 and 45ffeb7e forgot to quote extension
version numbers.

Backpatch to 9.1, just like 458857cc which introduced the messages. Do
so because the error messages thrown when the wrong command is copy &
pasted aren't easy to understand.
2014-08-25 18:30:37 +02:00
Noah Misch 0ffc201a51 Add file version information to most installed Windows binaries.
Prominent binaries already had this metadata.  A handful of minor
binaries, such as pg_regress.exe, still lack it; efforts to eliminate
such exceptions are welcome.

Michael Paquier, reviewed by MauMau.
2014-07-14 14:07:52 -04:00
Tom Lane 5a421a47eb Fix inadequately-sized output buffer in contrib/unaccent.
The output buffer size in unaccent_lexize() was calculated as input string
length times pg_database_encoding_max_length(), which effectively assumes
that replacement strings aren't more than one character.  While that was
all that we previously documented it to support, the code actually has
always allowed replacement strings of arbitrary length; so if you tried
to make use of longer strings, you were at risk of buffer overrun.  To fix,
use an expansible StringInfo buffer instead of trying to determine the
maximum space needed a-priori.

This would be a security issue if unaccent rules files could be installed
by unprivileged users; but fortunately they can't, so in the back branches
the problem can be labeled as improper configuration by a superuser.
Nonetheless, a memory stomp isn't a nice way of reacting to improper
configuration, so let's back-patch the fix.
2014-07-01 11:23:21 -04:00
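
The sizing problem described above can be shown with simple arithmetic. The following is a hypothetical Python model, not the C code: `old_buffer_bound` mirrors the rule stated in the message (input length times the maximum character length of the encoding), a `bytearray` stands in for an expansible buffer, and the rule mapping is invented purely for illustration.

```python
def old_buffer_bound(text: str, max_encoding_length: int) -> int:
    # Model of the old rule: input length in bytes times the maximum
    # number of bytes per character in the database encoding.
    return len(text.encode("utf-8")) * max_encoding_length


def unaccent_growable(text: str, rules: dict) -> bytes:
    # The bytearray grows as needed, so no up-front bound is required.
    out = bytearray()
    for ch in text:
        out += rules.get(ch, ch).encode("utf-8")
    return bytes(out)


if __name__ == "__main__":
    rules = {"a": "0123456789"}  # hypothetical multi-character replacement
    text = "aa"
    print(old_buffer_bound(text, 4), len(unaccent_growable(text, rules)))
```

With a ten-character replacement the output needs 20 bytes while the old bound allows only 8, which is the overrun the commit removes by not computing a bound at all.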
Tom Lane 03a25cec8d Issue a WARNING about invalid rule file format in contrib/unaccent.
We were already issuing a WARNING, albeit only elog not ereport, for
duplicate source strings; so warning rather than just being stoically
silent seems like the best thing to do here.  Arguably both of these
complaints should be upgraded to ERRORs, but that might be more
behavioral change than people want.

Note: the faulty line is already printed via an errcontext hook,
so there's no need for more information than these messages provide.
2014-06-30 22:03:37 -04:00
Tom Lane 1b2488731c Allow multi-character source strings in contrib/unaccent.
This could be useful in languages where diacritic signs are represented as
separate characters; more generally it supports using unaccent dictionaries
for substring substitutions beyond narrowly conceived "diacritic removal".
In any case, since the rule-file parser doesn't complain about
multi-character source strings, it behooves us to do something unsurprising
with them.
2014-06-30 21:46:29 -04:00
Tom Lane 97c40ce614 Allow empty replacement strings in contrib/unaccent.
This is useful in languages where diacritic signs are represented as
separate characters; it's also one step towards letting unaccent be used
for arbitrary substring substitutions.

In passing, improve the user documentation for unaccent, which was sadly
vague about some important details.

Mohammad Alhashash, reviewed by Abhijit Menon-Sen
2014-06-30 20:51:30 -04:00
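
The three rule-file changes above (a warning for malformed lines, multi-character source strings, and empty replacements) can be sketched together. This is a simplified Python model under stated assumptions, not the C parser: it assumes each rule line is a source string optionally followed by a whitespace-separated replacement, that a line with more than two fields is invalid, and that the first of two duplicate source strings wins. `parse_rules` is a hypothetical name.

```python
import warnings


def parse_rules(lines):
    """Parse 'source [target]' lines into a dict of replacement rules."""
    rules = {}
    for lineno, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # blank line, nothing to parse
        if len(fields) > 2:
            # Complain rather than silently ignoring the line.
            warnings.warn(f"invalid syntax on line {lineno}: {line!r}")
            continue
        source = fields[0]  # may be longer than one character
        target = fields[1] if len(fields) == 2 else ""  # empty replacement
        if source in rules:
            warnings.warn(f"duplicate source string on line {lineno}: {source!r}")
            continue  # keep the first rule
        rules[source] = target
    return rules


if __name__ == "__main__":
    print(parse_rules(["é e", "\u0301", "ss x y"]))
```

A line holding only a combining mark maps it to the empty string, so the mark is deleted from the output rather than replaced.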
Peter Eisentraut e7128e8dbb Create function prototype as part of PG_FUNCTION_INFO_V1 macro
Because of gcc -Wmissing-prototypes, all functions in dynamically
loadable modules must have a separate prototype declaration.  This is
meant to detect global functions that are not declared in header files,
but in cases where the function is called via dfmgr, this is redundant.
Besides filling up space with boilerplate, this is a frequent source of
compiler warnings in extension modules.

We can fix that by creating the function prototype as part of the
PG_FUNCTION_INFO_V1 macro, which such modules have to use anyway.  That
makes the code of modules cleaner, because there is one less place where
the entry points have to be listed, and creates an additional check that
functions have the right prototype.

Remove now redundant prototypes from contrib and other modules.
2014-04-18 00:03:19 -04:00
Bruce Momjian 7e04792a1c Update copyright for 2014
Update all files in head, and files COPYRIGHT and legal.sgml in all back
branches.
2014-01-07 16:05:30 -05:00
Bruce Momjian 0dbf9a6a91 unaccent: Revert patch 9299f61798
The reverted patch to change functions from strict to immutable was
incorrect and needs additional research.
2013-11-18 15:54:34 -05:00
Bruce Momjian 9299f61798 unaccent: mark unaccent() functions as immutable
Suggestion from Pavel Stehule
2013-10-08 12:20:36 -04:00
Bruce Momjian 9af4159fce pgindent run for release 9.3
This is the first run of the Perl-based pgindent script.  Also update
pgindent instructions.
2013-05-29 16:58:43 -04:00
Heikki Linnakangas 4b06c1820a The data structure used in unaccent is a trie, not suffix tree.
Fix the term used in variable and struct names, and comments.

Alexander Korotkov
2013-05-08 20:58:50 +03:00
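
For readers unfamiliar with the term, a trie keyed on the bytes of each source string can be sketched as below. This is an illustrative Python model, not the C structure in unaccent.c; the class and function names are hypothetical, and keying on UTF-8 bytes is an assumption made for the sketch.

```python
class TrieNode:
    def __init__(self):
        self.children = {}       # byte value -> TrieNode
        self.replacement = None  # None means no rule ends at this node


def build_trie(rules):
    """Insert every source string, byte by byte, from the root."""
    root = TrieNode()
    for source, target in rules.items():
        node = root
        for byte in source.encode("utf-8"):
            node = node.children.setdefault(byte, TrieNode())
        node.replacement = target
    return root


def longest_match(root, data, start):
    """Return (length, replacement) of the longest rule matching data at
    start, or (0, None) if no rule matches."""
    node = root
    best = (0, None)
    for i in range(start, len(data)):
        node = node.children.get(data[i])
        if node is None:
            break
        if node.replacement is not None:
            best = (i - start + 1, node.replacement)
    return best


if __name__ == "__main__":
    root = build_trie({"ß": "ss", "a": "1", "ab": "2"})
    print(longest_match(root, "ß".encode("utf-8"), 0))
    print(longest_match(root, b"abc", 0))
```

Lookup walks from the root along the input, so shared prefixes are stored once. A suffix tree, by contrast, indexes all suffixes of a single text, which is not what is needed here.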
Bruce Momjian bd61a623ac Update copyrights for 2013
Fully update git head, and update back branches in ./COPYRIGHT and
legal.sgml files.
2013-01-01 17:15:01 -05:00
Peter Eisentraut 48658a1b81 Fix some typos
Josh Kupershmidt
2012-04-22 19:23:47 +03:00
Bruce Momjian e126958c2e Update copyright notices for year 2012.
2012-01-01 18:01:58 -05:00
Tom Lane ced3a93ccb Fix assorted bugs in contrib/unaccent's configuration file parsing.
Make it use t_isspace() to identify whitespace, rather than relying on
sscanf which is known to get it wrong on some platform/locale combinations.
Get rid of fixed-size buffers.  Make it actually continue to parse the file
after ignoring a line with untranslatable characters, as was obviously
intended.

The first of these issues is per gripe from J Smith, though not exactly
either of his proposed patches.
2011-11-07 11:50:18 -05:00
Tom Lane 458857cc9d Throw a useful error message if an extension script file is fed to psql.
We have seen one too many reports of people trying to use 9.1 extension
files in the old-fashioned way of sourcing them in psql.  Not only does
that usually not work (due to failure to substitute for MODULE_PATHNAME
and/or @extschema@), but if it did work they'd get a collection of loose
objects not an extension.  To prevent this, insert an \echo ... \quit
line that prints a suitable error message into each extension script file,
and teach commands/extension.c to ignore lines starting with \echo.
That should not only prevent any adverse consequences of loading a script
file the wrong way, but make it crystal clear to users that they need to
do it differently now.

Tom Lane, following an idea of Andrew Dunstan's.  Back-patch into 9.1
... there is not going to be much value in this if we wait till 9.2.
2011-10-12 15:45:03 -04:00
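
The guard-line mechanism can be modeled in a few lines. This is a hypothetical Python sketch, not the code in commands/extension.c: `strip_echo_lines` is an invented name, and the sample script text is illustrative only. It shows the loader-side half of the idea, dropping lines that start with \echo. When psql sources the same file instead, it executes that first line, prints the message, and quits.

```python
def strip_echo_lines(script: str) -> str:
    # Drop every line that starts with a backslash-echo meta-command,
    # modeling a loader that ignores such lines.
    kept = [ln for ln in script.splitlines() if not ln.startswith("\\echo")]
    return "\n".join(kept)


if __name__ == "__main__":
    # Illustrative script: a guard line followed by a placeholder statement.
    sample = (
        '\\echo Use "CREATE EXTENSION" to load this file. \\quit\n'
        "SELECT 1;"
    )
    print(strip_echo_lines(sample))
```

Because the \quit sits on the same line as the \echo, the guard is a single line and the loader only needs one simple prefix test.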
Bruce Momjian 6416a82a62 Remove unnecessary #include references, per pgrminclude script.
2011-09-01 10:04:27 -04:00
Peter Eisentraut f8ebe3bcc5 Support "make check" in contrib
Added a new option --extra-install to pg_regress to arrange installing
the respective contrib directory into the temporary installation.
This is currently not yet supported for Windows MSVC builds.

Updated the .gitignore files for contrib modules to ignore the
leftovers of a temp-install check run.

Changed the exit status of "make check" in a pgxs build (which still
does nothing) to 0 from 1.

Added "make check" in contrib to top-level "make check-world".
2011-04-25 22:27:11 +03:00
Peter Eisentraut 385942f46c Refix the unaccent regression test on MSVC properly
... for some value of "properly".  Instead of overriding REGRESS_OPTS,
set the variables ENCODING and NO_LOCALE, which is more expressive and
allows overriding by the user.  Fix vcregress.pl to handle that.
2011-04-19 22:52:52 +03:00
Andrew Dunstan b7b86924c6 Attempt to remedy buildfarm breakage caused by commit f536d4194.
2011-04-18 09:27:30 -04:00
Peter Eisentraut f536d41942 Rename pg_regress option --multibyte to --encoding
Also refactor things a little bit so that the same methods for setting
test locale and encoding can be used everywhere.
2011-04-15 08:42:05 +03:00
Tom Lane 0024e34898 Fix upgrade of contrib/intarray and contrib/unaccent from 9.0.
Take care of a couple of discrepancies between what you get from a fresh
install and what the first-draft update-from-unpackaged scripts produced.
2011-02-17 17:45:09 -05:00
Tom Lane 029fac2264 Avoid use of CREATE OR REPLACE FUNCTION in extension installation files.
It was never terribly consistent to use OR REPLACE (because of the lack of
comparable functionality for data types, operators, etc), and
experimentation shows that it's now positively pernicious in the extension
world.  We really want a failure to occur if there are any conflicts, else
it's unclear what the extension-ownership state of the conflicted object
ought to be.  Most of the time, CREATE EXTENSION will fail anyway because
of conflicts on other object types, but an extension defining only
functions can succeed, with bad results.
2011-02-13 22:54:52 -05:00
Tom Lane 629b3af27d Convert contrib modules to use the extension facility.
This isn't fully tested as yet, in particular I'm not sure that the
"foo--unpackaged--1.0.sql" scripts are OK.  But it's time to get some
buildfarm cycles on it.

sepgsql is not converted to an extension, mainly because it seems to
require a very nonstandard installation process.

Dimitri Fontaine and Tom Lane
2011-02-13 22:54:49 -05:00
Bruce Momjian 5d950e3b0c Stamp copyrights for year 2011.
2011-01-01 13:18:15 -05:00
Bruce Momjian c0577c92a8 Mark unaccent functions as STABLE, rather than defaulting to VOLATILE.
2010-12-27 15:34:42 -05:00
Peter Eisentraut fc946c39ae Remove useless whitespace at end of lines
2010-11-23 22:34:55 +02:00
Tom Lane cc2c8152e6 Some more gitignore cleanups: cover contrib and PL regression test outputs.
Also do some further work in the back branches, where quite a bit wasn't
covered by Magnus' original back-patch.
2010-09-22 17:22:40 -04:00
Magnus Hagander fe9b36fd59 Convert cvsignore to gitignore, and add .gitignore for build targets.
2010-09-22 12:57:04 +02:00