was Tcl 8.4.8. The main changes are to remove the never-fully-implemented
code for multi-character collating elements, and to const-ify some stuff a
bit more fully. In combination with the recent security patch, this commit
brings us into line with Tcl 8.5.0.
Note that I didn't make any effort to duplicate a lot of cosmetic changes
that they made to bring their copy into line with their own style
guidelines, such as adding braces around single-line IF bodies. Most of
those we either had done already (such as ANSI-fication of function headers)
or there is no point because pgindent would undo the change anyway.
are shared with Tcl, since it's their code to begin with, and the patches
have been copied from Tcl 8.5.0. Problems:
CVE-2007-4769: Inadequate check on the range of backref numbers allows
crash due to out-of-bounds read.
CVE-2007-4772: Infinite loop in regex optimizer for pattern '($|^)*'.
CVE-2007-6067: Very slow optimizer cleanup for regex with a large NFA
representation, as well as crash if we encounter an out-of-memory condition
during NFA construction.
Part of the response to CVE-2007-6067 is to put a limit on the number of
states in the NFA representation of a regex. This seems needed even though
the within-the-code problems have been corrected, since otherwise the code
could try to use very large amounts of memory for a suitably-crafted regex,
leading to potential DOS by driving the system into swap, activating a kernel
OOM killer, etc.
Although there are certainly plenty of ways to drive the system into effective
DOS with poorly-written SQL queries, these problems seem worth treating as
security issues because many applications might accept regex search patterns
from untrustworthy sources.
Thanks to Will Drewry of Google for reporting these problems. Patches by Will
Drewry and Tom Lane.
Security: CVE-2007-4769, CVE-2007-4772, CVE-2007-6067
versions of gcc (I'm seeing it with Apple's gcc 4.0.1). I think the
reason we did not see this before was that the assert() macros in the
regex code were all no-ops till recently.
as we do (and upstream Tcl doesn't). The loop limit might be subject
to negotiation if anyone ever tries to do regex debugging in Far
Eastern languages, but for now 1000 seems plenty. CHR_MAX was right out :-(
Standard English uses "may", "can", and "might" in different ways:
may - permission, "You may borrow my rake."
can - ability, "I can lift that log."
might - possibility, "It might rain today."
Unfortunately, in conversational English, their use is often mixed, as
in, "You may use this variable to do X", when in fact, "can" is a better
choice. Similarly, "It may crash" is better stated, "It might crash".
comment line where output as too long, and update typedefs for /lib
directory. Also fix case where identifiers were used as variable names
in the backend, but as typedefs in ecpg (favor the backend for
indenting).
Backpatch to 8.1.X.
The specification of this function is as follows.
regexp_replace(source text, pattern text, replacement text, [flags
text])
returns text
Replace string that matches to regular expression in source text to
replacement text.
- pattern is regular expression pattern.
- replacement is replace string that can use '\1'-'\9', and '\&'.
'\1'-'\9': back reference to the n'th subexpression.
'\&' : entire matched string.
- flags can use the following values:
g: global (replace all)
i: ignore case
When the flags is not specified, case sensitive, replace the first
instance only.
Atsushi Ogawa
conversion of basic ASCII letters. Remove all uses of strcasecmp and
strncasecmp in favor of new functions pg_strcasecmp and pg_strncasecmp;
remove most but not all direct uses of toupper and tolower in favor of
pg_toupper and pg_tolower. These functions use the same notions of
case folding already developed for identifier case conversion. I left
the straight locale-based folding in place for situations where we are
just manipulating user data and not trying to match it to built-in
strings --- for example, the SQL upper() function is still locale
dependent. Perhaps this will prove not to be what's wanted, but at
the moment we can initdb and pass regression tests in Turkish locale.
(extracted from Tcl 8.4.1 release, as Henry still hasn't got round to
making it a separate library). This solves a performance problem for
multibyte, as well as upgrading our regexp support to match recent Tcl
and nearly match recent Perl.
postgres.h or c.h includes a system header (such as stdio.h or
stdlib.h), there's no need to specifically include it in any of the .c
files in the backend.
Neil Conway
Implement SQL99 SIMILAR TO as a synonym for our existing operator "~".
Implement SQL99 regular expression SUBSTRING(string FROM pat FOR escape).
Extend the definition to make the FOR clause optional.
Define textregexsubstr() to actually implement this feature.
Update the regression test to include these new string features.
All tests pass.
Rename the regular expression support routines from "pg95_xxx" to "pg_xxx".
Define CREATE CHARACTER SET in the parser per SQL99. No implementation yet.
Attached is a pacth against 7.2 which adds locale awareness to the
character classes of the regular expression engine.
...
> > I still think the xdigit class could be handled the same way the digit
> > class is (by enumeration rather than using the isxdigit function). That
> > saves you a cicle, and I don't think there's any loss.
>
> In fact, I will email you when I apply the original patch.
I miss that case :-(. Here is the pached patch.
...
Here is a patch which addresses Tatsuo's concerns (it does return an
static struct instead of constructing it).
definitions from K&R to ANSI C style, and fix broken assumption that
int and long are the same datatype. This repairs problems observed
on Alpha with regexps having between 32 and 63 states.
$(CC) $(CFLAGS) $(LDFLAGS) <object files> <extra-libraries> $(LIBS) -o $@
This form seemed to be the most portable, readable, and logical, but in any
case it's better than having a dozen different ones in the tree.
source directory. This involves mostly makefiles using $(srcdir) when they
might have used ".". (Regression tests don't work with this, yet.)
Sort out usage of CPPFLAGS, CFLAGS (and CXXFLAGS). Add "override" keyword
in most places, to preserve necessary flags even when the user overrode the
flags.
As Bruce mentioned, this is due to the conflict among changes we made.
Included patches should fix the problem(I changed all MB to
MULTIBYTE). Please let me know if you have further problem.
P.S. I did not include pathces to configure and gram.c to save the
file size(configure.in and gram.y modified).
From: t-ishii@sra.co.jp
Attached are patches to enhance the multi-byte support. (patches are
against 7/18 snapshot)
* determine encoding at initdb/createdb rather than compile time
Now initdb/createdb has an option to specify the encoding. Also, I
modified the syntax of CREATE DATABASE to accept encoding option. See
README.mb for more details.
For this purpose I have added new column "encoding" to pg_database.
Also pg_attribute and pg_class are changed to catch up the
modification to pg_database. Actually I haved added pg_database_mb.h,
pg_attribute_mb.h and pg_class_mb.h. These are used only when MB is
enabled. The reason having separate files is I couldn't find a way to
use ifdef or whatever in those files. I have to admit it looks
ugly. No way.
* support for PGCLIENTENCODING when issuing COPY command
commands/copy.c modified.
* support for SQL92 syntax "SET NAMES"
See gram.y.
* support for LATIN2-5
* add UNICODE regression test case
* new test suite for MB
New directory test/mb added.
* clean up source files
Basic idea is to have MB's own subdirectory for easier maintenance.
These are include/mb and backend/utils/mb.
I have implemented a framework of encoding translation between the
backend and the frontend. Also I have added a new variable setting
command:
SET CLIENT_ENCODING TO 'encoding';
Other features include:
Latin1 support more 8 bit cleaness
See doc/README.mb for more details. Note that the pacthes are
against May 30 snapshot.
Tatsuo Ishii
Hi, here are patches I promised (against 6.3.2):
* character_length(), position(), substring() are now aware of
multi-byte characters
* add octet_length()
* add --with-mb option to configure
* new regression tests for EUC_KR
(contributed by "Soonmyung. Hong" <hong@lunaris.hanmesoft.co.kr>)
* add some test cases to the EUC_JP regression test
* fix problem in regress/regress.sh in case of System V
* fix toupper(), tolower() to handle 8bit chars
note that:
o patches for both configure.in and configure are
included. maybe the one for configure is not necessary.
o pg_proc.h was modified to add octet_length(). I used OIDs
(1374-1379) for that. Please let me know if these numbers are not
appropriate.
Attached you'll find a (big) patch that fixes make dep and make
depend in all Makefiles where I found it to be appropriate.
It also removes the dependency in Makefile.global for NAMEDATALEN
and OIDNAMELEN by making backend/catalog/genbki.sh and bin/initdb/initdb.sh
a little smarter.
This no longer requires initdb.sh that is turned into initdb with
a sed script when installing Postgres, hence initdb.sh should be
renamed to initdb (after the patch has been applied :-) )
This patch is against the 6.3 sources, as it took a while to
complete.
Please review and apply,
Cheers,
Jeroen van Vianen
Included are patches intended for allowing PostgreSQL to handle
multi-byte charachter sets such as EUC(Extende Unix Code), Unicode and
Mule internal code. With the MB patch you can use multi-byte character
sets in regexp and LIKE. The encoding system chosen is determined at
the compile time.
To enable the MB extension, you need to define a variable "MB" in
Makefile.global or in Makefile.custom. For further information please
take a look at README.mb under doc directory.
(Note that unlike "jp patch" I do not use modified GNU regexp any
more. I changed Henry Spencer's regexp coming with PostgreSQL.)
Makefile.global.
End result, if all goes well, should allow for much easier porting, since
there will no longer be a concept of a "port". Most, if not everything,
*should* be determined by configure, or by the compiler itself. Still
work to be done though :)