As of commit d5563d7df, psql -c no longer implies -X, but not all of
our regression testing scripts had gotten that memo.
To ensure consistency of results across different developers, make
sure that *all* invocations of psql in all scripts in our tree
use -X, even where this is not what previously happened.
Michael Paquier and Tom Lane
commit 9043Fe390f4f0b4586cfe59cbd22314b9c3e2957 broke multibyte
regression tests because the commit removes the warning message when
temporary hash indexes is created, which has been added by commit
07af523870.
Back patched to 9.5 stable tree.
The expected-output files for these tests were broken by the recent
addition of a warning for hash indexes. Update them.
Also add a test case for GB18030 encoding, similar to the other ones.
This is a pretty weak test, but it's better than nothing.
1. Use new dropdb --if-exists option, to avoid alarming the user if
the database being dropped doesn't already exist.
2. Bail out if createdb fails.
3. exit 1 if the checks fail.
4. Make it executable.
Josh Kupershmidt, with some kibitzing by me.
must be used for the new database, except when copying from template0.
This is the same rule that we now enforce for locale settings, and it has
the same motivation: databases other than template0 might contain data that
would be invalid according to a different setting. This represents another
step in a continuing process of locking down ways in which encoding violations
could occur inside the backend. Per discussion of a few days ago.
In passing, fix pre-existing breakage of mbregress.sh, and fix up a couple
of ereport() calls in dbcommands.c that failed to specify sqlstate codes.
characters in all cases. Formerly we mostly just threw warnings for invalid
input, and failed to detect it at all if no encoding conversion was required.
The tighter check is needed to defend against SQL-injection attacks as per
CVE-2006-2313 (further details will be published after release). Embedded
zero (null) bytes will be rejected as well. The checks are applied during
input to the backend (receipt from client or COPY IN), so it no longer seems
necessary to check in textin() and related routines; any string arriving at
those functions will already have been validated. Conversion failure
reporting (for characters with no equivalent in the destination encoding)
has been cleaned up and made consistent while at it.
Also, fix a few longstanding errors in little-used encoding conversion
routines: win1251_to_iso, win866_to_iso, euc_tw_to_big5, euc_tw_to_mic,
mic_to_euc_tw were all broken to varying extents.
Patches by Tatsuo Ishii and Tom Lane. Thanks to Akio Ishida and Yasuo Ohgaki
for identifying the security issues.
> - corrects a bit the UTF-8 code from Tatsuo to allow Unicode 3.1
> characters (characters with values >= 0x10000, which are encoded on
> four bytes).
Also, update mb/expected/unicode.out. This is necessary since the
patches affetc the result of queries using UTF-8.
---------------------------------------------------------------
Hi,
I should have sent the patch earlier, but got delayed by other stuff.
Anyway, here is the patch:
- most of the functionality is only activated when MULTIBYTE is
defined,
- check valid UTF-8 characters, client-side only yet, and only on
output, you still can send invalid UTF-8 to the server (so, it's
only partly compliant to Unicode 3.1, but that's better than
nothing).
- formats with the correct number of columns (that's why I made it in
the first place after all), but only for UNICODE. However, the code
allows to plug-in routines for other encodings, as Tatsuo did for
the other multibyte functions.
- corrects a bit the UTF-8 code from Tatsuo to allow Unicode 3.1
characters (characters with values >= 0x10000, which are encoded on
four bytes).
- doesn't depend on the locale capabilities of the glibc (useful for
remote telnet).
I would like somebody to check it closely, as it is my first patch to
pgsql. Also, I created dummy .orig files, so that the two files I
created are included, I hope that's the right way.
Now, a lot of functionality is NOT included here, but I will keep that
for 7.3 :) That includes all string checking on the server side (which
will have to be a bit more optimised ;) ), and the input checking on
the client side for UTF-8, though that should not be difficult. It's
just to send the strings through mbvalidate() before sending them to
the server. Strong checking on UTF-8 strings is mandatory to be
compliant with Unicode 3.1+ .
Do I have time to look for a patch to include iso-8859-15 for 7.2 ?
The euro is coming 1. january 2002 (before 7.3 !) and over 280
millions people in Europe will need the euro sign and only iso-8859-15
and iso-8859-16 have it (and unfortunately, I don't think all Unices
will switch to Unicode in the meantime)....
err... yes, I know that this is not every single person in Europe that
uses PostgreSql, so it's not exactly 280m, but it's just a matter of
time ! ;)
I'll come back (on pgsql-hackers) later to ask a few questions
regarding the full unicode support (normalisation, collation,
regexes,...) on the server side :)
Here is the patch !
Patrice.
--
Patrice HÉDÉ ------------------------------- patrice à islande org -----
-- Isn't it weird how scientists can imagine all the matter of the
universe exploding out of a dot smaller than the head of a pin, but they
can't come up with a more evocative name for it than "The Big Bang" ?
-- What would _you_ call the creation of the universe ?
-- "The HORRENDOUS SPACE KABLOOIE !" - Calvin and Hobbes
------------------------------------------ http://www.islande.org/ -----
just use the portable form,
tr ABCDEFGHIJKLMNOPQRSTUVWXYZ abcdefghijklmnopqrstuvwxyz
There were a bunch of places that weren't paying attention to configure's
result anyway (including configure itself!?); clean them up too.
o allow to use Big5 (a Chinese encoding used in Taiwan) as a client
encoding. In this case the server side encoding should be EUC_TW
o add EUC_TW and Big5 test cases to the regression and the mb test
(contributed by Jonah Kuo)
o fix mistake in include/mb/pg_wchar.h. An encoding id for EUC_TW was
not correct (was 3 and now is 4)
o update documents (doc/README.mb and README.mb.jp)
o update psql helpfile (bin/psql/psqlHelp.h)
--
Tatsuo Ishii
t-ishii@sra.co.jp
As Bruce mentioned, this is due to the conflict among changes we made.
Included patches should fix the problem(I changed all MB to
MULTIBYTE). Please let me know if you have further problem.
P.S. I did not include pathces to configure and gram.c to save the
file size(configure.in and gram.y modified).
From: t-ishii@sra.co.jp
Attached are patches to enhance the multi-byte support. (patches are
against 7/18 snapshot)
* determine encoding at initdb/createdb rather than compile time
Now initdb/createdb has an option to specify the encoding. Also, I
modified the syntax of CREATE DATABASE to accept encoding option. See
README.mb for more details.
For this purpose I have added new column "encoding" to pg_database.
Also pg_attribute and pg_class are changed to catch up the
modification to pg_database. Actually I haved added pg_database_mb.h,
pg_attribute_mb.h and pg_class_mb.h. These are used only when MB is
enabled. The reason having separate files is I couldn't find a way to
use ifdef or whatever in those files. I have to admit it looks
ugly. No way.
* support for PGCLIENTENCODING when issuing COPY command
commands/copy.c modified.
* support for SQL92 syntax "SET NAMES"
See gram.y.
* support for LATIN2-5
* add UNICODE regression test case
* new test suite for MB
New directory test/mb added.
* clean up source files
Basic idea is to have MB's own subdirectory for easier maintenance.
These are include/mb and backend/utils/mb.