postgresql

Commit Graph

Author	SHA1	Message	Date
Tom Lane	febd60bf5d	Fix pg_wchar_table[] to match revised ordering of the encoding ID enum. Add some comments so hopefully the next poor sod doesn't fall into the same trap. (Wrong comments are worse than none at all...)	2007-10-15 22:46:27 +00:00
Tom Lane	8468146b03	Fix the inadvertent libpq ABI breakage discovered by Martin Pitt: the renumbering of encoding IDs done between 8.2 and 8.3 turns out to break 8.2 initdb and psql if they are run with an 8.3beta1 libpq.so. For the moment we can rearrange the order of enum pg_enc to keep the same number for everything except PG_JOHAB, which isn't a problem since there are no direct references to it in the 8.2 programs anyway. (This does force initdb unfortunately.) Going forward, we want to fix things so that encoding IDs can be changed without an ABI break, and this commit includes the changes needed to allow libpq's encoding IDs to be treated as fully independent of the backend's. The main issue is that libpq clients should not include pg_wchar.h or otherwise assume they know the specific values of libpq's encoding IDs, since they might encounter version skew between pg_wchar.h and the libpq.so they are using. To fix, have libpq officially export functions needed for encoding name<=>ID conversion and validity checking; it was doing this anyway unofficially. It's still the case that we can't renumber backend encoding IDs until the next bump in libpq's major version number, since doing so will break the 8.2-era client programs. However the code is now prepared to avoid this type of problem in future. Note that initdb is no longer a libpq client: we just pull in the two source files we need directly. The patch also fixes a few places that were being sloppy about checking for an unrecognized encoding name.	2007-10-13 20:18:42 +00:00
Andrew Dunstan	55613bf9cd	Close previously open holes for invalidly encoded data to enter the database via builtin functions, as recently discussed on -hackers. chr() now returns a character in the database encoding. For UTF8 encoded databases the argument is treated as a Unicode code point. For other multi-byte encodings the argument must designate a strict ASCII character, or an error is raised, as is also the case if the argument is 0. ascii() is adjusted so that it remains the inverse of chr(). The two argument form of convert() is gone, and the three argument form now takes a bytea first argument and returns a bytea. To cover this loss three new functions are introduced: convert_from(bytea, name) returns text - converts the first argument from the named encoding to the database encoding; convert_to(text, name) returns bytea - converts the first argument from the database encoding to the named encoding; length(bytea, name) returns int - gives the length of the first argument in characters in the named encoding.	2007-09-18 17:41:17 +00:00
Tatsuo Ishii	6041b92238	Make JOHAB a client-only encoding, per the pgsql-hackers discussion "Server-side support of all encodings" around 2007/3/26. initdb required.	2007-04-15 10:56:30 +00:00
Tatsuo Ishii	75c6519ff6	Add new encodings EUC_JIS_2004 and SHIFT_JIS_2004, along with new conversions among EUC_JIS_2004, SHIFT_JIS_2004 and UTF-8. The catalog version has been bumped.	2007-03-25 11:56:04 +00:00
Tom Lane	e9da20ab4d	Fix machine-dependent crash in sqlchar_to_unicode(). Get rid of bletcherous and unsafe manipulation of global encoding setting. Clean up libxml reporting mechanism a bit (it still looks like a dangling-pointer crash waiting to happen, though, not to mention being far less than sane from a localization standpoint).	2006-12-24 00:57:48 +00:00
Bruce Momjian	f99a569a2e	pgindent run for 8.2.	2006-10-04 00:30:14 +00:00
Tom Lane	c61a2f5841	Change the backend to reject strings containing invalidly-encoded multibyte characters in all cases. Formerly we mostly just threw warnings for invalid input, and failed to detect it at all if no encoding conversion was required. The tighter check is needed to defend against SQL-injection attacks as per CVE-2006-2313 (further details will be published after release). Embedded zero (null) bytes will be rejected as well. The checks are applied during input to the backend (receipt from client or COPY IN), so it no longer seems necessary to check in textin() and related routines; any string arriving at those functions will already have been validated. Conversion failure reporting (for characters with no equivalent in the destination encoding) has been cleaned up and made consistent while at it. Also, fix a few longstanding errors in little-used encoding conversion routines: win1251_to_iso, win866_to_iso, euc_tw_to_big5, euc_tw_to_mic, mic_to_euc_tw were all broken to varying extents. Patches by Tatsuo Ishii and Tom Lane. Thanks to Akio Ishida and Yasuo Ohgaki for identifying the security issues.	2006-05-21 20:05:21 +00:00
Peter Eisentraut	1b658473ea	Add support for Windows codepages 1253, 1254, 1255, and 1257 and clean up a bunch of the support utilities. In src/backend/utils/mb/Unicode remove nearly duplicate copies of the UCS_to_XXX perl script and replace with one version to handle all generic files. Update the Makefile so that it knows about all the map files. This produces a slight difference in some of the map files, using a uniform naming convention and not mapping the null character. In src/backend/utils/mb/conversion_procs create a master utf8<->win codepage function like the ISO 8859 versions instead of having a separate handler for each conversion. There is an externally visible change in the name of the win1258 to utf8 conversion. According to the documentation notes, it was named incorrectly and this changes it to a standard name. Running the Unicode mapping perl scripts has shown some additional mapping changes in koi8r and iso8859-7.	2006-02-18 16:15:23 +00:00
Bruce Momjian	9b28021cc6	Previous commit message should have been: Add comment marker for PG_ENCODING_BE_LAST.	2005-12-24 18:23:02 +00:00
Bruce Momjian	1aecda002e	Add	2005-12-24 18:21:34 +00:00
Bruce Momjian	e5392a43f8	Alignment cleanup.	2005-12-24 18:11:30 +00:00
Bruce Momjian	1dc3498251	Standard pgindent run for 8.1.	2005-10-15 02:49:52 +00:00
Tom Lane	8889685555	Suppress signed-vs-unsigned-char warnings.	2005-09-24 17:53:28 +00:00
Tom Lane	28d3ee4771	Actually, this macro had worse problems than a bogus name ...	2005-08-05 15:01:48 +00:00
Tom Lane	848c30a501	Fix misspelled macro name. Doesn't appear to be used anywhere yet, so no one noticed.	2005-08-05 14:36:43 +00:00
Bruce Momjian	5955945828	Support 3- and 4-byte Unicode characters. John Hansen	2005-06-15 00:15:08 +00:00
Bruce Momjian	e7fb9f18bf	Add support for Win1252 encoding. Roland Volkmann	2005-03-14 18:31:25 +00:00
Bruce Momjian	e3d7de6b99	Rename canonical encodings, per Peter: UNICODE => UTF8, ALT => WIN866, WIN => WIN1251, TCVN => WIN1258. The old codes continue to work.	2005-03-07 04:30:55 +00:00
Bruce Momjian	e09567d850	Back out addition of Win1252 encoding.	2004-12-04 18:19:33 +00:00
Bruce Momjian	08e0b34bad	Back out fix for Unicode characters above 0x10000	2004-12-03 01:20:33 +00:00
Bruce Momjian	4ea4f8bd06	Fix for Unicode characters above 0x10000. John Hansen	2004-12-02 22:37:14 +00:00
Bruce Momjian	7af770d005	Add Charset WIN1252 support. Roland Volkmann	2004-12-02 22:14:38 +00:00
Peter Eisentraut	152a101f2b	Allow WIN1250 as server encoding.	2004-09-17 21:59:57 +00:00
Bruce Momjian	b6b71b85bc	Pgindent run for 8.0.	2004-08-29 05:07:03 +00:00
Tatsuo Ishii	e8c3205037	Add PQmbdsplen() which returns the "display length" of a character. Some work is still needed: for UTF-8 and MULE_INTERNAL it always returns 1.	2004-03-15 10:41:26 +00:00
PostgreSQL Daemon	55b113257c	make sure the $Id tags are converted to $PostgreSQL as well ...	2003-11-29 22:41:33 +00:00
Bruce Momjian	089003fb46	pgindent run.	2003-08-04 00:43:34 +00:00
Tom Lane	b6a1d25b0a	Error message editing in utils/adt. Again thanks to Joe Conway for doing the bulk of the heavy lifting ...	2003-07-27 04:53:12 +00:00
Bruce Momjian	cb36e74ee6	In src/include/mb/pg_wchar.h we have: #define PG_ENCODING_BE_LAST PG_ISO_8859_8 #define PG_ENCODING_FE_LAST PG_WIN1256 but the last client encoding in the enum list is actually PG_GB18030 and it seems that #define PG_ENCODING_IS_CLIEN_ONLY(_enc) \ (((_enc) > PG_ENCODING_BE_LAST && (_enc) <= PG_ENCODING_FE_LAST) can never be true. I think the define should read #define PG_ENCODING_FE_LAST PG_GB18030 On the other hand, perhaps no-one cares, because PG_ENCODING_IS_CLIEN_ONLY is never used. -- Oliver Elphick Oliver.Elphick@lfix.co.uk	2003-06-02 18:59:25 +00:00
Tatsuo Ishii	e2a618fe25	Fix for GUC client_encoding variable not being handled correctly. See following thread for more details. Subject: [HACKERS] client_encoding directive is ignored in postgresql.conf From: Tatsuo Ishii <t-ishii@sra.co.jp> Date: Wed, 29 Jan 2003 22:24:04 +0900 (JST)	2003-02-19 14:31:26 +00:00
Bruce Momjian	e50f52a074	pgindent run.	2002-09-04 20:31:48 +00:00
Peter Eisentraut	77f7763b55	Remove all traces of multibyte and locale options. Clean up comments referring to "multibyte" where it really means character encoding.	2002-09-03 21:45:44 +00:00
Tatsuo Ishii	969e0246ed	Add Cyrillic and other encodings for encoding conversion. Patches submitted by Kaori Inaba (i-kaori@sra.co.jp).	2002-08-14 02:45:10 +00:00
Tatsuo Ishii	c6b2838685	Fix typo. Remove #ifdef MULTIBYTE	2002-07-29 08:04:55 +00:00
Tatsuo Ishii	eb335a034b	I have committed many support files for CREATE CONVERSION. Default conversion procs and conversions are added in initdb. Currently supported conversions are: UTF-8(UNICODE) <--> SQL_ASCII, ISO-8859-1 to 16, EUC_JP, EUC_KR, EUC_CN, EUC_TW, SJIS, BIG5, GBK, GB18030, UHC, JOHAB, TCVN; EUC_JP <--> SJIS; EUC_TW <--> BIG5; MULE_INTERNAL <--> EUC_JP, SJIS, EUC_TW, BIG5. Note that the initial contents of the pg_conversion system catalog are created in the initdb process, so doing an initdb is ideal. It is possible to add them to your databases by hand, however. To accomplish this: psql -f your_postgresql_install_path/share/conversion_create.sql your_database So I did not bump up the version in catversion.h. TODO: Add more conversion procs; Add [CASCADE|RESTRICT] to DROP CONVERSION; Add tuples to pg_depend; Add regression tests; Write docs; Add SQL99 CONVERT command? -- Tatsuo Ishii	2002-07-18 02:02:30 +00:00
Tatsuo Ishii	14f72b9a4d	Add GB18030 support. Contributed by Bill Huang <bill_huanghb@ybb.ne.jp> (ODBC support has not been committed yet. left for Hiroshi...)	2002-06-13 08:30:22 +00:00
Bruce Momjian	a8bd7e1c6e	> Tatsuo Ishii wrote: > > > > It was made to cope with encoding such as an Asian bloc in 7.2Beta2. > > > > > > > > Added ServerEncoding > > > > Korean (JOHAB), Thai (WIN874), > > > > Vietnamese (TCVN), Arabic (WIN1256) > > > > > > > > Added ClientEncoding > > > > Simplified Chinese (GBK), Korean (UHC) > > > > > > > > > > > > > http://www.sankyo-unyu.co.jp/Pool/postgresql-7.2b2.newencoding.diff.tar.gz > > > > (608K) > > > > > > Looks good. I need some people to review this for me. > > > > For me they look good too. The only missing part is a > > documentation. I will ask him to write it up. If he couldn't, I will > > do it for him. > > > The diff is 3mb > > > but appears to address only additions to multibyte. I have attached a > > > list of files it modifies. Also, look at the sizes of the mb/ > > > directory. It is getting large: > > > > > > 4 ./CVS > > > 6 ./Unicode/CVS > > > 3433 ./Unicode > > > 6197 . > > > > Yes. We definitely need the on-the-fly encoding addition capability: > > i.e. CREATE CHARACTER SET in the future... > > -- > > Tatsuo Ishii > > > > Address change. http://www.sankyo-unyu.co.jp/Pool/postgresql-7.2.newencoding.diff.gz Add PsqlODBC and document ...etc patch. Eiji Tokuya	2002-03-05 05:52:50 +00:00
Bruce Momjian	ea08e6cd55	New pgindent run with fixes suggested by Tom. Patch manually reviewed, initdb/regression tests pass.	2001-11-05 17:46:40 +00:00
Bruce Momjian	6783b2372e	Another pgindent run. Fixes enum indenting, and improves #endif spacing. Also adds space for one-line comments.	2001-10-28 06:26:15 +00:00
Bruce Momjian	b81844b173	pgindent run on all C files. Java run to follow. initdb/regression tests pass.	2001-10-25 05:50:21 +00:00
Tatsuo Ishii	cfe01796e6	Ok, here is the modified encoding table (column 1 is the standard name, 2 is our "official" name, and 3 is alias). If there's no objection, I will change them. ASCII SQL_ASCII; UTF-8 UNICODE UTF_8; MULE-INTERNAL MULE_INTERNAL; ISO-8859-1 LATIN1 ISO_8859_1; ISO-8859-2 LATIN2 ISO_8859_2; ISO-8859-3 LATIN3 ISO_8859_3; ISO-8859-4 LATIN4 ISO_8859_4; ISO-8859-5 ISO_8859_5; ISO-8859-6 ISO_8859_6; ISO-8859-7 ISO_8859_7; ISO-8859-8 ISO_8859_8; ISO-8859-9 LATIN5 ISO_8859_9; ISO-8859-10 LATIN6 ISO_8859_10; ISO-8859-13 LATIN7 ISO_8859_13; ISO-8859-14 LATIN8 ISO_8859_14; ISO-8859-15 LATIN9 ISO_8859_15; ISO-8859-16 LATIN10 ISO_8859_16	2001-10-16 10:09:17 +00:00
Tatsuo Ishii	51053d3216	Add support for ISO-8859-6 to 16	2001-10-11 14:20:35 +00:00
Tatsuo Ishii	be629abfc8	Add pg_database_encoding_max_length() function.	2001-09-23 10:59:45 +00:00
Tom Lane	e3f5bc3492	Fix type_maximum_size() to give the right answer in MULTIBYTE cases. Avoid use of prototype-less function pointers in MB code.	2001-09-21 15:27:38 +00:00
Tatsuo Ishii	e1de3e0833	Implement the following item in TODO: * Reject character sequences that are not valid in their charset	2001-09-11 04:50:36 +00:00
Tatsuo Ishii	227767112c	Commit Karel's patch. ------------------------------------------------------------------- Subject: Re: [PATCHES] encoding names From: Karel Zak <zakkr@zf.jcu.cz> To: Peter Eisentraut <peter_e@gmx.net> Cc: pgsql-patches <pgsql-patches@postgresql.org> Date: Fri, 31 Aug 2001 17:24:38 +0200 On Thu, Aug 30, 2001 at 01:30:40AM +0200, Peter Eisentraut wrote: > > - convert encoding 'name' to 'id' > > I thought we decided not to add functions returning "new" names until we > know exactly what the new names should be, and pending schema Ok, the patch not to add functions. > better > > ...(): encoding name too long Fixed. I found new bug in command/variable.c in parse_client_encoding(), nobody probably never see this error: if (pg_set_client_encoding(encoding)) { elog(ERROR, "Conversion between %s and %s is not supported", value, GetDatabaseEncodingName()); } because pg_set_client_encoding() returns -1 for error and 0 as true. It's fixed too. IMHO it can be apply. Karel PS: * following files are renamed: src/utils/mb/Unicode/KOI8_to_utf8.map --> src/utils/mb/Unicode/koi8r_to_utf8.map src/utils/mb/Unicode/WIN_to_utf8.map --> src/utils/mb/Unicode/win1251_to_utf8.map src/utils/mb/Unicode/utf8_to_KOI8.map --> src/utils/mb/Unicode/utf8_to_koi8r.map src/utils/mb/Unicode/utf8_to_WIN.map --> src/utils/mb/Unicode/utf8_to_win1251.map * new file: src/utils/mb/encname.c * removed file: src/utils/mb/common.c -- Karel Zak <zakkr@zf.jcu.cz> http://home.zf.jcu.cz/~zakkr/ C, PostgreSQL, PHP, WWW, http://docs.linux.cz, http://mape.jcu.cz	2001-09-06 04:57:30 +00:00
Tatsuo Ishii	ab9b6c45cf	Add convert/convert2 functions. They are similar to SQL99's convert.	2001-08-15 07:07:40 +00:00
Tatsuo Ishii	1032445e5d	TODO item: * Make n of CHAR(n)/VARCHAR(n) the number of letters, not bytes	2001-07-15 11:07:37 +00:00
Bruce Momjian	0cec2bb0cd	BTW, it does not add an encoding; it just patches the existing one (KOI8) to support two: KOI8-R and KOI8-U (the latter is a superset of the former, if pseudographics are not taken into account). Andy Rysin	2001-05-03 21:38:45 +00:00

75 Commits