postgresql

Commit Graph

Author	SHA1	Message	Date
Bruce Momjian	9af4159fce	pgindent run for release 9.3 This is the first run of the Perl-based pgindent script. Also update pgindent instructions.	2013-05-29 16:58:43 -04:00
Andrew Dunstan	3717f0837b	Tidy up from frontend Assert change. Quiet compiler warnings noted by Peter Eisentraut.	2012-12-16 12:22:57 -05:00
Tom Lane	60e9c224a1	Fix ASCII case in pg_wchar2mule_with_len. Also some cosmetic improvements for wchar-to-mblen patch.	2012-07-10 15:59:39 -04:00
Robert Haas	f6a05fd973	Fix failure of new wchar->mb functions to advance from pointer. Bug spotted by Tom Lane.	2012-07-05 23:47:53 -04:00
Robert Haas	72dd6291f2	Add wchar -> mb conversion routines. This is infrastructure for Alexander Korotkov's work on indexing regular expression searches. Alexander Korotkov, with a bit of further hackery on the MULE conversion by me	2012-07-04 17:10:10 -04:00
Tom Lane	09022de1f5	Improve documentation about MULE encoding. This commit improves the comments in pg_wchar.h and creates #define symbols for some formerly hard-coded values. No substantive code changes. Tatsuo Ishii and Tom Lane	2012-07-04 00:29:57 -04:00
Bruce Momjian	927d61eeff	Run pgindent on 9.2 source tree in preparation for first 9.3 commit-fest.	2012-06-10 15:20:04 -04:00
Robert Haas	5d4b60f2f2	Lots of doc corrections. Josh Kupershmidt	2012-04-23 22:43:09 -04:00
Tom Lane	eb5834d5af	Further improvement of make_greater_string. Make sure that it considers all the possibilities that the old code did, instead of trying only one possibility per character position. To keep the runtime in bounds, instead tweak the character incrementers to not try every possible multibyte character code. Remove unnecessary logic to restore the old character value on failure. Additional comment and formatting cleanup.	2011-10-30 12:22:11 -04:00
Robert Haas	78d523b633	Improve make_greater_string() with encoding-specific incrementers. This infrastructure doesn't in any way guarantee that the character we produce will sort before the one we incremented; but it does at least make it much more likely that we'll end up with something that is a valid character, which improves our chances. Kyotaro Horiguchi, with various adjustments by me.	2011-10-29 14:22:20 -04:00
Peter Eisentraut	a2a5ce6826	Improve "invalid byte sequence for encoding" message. It used to say ERROR: invalid byte sequence for encoding "UTF8": 0xdb24; change this to ERROR: invalid byte sequence for encoding "UTF8": 0xdb 0x24, to make it clear that this is a byte sequence and not a code point. Also fix the adjacent "character has no equivalent" message, which has the same issue.	2011-09-05 23:38:27 +03:00
Magnus Hagander	9f2e211386	Remove cvs keywords from all files.	2010-09-20 22:08:53 +02:00
Tom Lane	2d8314bd43	Rename utf2ucs() to utf8_to_unicode(), and export it so it can be used elsewhere. Similarly rename the version in mbprint.c, not because this affects anything but just to keep the two copies in exact sync. There was some discussion of having only one copy in src/port/ instead, but this function is so small and unlikely to change that that seems like overkill. Slightly editorialized version of a patch by Joseph Adams. (The bug-fix aspect of his patch was applied separately, and back-patched.)	2010-08-18 19:54:01 +00:00
Andrew Dunstan	fc09fb7bcf	Remove sometimes inaccurate error hint about source of wrongly encoded data.	2010-01-04 20:38:31 +00:00
Bruce Momjian	d747140279	8.4 pgindent run, with new combined Linux/FreeBSD/MinGW typedef list provided by Andrew.	2009-06-11 14:49:15 +00:00
Tom Lane	fd9e2accef	When we are in error recursion trouble, arrange to suppress translation and encoding conversion of any elog/ereport message being sent to the frontend. This generalizes a patch that I put in last October, which suppressed translation of only specific messages known to be associated with recursive can't-translate-the-message behavior. As shown in bug #4680, we need a more general answer in order to have some hope of coping with broken encoding conversion setups. This approach seems a good deal less klugy anyway. Patch in all supported branches.	2009-03-02 21:18:43 +00:00
Peter Eisentraut	8b9dd6b5fd	Support for KOI8U encoding	2009-02-10 19:29:39 +00:00
Peter Eisentraut	1cb54c2860	Remove the encoding numbers from the comments. They are useless, and make maintenance harder.	2009-02-10 16:44:44 +00:00
Tom Lane	0d65eea3da	Replace argument-checking Asserts with regular test-and-elog checks in all encoding conversion functions. These are not can't-happen cases because it's possible to create a conversion with the wrong conversion function for the specified encoding pair. That would lead to an Assert crash in an Assert-enabled build, or incorrect conversion otherwise, neither of which is desirable. This would be a DoS issue if production databases were customarily built with asserts enabled, but fortunately that's not so. Per an observation by Heikki. Back-patch to all supported branches.	2009-01-29 19:23:42 +00:00
Peter Eisentraut	06735e3256	Unicode escapes in strings and identifiers	2008-10-29 08:04:54 +00:00
Tom Lane	b0169bb124	Install a more robust solution for the problem of infinite error-processing recursion when we are unable to convert a localized error message to the client's encoding. We've been over this ground before, but as reported by Ibrar Ahmed, it still didn't work in the case of conversion failures for the conversion-failure message itself :-(. Fix by installing a "circuit breaker" that disables attempts to localize this message once we get into recursion trouble. Patch all supported branches, because it is in fact broken in all of them; though I had to add some missing translations to the older branches in order to expose the failure in the particular test case I was using.	2008-10-27 19:37:22 +00:00
Bruce Momjian	fdf5a5efb7	pgindent run for 8.3.	2007-11-15 21:14:46 +00:00
Tom Lane	febd60bf5d	Fix pg_wchar_table[] to match revised ordering of the encoding ID enum. Add some comments so hopefully the next poor sod doesn't fall into the same trap. (Wrong comments are worse than none at all...)	2007-10-15 22:46:27 +00:00
Andrew Dunstan	55613bf9cd	Close previously open holes for invalidly encoded data to enter the database via builtin functions, as recently discussed on -hackers. chr() now returns a character in the database encoding. For UTF8 encoded databases the argument is treated as a Unicode code point. For other multi-byte encodings the argument must designate a strict ASCII character, or an error is raised, as is also the case if the argument is 0. ascii() is adjusted so that it remains the inverse of chr(). The two-argument form of convert() is gone, and the three-argument form now takes a bytea first argument and returns a bytea. To cover this loss, three new functions are introduced: (1) convert_from(bytea, name) returns text, converting the first argument from the named encoding to the database encoding; (2) convert_to(text, name) returns bytea, converting the first argument from the database encoding to the named encoding; (3) length(bytea, name) returns int, giving the length of the first argument in characters in the named encoding.	2007-09-18 17:41:17 +00:00
Tom Lane	4dbbef2845	Suppress an integer-overflow warning.	2007-07-12 21:17:09 +00:00
Tatsuo Ishii	6041b92238	Make JOHAB a client-only encoding, per discussions in pgsql-hackers "Server-side support of all encodings" around 2007/3/26. initdb required.	2007-04-15 10:56:30 +00:00
Tatsuo Ishii	a6fbd2f12a	Fix pg_wchar_table's maxmblen field of EUC_CN, EUC_TW, MULE_INTERNAL and GB18030. patches from ITAGAKI Takahiro.	2007-03-26 11:15:13 +00:00
Tatsuo Ishii	75c6519ff6	Add new encodings EUC_JIS_2004 and SHIFT_JIS_2004, along with new conversions among EUC_JIS_2004, SHIFT_JIS_2004 and UTF-8. Catalog version has been bumped.	2007-03-25 11:56:04 +00:00
Tom Lane	0887fa1117	Get pg_utf_mblen(), pg_utf2wchar_with_len(), and utf2ucs() all on the same page about the maximum UTF8 sequence length we support (4 bytes since 8.1, 3 before that). pg_utf2wchar_with_len never got updated to support 4-byte characters at all, and in any case had a buffer-overrun risk in that it could produce multiple pg_wchars from what mblen claims to be just one UTF8 character. The only reason we don't have a major security hole is that most callers allocate worst-case output buffers; the sole exception in released versions appears to be pre-8.2 iwchareq() (ie, ILIKE), which can be crashed due to zeroing out its return address --- but AFAICS that can't be exploited for anything more than a crash, due to inability to control what gets written there. Per report from James Russell and Michael Fuhr. Pre-8.1 the risk is much less, but I still think pg_utf2wchar_with_len's behavior given an incomplete final character risks buffer overrun, so back-patch that logic change anyway. This patch also makes sure that UTF8 sequences exceeding the supported length (whichever it is) are consistently treated as error cases, rather than being treated like a valid shorter sequence in some places.	2007-01-24 17:12:17 +00:00
Bruce Momjian	f99a569a2e	pgindent run for 8.2.	2006-10-04 00:30:14 +00:00
Bruce Momjian	a3132359fd	In new "invalid byte sequence" error hint, call it "error", not "failure".	2006-08-22 12:11:28 +00:00
Bruce Momjian	e11cab650c	Add hint for "invalid byte sequence for encoding" error message, suggesting review of client_encoding.	2006-08-22 03:30:20 +00:00
Tom Lane	c61a2f5841	Change the backend to reject strings containing invalidly-encoded multibyte characters in all cases. Formerly we mostly just threw warnings for invalid input, and failed to detect it at all if no encoding conversion was required. The tighter check is needed to defend against SQL-injection attacks as per CVE-2006-2313 (further details will be published after release). Embedded zero (null) bytes will be rejected as well. The checks are applied during input to the backend (receipt from client or COPY IN), so it no longer seems necessary to check in textin() and related routines; any string arriving at those functions will already have been validated. Conversion failure reporting (for characters with no equivalent in the destination encoding) has been cleaned up and made consistent while at it. Also, fix a few longstanding errors in little-used encoding conversion routines: win1251_to_iso, win866_to_iso, euc_tw_to_big5, euc_tw_to_mic, mic_to_euc_tw were all broken to varying extents. Patches by Tatsuo Ishii and Tom Lane. Thanks to Akio Ishida and Yasuo Ohgaki for identifying the security issues.	2006-05-21 20:05:21 +00:00
Peter Eisentraut	1b658473ea	Add support for Windows codepages 1253, 1254, 1255, and 1257 and clean up a bunch of the support utilities. In src/backend/utils/mb/Unicode remove nearly duplicate copies of the UCS_to_XXX perl script and replace with one version to handle all generic files. Update the Makefile so that it knows about all the map files. This produces a slight difference in some of the map files, using a uniform naming convention and not mapping the null character. In src/backend/utils/mb/conversion_procs create a master utf8<->win codepage function like the ISO 8859 versions instead of having a separate handler for each conversion. There is an externally visible change in the name of the win1258 to utf8 conversion. According to the documentation notes, it was named incorrectly and this changes it to a standard name. Running the Unicode mapping perl scripts has shown some additional mapping changes in koi8r and iso8859-7.	2006-02-18 16:15:23 +00:00
Bruce Momjian	c01999a557	Allow psql multi-line column values to align in the proper columns. If the second output column value is 'a\nb', the 'b' should appear in the second display column, rather than the first column as it does now. Change libpq's PQdsplen() to return more useful values. Note: this changes the PQdsplen function; it can now return zero or minus one, which was not possible before. It doesn't appear anyone is actually using the functions other than psql, but it is a change. The functions are not actually documented anywhere, so it's not like we're breaking a defined interface. The new semantics follow the Unicode standard. BACKWARD COMPATIBLE CHANGE. The only user-visible change I saw in the regression tests is that a SELECT * on a table where all the columns have been dropped doesn't return a blank line like before. This seems like a step forward. Martijn van Oosterhout	2006-02-10 00:39:04 +00:00
Bruce Momjian	a2384d008a	More uses of IS_HIGHBIT_SET() macro.	2005-12-26 19:30:45 +00:00
Bruce Momjian	261114a23f	I have added these macros to c.h: #define HIGHBIT (0x80) and #define IS_HIGHBIT_SET(ch) ((unsigned char)(ch) & HIGHBIT), and removed CSIGNBIT and mapped its uses to HIGHBIT. I have also added uses for IS_HIGHBIT_SET where appropriate. This change is purely for code clarity.	2005-12-25 02:14:19 +00:00
Bruce Momjian	d8a8183456	Formatting cleanups.	2005-12-24 17:19:40 +00:00
Bruce Momjian	0658a6a634	Formatting cleanup.	2005-12-24 16:49:48 +00:00
Tatsuo Ishii	804f6b8fc9	Fix long-standing Asian multibyte charsets bug. See: Subject: [HACKERS] bugs with certain Asian multibyte charsets From: Tatsuo Ishii <ishii@sraoss.co.jp> To: pgsql-hackers@postgresql.org Date: Sat, 24 Dec 2005 18:25:33 +0900 (JST) for more details.	2005-12-24 09:35:36 +00:00
Peter Eisentraut	07bb9f086b	Message corrections	2005-10-29 00:31:52 +00:00
Bruce Momjian	1dc3498251	Standard pgindent run for 8.1.	2005-10-15 02:49:52 +00:00
Tom Lane	8889685555	Suppress signed-vs-unsigned-char warnings.	2005-09-24 17:53:28 +00:00
Bruce Momjian	5955945828	Support 3 and 4-byte unicode characters. John Hansen	2005-06-15 00:15:08 +00:00
Bruce Momjian	e7fb9f18bf	Add support for Win1252 encoding. Roland Volkmann	2005-03-14 18:31:25 +00:00
Bruce Momjian	41e2a80f57	Update comments for new encoding names.	2005-03-14 00:19:13 +00:00
Bruce Momjian	e3d7de6b99	Rename canonical encodings, per Peter: UNICODE => UTF8 ALT => WIN866 WIN => WIN1251 TCVN => WIN1258 The old codes continue to work.	2005-03-07 04:30:55 +00:00
Bruce Momjian	08e0b34bad	Back out fix for Unicode characters above 0x10000	2004-12-03 01:20:33 +00:00
Bruce Momjian	4ea4f8bd06	Fix for Unicode characters above 0x10000. John Hansen	2004-12-02 22:37:14 +00:00
Peter Eisentraut	152a101f2b	Allow WIN1250 as server encoding.	2004-09-17 21:59:57 +00:00


87 Commits
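Several commits above concern UTF-8 sequence length and decoding: 0887fa1117 settles on a 4-byte maximum (since 8.1) and insists that an incomplete final character be treated as an error, 2d8314bd43 exports utf8_to_unicode(), and 261114a23f introduces IS_HIGHBIT_SET. The block below is a minimal standalone sketch of that length rule and decoding step, written for illustration only. The helper names utf8_seq_len and utf8_decode are hypothetical and are not PostgreSQL functions. Unlike a full validator, the sketch does not reject overlong encodings or surrogate code points.

```c
#include <stddef.h>
#include <stdint.h>

/* Macros as quoted in commit 261114a23f (c.h). */
#define HIGHBIT (0x80)
#define IS_HIGHBIT_SET(ch) ((unsigned char)(ch) & HIGHBIT)

/* Hypothetical helper: the sequence length implied by a UTF-8 lead byte,
 * capped at 4 bytes. Returns 0 for continuation bytes and for lead bytes
 * that would start a sequence longer than 4 bytes (treated as errors). */
static int utf8_seq_len(unsigned char lead)
{
    if (!IS_HIGHBIT_SET(lead))
        return 1;                   /* 0xxxxxxx: ASCII */
    if ((lead & 0xE0) == 0xC0)
        return 2;                   /* 110xxxxx */
    if ((lead & 0xF0) == 0xE0)
        return 3;                   /* 1110xxxx */
    if ((lead & 0xF8) == 0xF0)
        return 4;                   /* 11110xxx */
    return 0;                       /* 10xxxxxx or 11111xxx: invalid lead */
}

/* Hypothetical decoder: returns the code point of the character starting
 * at s, or -1 if the lead byte is invalid, a continuation byte is wrong,
 * or fewer than the required bytes are available (incomplete final
 * character), so the caller never reads past s + avail. */
static int32_t utf8_decode(const unsigned char *s, size_t avail)
{
    int len;
    int32_t cp;

    if (avail == 0)
        return -1;
    len = utf8_seq_len(s[0]);
    if (len == 0 || (size_t) len > avail)
        return -1;

    switch (len)
    {
        case 1:
            return s[0];
        case 2:
            cp = s[0] & 0x1F;
            break;
        case 3:
            cp = s[0] & 0x0F;
            break;
        default:
            cp = s[0] & 0x07;
            break;
    }

    /* Each continuation byte must match 10xxxxxx and adds 6 payload bits. */
    for (int i = 1; i < len; i++)
    {
        if ((s[i] & 0xC0) != 0x80)
            return -1;
        cp = (cp << 6) | (s[i] & 0x3F);
    }
    return cp;
}
```

For example, the bytes 0xE2 0x82 0xAC decode to 0x20AC (the euro sign), while passing only the first two of those bytes returns -1 rather than a truncated value, mirroring the "incomplete final character is an error" rule described in 0887fa1117.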