Commit Graph

103 Commits

Author SHA1 Message Date
Bruce Momjian e126958c2e Update copyright notices for year 2012. 2012-01-01 18:01:58 -05:00
Robert Haas 78d523b633 Improve make_greater_string() with encoding-specific incrementers.
This infrastructure doesn't in any way guarantee that the character
we produce will sort before the one we incremented; but it does at least
make it much more likely that we'll end up with something that is a valid
character, which improves our chances.

Kyotaro Horiguchi, with various adjustments by me.
2011-10-29 14:22:20 -04:00
Tom Lane 2ab0796d7a Fix char2wchar/wchar2char to support collations properly.
These functions should take a pg_locale_t, not a collation OID, and should
call mbstowcs_l/wcstombs_l where available.  Where those functions are not
available, temporarily select the correct locale with uselocale().

This change removes the bogus assumption that all locales selectable in
a given database have the same wide-character conversion method; in
particular, the collate.linux.utf8 regression test now passes with
LC_CTYPE=C, so long as the database encoding is UTF8.

I decided to move the char2wchar/wchar2char functions out of mbutils.c and
into pg_locale.c, because they work on wchar_t not pg_wchar_t and thus
don't really belong with the mbutils.c functions.  Keeping them where they
were would have required importing pg_locale_t into pg_wchar.h somehow,
which did not seem like a good plan.
2011-04-23 12:35:41 -04:00
Tom Lane 2594cf0e8c Revise the API for GUC variable assign hooks.
The previous functions of assign hooks are now split between check hooks
and assign hooks, where the former can fail but the latter shouldn't.
Aside from being conceptually clearer, this approach exposes the
"canonicalized" form of the variable value to guc.c without having to do
an actual assignment.  And that lets us fix the problem recently noted by
Bernd Helmle that the auto-tune patch for wal_buffers resulted in bogus
log messages about "parameter "wal_buffers" cannot be changed without
restarting the server".  There may be some speed advantage too, because
this design lets hook functions avoid re-parsing variable values when
restoring a previous state after a rollback (they can store a pre-parsed
representation of the value instead).  This patch also resolves a
longstanding annoyance about custom error messages from variable assign
hooks: they should modify, not appear separately from, guc.c's own message
about "invalid parameter value".
2011-04-07 00:12:02 -04:00
Itagaki Takahiro 3cba8240a1 Add ENCODING option to COPY TO/FROM and file_fdw.
File encodings can be specified separately from client encoding.
If not specified, client encoding is used for backward compatibility.

Cases when the encoding doesn't match client encoding are slower
than matched cases because we don't have conversion procs for other
encodings. Performance improvement would be be a future work.

Original patch by Hitoshi Harada, and modified by me.
2011-02-21 14:32:40 +09:00
Peter Eisentraut 414c5a2ea6 Per-column collation support
This adds collation support for columns and domains, a COLLATE clause
to override it per expression, and B-tree index support.

Peter Eisentraut
reviewed by Pavel Stehule, Itagaki Takahiro, Robert Haas, Noah Misch
2011-02-08 23:04:18 +02:00
Bruce Momjian 5d950e3b0c Stamp copyrights for year 2011. 2011-01-01 13:18:15 -05:00
Magnus Hagander 9f2e211386 Remove cvs keywords from all files. 2010-09-20 22:08:53 +02:00
Tom Lane 2d8314bd43 Rename utf2ucs() to utf8_to_unicode(), and export it so it can be used
elsewhere.

Similarly rename the version in mbprint.c, not because this affects anything
but just to keep the two copies in exact sync.  There was some discussion of
having only one copy in src/port/ instead, but this function is so small
and unlikely to change that that seems like overkill.

Slightly editorialized version of a patch by Joseph Adams.  (The bug-fix
aspect of his patch was applied separately, and back-patched.)
2010-08-18 19:54:01 +00:00
Bruce Momjian 65e806cba1 pgindent run for 9.0 2010-02-26 02:01:40 +00:00
Bruce Momjian 0239800893 Update copyright for the year 2010. 2010-01-02 16:58:17 +00:00
Magnus Hagander 748771379b Write to the Windows eventlog in UTF16, converting the message encoding
as necessary.

Itagaki Takahiro with some changes from me
2009-10-17 00:24:51 +00:00
Bruce Momjian d747140279 8.4 pgindent run, with new combined Linux/FreeBSD/MinGW typedef list
provided by Andrew.
2009-06-11 14:49:15 +00:00
Magnus Hagander 420ea68817 Move gettext encoding names into encnames.c, so we only have one place to update.
Per discussion.
2009-04-24 08:43:51 +00:00
Heikki Linnakangas 1fe5020558 Tell gettext which codeset to use by calling bind_textdomain_codeset(). We
already did that on Windows, but it's needed on other platforms too when
LC_CTYPE=C. With other locales, we enforce (or trust) that the codeset of
the locale matches the server encoding so we don't need to bind it
explicitly. It should do no harm in that case either, but I don't have
full faith in the PG encoding -> OS codeset mapping table yet. Per recent
discussion on pgsql-hackers.
2009-04-08 09:50:48 +00:00
Tom Lane c26ffb1ead Fix SetClientEncoding() to maintain a cache of previously selected encoding
conversion functions.  This allows transaction rollback to revert to a
previous client_encoding setting without doing fresh catalog lookups.
I believe that this explains and fixes the recent report of "failed to commit
client_encoding" failures.

This bug is present in 8.3.x, but it doesn't seem prudent to back-patch
the fix, at least not till it's had some time for field testing in HEAD.

In passing, remove SetDefaultClientEncoding(), which was used nowhere.
2009-04-02 17:30:53 +00:00
Alvaro Herrera e43fd89762 Revert pg_bind_textdomain_codeset to a existant-but-empty function when
ENABLE_NLS is not defined, for better compatibility of the backend with
modules compiled the other way.

Per note from Tom after my previous commit.
2009-03-09 00:01:32 +00:00
Alvaro Herrera 4022f94c24 pg_bind_textdomain_codeset must exist only on ENABLE_NLS. 2009-03-08 18:10:17 +00:00
Alvaro Herrera c3b5d2f138 On Windows, call bind_textdomain_codeset on domains other than the default one,
too, so that the codeset is properly mapped on the newly added PL domains.
2009-03-08 16:07:12 +00:00
Peter Eisentraut 8b9dd6b5fd Support for KOI8U encoding 2009-02-10 19:29:39 +00:00
Tom Lane 0d65eea3da Replace argument-checking Asserts with regular test-and-elog checks in all
encoding conversion functions.  These are not can't-happen cases because
it's possible to create a conversion with the wrong conversion function
for the specified encoding pair.  That would lead to an Assert crash in
an Assert-enabled build, or incorrect conversion otherwise, neither of
which is desirable.  This would be a DOS issue if production databases
were customarily built with asserts enabled, but fortunately that's not so.
Per an observation by Heikki.

Back-patch to all supported branches.
2009-01-29 19:23:42 +00:00
Tom Lane 1efd5ff89b Add a pg_encoding_mbcliplen() function that is just like pg_mbcliplen()
except the caller can specify the encoding to work in; this will be needed
for pg_stat_statements.  In passing, do some marginal efficiency hacking
and clean up some comments.  Also, prevent the single-byte-encoding code
path from fetching one byte past the stated length of the string (this
last is a bug that might need to be back-patched at some point).
2009-01-04 18:37:36 +00:00
Bruce Momjian 511db38ace Update copyright for 2009. 2009-01-01 17:24:05 +00:00
Peter Eisentraut 06735e3256 Unicode escapes in strings and identifiers 2008-10-29 08:04:54 +00:00
Bruce Momjian 9de09c087d Move wchar2char() and char2wchar() from tsearch into /mb to be easier to
use for other modules;  also move pnstrdup().

Clean up code slightly.
2008-06-18 18:42:54 +00:00
Bruce Momjian 9098ab9e32 Update copyrights in source tree to 2008. 2008-01-01 19:46:01 +00:00
Bruce Momjian f6e8730d11 Re-run pgindent with updated list of typedefs. (Updated README should
avoid this problem in the future.)
2007-11-15 22:25:18 +00:00
Bruce Momjian fdf5a5efb7 pgindent run for 8.3. 2007-11-15 21:14:46 +00:00
Tom Lane febd60bf5d Fix pg_wchar_table[] to match revised ordering of the encoding ID enum.
Add some comments so hopefully the next poor sod doesn't fall into the
same trap.  (Wrong comments are worse than none at all...)
2007-10-15 22:46:27 +00:00
Tom Lane 8468146b03 Fix the inadvertent libpq ABI breakage discovered by Martin Pitt: the
renumbering of encoding IDs done between 8.2 and 8.3 turns out to break 8.2
initdb and psql if they are run with an 8.3beta1 libpq.so.  For the moment
we can rearrange the order of enum pg_enc to keep the same number for
everything except PG_JOHAB, which isn't a problem since there are no direct
references to it in the 8.2 programs anyway.  (This does force initdb
unfortunately.)

Going forward, we want to fix things so that encoding IDs can be changed
without an ABI break, and this commit includes the changes needed to allow
libpq's encoding IDs to be treated as fully independent of the backend's.
The main issue is that libpq clients should not include pg_wchar.h or
otherwise assume they know the specific values of libpq's encoding IDs,
since they might encounter version skew between pg_wchar.h and the libpq.so
they are using.  To fix, have libpq officially export functions needed for
encoding name<=>ID conversion and validity checking; it was doing this
anyway unofficially.

It's still the case that we can't renumber backend encoding IDs until the
next bump in libpq's major version number, since doing so will break the
8.2-era client programs.  However the code is now prepared to avoid this
type of problem in future.

Note that initdb is no longer a libpq client: we just pull in the two
source files we need directly.  The patch also fixes a few places that
were being sloppy about checking for an unrecognized encoding name.
2007-10-13 20:18:42 +00:00
Andrew Dunstan 55613bf9cd Close previously open holes for invalidly encoded data to enter the
database via builtin functions, as recently discussed on -hackers.

chr() now returns a character in the database encoding. For UTF8 encoded databases
the argument is treated as a Unicode code point. For other multi-byte encodings
the argument must designate a strict ascii character, or an error is raised,
as is also the case if the argument is 0.

ascii() is adjusted so that it remains the inverse of chr().

The two argument form of convert() is gone, and the three argument form now
takes a bytea first argument and returns a bytea. To cover this loss three new
functions are introduced:
. convert_from(bytea, name) returns text - converts the first argument from the
  named encoding to the database encoding
. convert_to(text, name) returns bytea - converts the first argument from the
  database encoding to the named encoding
. length(bytea, name) returns int - gives the length of the first argument in
  characters in the named encoding
2007-09-18 17:41:17 +00:00
Tatsuo Ishii 6041b92238 Make JOHAB client only encoding per discussions in pgsql-hackers
"Server-side support of all encodings" around 2007/3/26.
initdb required.
2007-04-15 10:56:30 +00:00
Tatsuo Ishii 75c6519ff6 Add new encoding EUC_JIS_2004 and SHIFT_JIS_2004,
along with new conversions among EUC_JIS_2004, SHIFT_JIS_2004 and UTF-8.
catalog version has been bump up.
2007-03-25 11:56:04 +00:00
Tom Lane e9da20ab4d Fix machine-dependent crash in sqlchar_to_unicode(). Get rid of
bletcherous and unsafe manipulation of global encoding setting.
Clean up libxml reporting mechanism a bit (it still looks like a
dangling-pointer crash waiting to happen, though, not to mention
being far less than sane from a localization standpoint).
2006-12-24 00:57:48 +00:00
Bruce Momjian f99a569a2e pgindent run for 8.2. 2006-10-04 00:30:14 +00:00
Tom Lane c61a2f5841 Change the backend to reject strings containing invalidly-encoded multibyte
characters in all cases.  Formerly we mostly just threw warnings for invalid
input, and failed to detect it at all if no encoding conversion was required.
The tighter check is needed to defend against SQL-injection attacks as per
CVE-2006-2313 (further details will be published after release).  Embedded
zero (null) bytes will be rejected as well.  The checks are applied during
input to the backend (receipt from client or COPY IN), so it no longer seems
necessary to check in textin() and related routines; any string arriving at
those functions will already have been validated.  Conversion failure
reporting (for characters with no equivalent in the destination encoding)
has been cleaned up and made consistent while at it.

Also, fix a few longstanding errors in little-used encoding conversion
routines: win1251_to_iso, win866_to_iso, euc_tw_to_big5, euc_tw_to_mic,
mic_to_euc_tw were all broken to varying extents.

Patches by Tatsuo Ishii and Tom Lane.  Thanks to Akio Ishida and Yasuo Ohgaki
for identifying the security issues.
2006-05-21 20:05:21 +00:00
Peter Eisentraut 1b658473ea Add support for Windows codepages 1253, 1254, 1255, and 1257 and clean
up a bunch of the support utilities.

In src/backend/utils/mb/Unicode remove nearly duplicate copies of the
UCS_to_XXX perl script and replace with one version to handle all generic
files.  Update the Makefile so that it knows about all the map files.
This produces a slight difference in some of the map files, using a
uniform naming convention and not mapping the null character.

In src/backend/utils/mb/conversion_procs create a master utf8<->win
codepage function like the ISO 8859 versions instead of having a separate
handler for each conversion.

There is an externally visible change in the name of the win1258 to utf8
conversion.  According to the documentation notes, it was named
incorrectly and this changes it to a standard name.

Running the Unicode mapping perl scripts has shown some additional mapping
changes in koi8r and iso8859-7.
2006-02-18 16:15:23 +00:00
Bruce Momjian 9b28021cc6 Previous commit message should have been:
Add comment marker for PG_ENCODING_BE_LAST.
2005-12-24 18:23:02 +00:00
Bruce Momjian 1aecda002e Add 2005-12-24 18:21:34 +00:00
Bruce Momjian e5392a43f8 Alignment cleanup. 2005-12-24 18:11:30 +00:00
Bruce Momjian 1dc3498251 Standard pgindent run for 8.1. 2005-10-15 02:49:52 +00:00
Tom Lane 8889685555 Suppress signed-vs-unsigned-char warnings. 2005-09-24 17:53:28 +00:00
Tom Lane 28d3ee4771 Actually, this macro had worse problems than a bogus name ... 2005-08-05 15:01:48 +00:00
Tom Lane 848c30a501 Fix misspelled macro name. Doesn't appear to be used anywhere yet,
so no one noticed.
2005-08-05 14:36:43 +00:00
Bruce Momjian 5955945828 Support 3 and 4-byte unicode characters.
John Hansen
2005-06-15 00:15:08 +00:00
Bruce Momjian e7fb9f18bf Add support for Win1252 encoding.
Roland Volkmann
2005-03-14 18:31:25 +00:00
Bruce Momjian e3d7de6b99 Rename canonical encodings, per Peter:
UNICODE => UTF8
	ALT => WIN866
	WIN => WIN1251
	TCVN => WIN1258

The old codes continue to work.
2005-03-07 04:30:55 +00:00
Bruce Momjian e09567d850 Back out addition of Win1252 encoding. 2004-12-04 18:19:33 +00:00
Bruce Momjian 08e0b34bad Back out fix for Unicode characters above 0x10000 2004-12-03 01:20:33 +00:00
Bruce Momjian 4ea4f8bd06 Fix for Unicode characters above 0x10000.
John Hansen
2004-12-02 22:37:14 +00:00