268 Commits

Author SHA1 Message Date
Peter Eisentraut
c138b966d4 Replace useless uses of := by = in makefiles. 2007-02-09 15:56:00 +00:00
Tom Lane
0887fa1117 Get pg_utf_mblen(), pg_utf2wchar_with_len(), and utf2ucs() all on the same
page about the maximum UTF8 sequence length we support (4 bytes since 8.1,
3 before that).  pg_utf2wchar_with_len never got updated to support 4-byte
characters at all, and in any case had a buffer-overrun risk in that it
could produce multiple pg_wchars from what mblen claims to be just one UTF8
character.  The only reason we don't have a major security hole is that most
callers allocate worst-case output buffers; the sole exception in released
versions appears to be pre-8.2 iwchareq() (ie, ILIKE), which can be crashed
due to zeroing out its return address --- but AFAICS that can't be exploited
for anything more than a crash, due to inability to control what gets written
there.  Per report from James Russell and Michael Fuhr.

Pre-8.1 the risk is much less, but I still think pg_utf2wchar_with_len's
behavior given an incomplete final character risks buffer overrun, so
back-patch that logic change anyway.

This patch also makes sure that UTF8 sequences exceeding the supported
length (whichever it is) are consistently treated as error cases, rather
than being treated like a valid shorter sequence in some places.
2007-01-24 17:12:17 +00:00
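
For illustration only, a minimal C sketch of the rule described in the message above: the lead byte determines the sequence length, at most 4 bytes are accepted, and over-long or truncated sequences are reported as errors instead of being treated as a shorter valid sequence. The names utf8_seq_len and utf8_count_chars are hypothetical and are not the PostgreSQL functions named in the commit; continuation bytes are not validated here.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not pg_utf_mblen): length implied by a UTF-8 lead
 * byte, or -1 for a continuation byte or a lead byte announcing more than
 * 4 bytes.
 */
int
utf8_seq_len(unsigned char first)
{
    if ((first & 0x80) == 0x00)
        return 1;               /* 0xxxxxxx: ASCII */
    if ((first & 0xE0) == 0xC0)
        return 2;               /* 110xxxxx */
    if ((first & 0xF0) == 0xE0)
        return 3;               /* 1110xxxx */
    if ((first & 0xF8) == 0xF0)
        return 4;               /* 11110xxx */
    return -1;                  /* unsupported or invalid lead byte */
}

/*
 * Count characters in s[0..len).  Returns -1 if a lead byte is invalid or
 * the final character is incomplete, so a caller never reads or writes
 * past what the length function promised.
 */
long
utf8_count_chars(const unsigned char *s, size_t len)
{
    long        count = 0;
    size_t      i = 0;

    assert(s != NULL || len == 0);

    while (i < len)
    {
        int         l = utf8_seq_len(s[i]);

        if (l < 0 || (size_t) l > len - i)
            return -1;          /* error case, not a shorter sequence */
        i += (size_t) l;
        count++;
    }
    return count;
}
```

The point of the sketch is that the length check and the consumer share one definition of "how long is this character", which is the consistency the commit message asks for.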
Peter Eisentraut
2cc01004c6 Remove remains of old depend target. 2007-01-20 17:16:17 +00:00
Bruce Momjian
29dccf5fe0 Update CVS HEAD for 2007 copyright. Back branches are typically not
back-stamped for this.
2007-01-05 22:20:05 +00:00
Tom Lane
e9da20ab4d Fix machine-dependent crash in sqlchar_to_unicode(). Get rid of
bletcherous and unsafe manipulation of global encoding setting.
Clean up libxml reporting mechanism a bit (it still looks like a
dangling-pointer crash waiting to happen, though, not to mention
being far less than sane from a localization standpoint).
2006-12-24 00:57:48 +00:00
Peter Eisentraut
8c1de5fb00 Initial SQL/XML support: xml data type and initial set of functions. 2006-12-21 16:05:16 +00:00
Peter Eisentraut
3cd318a8d1 Fix gratuitous message spelling differences 2006-11-27 15:50:55 +00:00
Peter Eisentraut
b9b4f10b5b Message style improvements 2006-10-06 17:14:01 +00:00
Bruce Momjian
f99a569a2e pgindent run for 8.2. 2006-10-04 00:30:14 +00:00
Bruce Momjian
a3132359fd In new "invalid byte sequence" error hint, call it "error", not
"failure".
2006-08-22 12:11:28 +00:00
Bruce Momjian
e11cab650c Add hint for "invalid byte sequence for encoding" error message,
suggesting review of client_encoding.
2006-08-22 03:30:20 +00:00
Bruce Momjian
e0522505bd Remove 576 references of include files that were not needed. 2006-07-14 14:52:27 +00:00
Bruce Momjian
ac230e7431 Alphabetically order reference to include files, "S"-"Z". 2006-07-11 18:26:11 +00:00
Bruce Momjian
3a534ade39 Alphabetically order reference to include files, "G" - "M". 2006-07-11 17:04:13 +00:00
Bruce Momjian
399a36a75d Prepare code to be built by MSVC:
	o  remove many WIN32_CLIENT_ONLY defines
	o  add WIN32_ONLY_COMPILER define
	o  add 3rd argument to open() for portability
	o  add include/port/win32_msvc directory for
	   system includes

Magnus Hagander
2006-06-07 22:24:46 +00:00
Tom Lane
a0ffab351e Magic blocks don't do us any good unless we use 'em ... so install one
in every shared library.
2006-05-30 22:12:16 +00:00
Tom Lane
c61a2f5841 Change the backend to reject strings containing invalidly-encoded multibyte
characters in all cases.  Formerly we mostly just threw warnings for invalid
input, and failed to detect it at all if no encoding conversion was required.
The tighter check is needed to defend against SQL-injection attacks as per
CVE-2006-2313 (further details will be published after release).  Embedded
zero (null) bytes will be rejected as well.  The checks are applied during
input to the backend (receipt from client or COPY IN), so it no longer seems
necessary to check in textin() and related routines; any string arriving at
those functions will already have been validated.  Conversion failure
reporting (for characters with no equivalent in the destination encoding)
has been cleaned up and made consistent while at it.

Also, fix a few longstanding errors in little-used encoding conversion
routines: win1251_to_iso, win866_to_iso, euc_tw_to_big5, euc_tw_to_mic,
mic_to_euc_tw were all broken to varying extents.

Patches by Tatsuo Ishii and Tom Lane.  Thanks to Akio Ishida and Yasuo Ohgaki
for identifying the security issues.
2006-05-21 20:05:21 +00:00
Bruce Momjian
f3d99d160d Add CVS tag lines to files that were lacking them. 2006-03-11 04:38:42 +00:00
Bruce Momjian
f2f5b05655 Update copyright for 2006. Update scripts. 2006-03-05 15:59:11 +00:00
Tatsuo Ishii
b3d0442ab3 Tighten up SJIS byte sequence check. Now we reject invalid SJIS byte
sequences such as "0x95 0x27". Patches from Akio Ishida.
Also update copyright notice.
2006-03-04 10:57:35 +00:00
Peter Eisentraut
7f4f42fa10 Clean up CREATE FUNCTION syntax usage in contrib and elsewhere, in
particular get rid of single quotes around language names and old WITH ()
construct.
2006-02-27 16:09:50 +00:00
Peter Eisentraut
268c1b6077 The Makefile was invoking perl scripts as ./script.pl. This fails when
the script is not executable as UCS_to_most.pl is in CVS.  It also won't
pick up any custom setting of the perl version/location to use.  This
patch calls perl scripts like $(PERL) $(srcdir)/script.pl.

Kris Jurka
2006-02-24 13:25:44 +00:00
Peter Eisentraut
1b658473ea Add support for Windows codepages 1253, 1254, 1255, and 1257 and clean
up a bunch of the support utilities.

In src/backend/utils/mb/Unicode remove nearly duplicate copies of the
UCS_to_XXX perl script and replace with one version to handle all generic
files.  Update the Makefile so that it knows about all the map files.
This produces a slight difference in some of the map files, using a
uniform naming convention and not mapping the null character.

In src/backend/utils/mb/conversion_procs create a master utf8<->win
codepage function like the ISO 8859 versions instead of having a separate
handler for each conversion.

There is an externally visible change in the name of the win1258 to utf8
conversion.  According to the documentation notes, it was named
incorrectly and this changes it to a standard name.

Running the Unicode mapping perl scripts has shown some additional mapping
changes in koi8r and iso8859-7.
2006-02-18 16:15:23 +00:00
Tom Lane
226a980bb0 Fix bug that allowed any logged-in user to SET ROLE to any other database user
id (CVE-2006-0553).  Also fix related bug in SET SESSION AUTHORIZATION that
allows unprivileged users to crash the server, if it has been compiled with
Asserts enabled.  The escalation-of-privilege risk exists only in 8.1.0-8.1.2.
However, the Assert-crash risk exists in all releases back to 7.3.
Thanks to Akio Ishida for reporting this problem.
2006-02-12 22:32:43 +00:00
Bruce Momjian
2a5180c26e Throw a warning rather than an error on invalid character from UTF8 to
Latin1, like we do for other Latin encodings.
2006-02-12 21:15:19 +00:00
Bruce Momjian
c01999a557 Allow psql multi-line column values to align in the proper columns.
If the second output column value is 'a\nb', the 'b' should appear
  in the second display column, rather than the first column as it
  does now.

Change libpq's PQdsplen() to return more useful values.

> Note: this changes the PQdsplen function, it can now return zero or
> minus one which was not possible before. It doesn't appear anyone is
> actually using the functions other than psql but it is a change. The
> functions are not actually documented anywhere so it's not like we're
> breaking a defined interface. The new semantics follow the Unicode
> standard.

BACKWARD COMPATIBLE CHANGE.

The only user-visible change I saw in the regression tests is that a
SELECT * on a table where all the columns have been dropped doesn't
return a blank line like before.  This seems like a step forward.

Martijn van Oosterhout
2006-02-10 00:39:04 +00:00
Neil Conway
d3a4d63387 mbutils was previously doing some allocations, including invoking
fmgr_info(), in the TopMemoryContext. I couldn't see that the code
actually leaked, but in general I think it's fragile to assume that
pfree'ing an FmgrInfo along with its fn_extra field is enough to
reclaim all the resources allocated by fmgr_info().  I changed the
code to do its allocations in a new child context of
TopMemoryContext, MbProcContext. When we want to release the
allocations we can just reset the context, which is cleaner.
2006-01-12 22:04:02 +00:00
Neil Conway
fb627b76cc Cosmetic code cleanup: fix a bunch of places that used "return (expr);"
rather than "return expr;" -- the latter style is used in most of the
tree. I kept the parentheses when they were necessary or useful because
the return expression was complex.
2006-01-11 08:43:13 +00:00
Neil Conway
762bcbdba2 Remove a confusing pair of parentheses. 2006-01-11 06:59:22 +00:00
Bruce Momjian
a2384d008a More uses of IS_HIGHBIT_SET() macro. 2005-12-26 19:30:45 +00:00
Bruce Momjian
261114a23f I have added these macros to c.h:
        #define HIGHBIT                 (0x80)
        #define IS_HIGHBIT_SET(ch)      ((unsigned char)(ch) & HIGHBIT)

and removed CSIGNBIT and mapped its uses to HIGHBIT.  I have also added
uses for IS_HIGHBIT_SET where appropriate.  This change is
purely for code clarity.
2005-12-25 02:14:19 +00:00
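
As a hedged illustration of how the two macros quoted above might be used, the sketch below scans a buffer for non-ASCII bytes. The macro definitions are copied from the commit message; the function is_all_ascii is a hypothetical helper and is not taken from the PostgreSQL tree.

```c
#include <assert.h>
#include <stddef.h>

/* Definitions as quoted in the commit message. */
#define HIGHBIT                 (0x80)
#define IS_HIGHBIT_SET(ch)      ((unsigned char)(ch) & HIGHBIT)

/*
 * Hypothetical helper: returns 1 if every byte in s[0..len) is 7-bit
 * ASCII, 0 otherwise.  The cast inside IS_HIGHBIT_SET makes the test
 * behave the same whether plain char is signed or unsigned.
 */
int
is_all_ascii(const char *s, size_t len)
{
    size_t      i;

    assert(s != NULL || len == 0);

    for (i = 0; i < len; i++)
    {
        if (IS_HIGHBIT_SET(s[i]))
            return 0;
    }
    return 1;
}
```

Note that the macro yields 0x80 or 0, not 1 or 0, so it is meant to be used as a truth value and not compared against 1.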
Bruce Momjian
d8a8183456 Formatting cleanups. 2005-12-24 17:19:40 +00:00
Bruce Momjian
0658a6a634 Formatting cleanup. 2005-12-24 16:49:48 +00:00
Tatsuo Ishii
804f6b8fc9 Fix long standing Asian multibyte charsets bug.
See:

Subject: [HACKERS] bugs with certain Asian multibyte charsets
From: Tatsuo Ishii <ishii@sraoss.co.jp>
To: pgsql-hackers@postgresql.org
Date: Sat, 24 Dec 2005 18:25:33 +0900 (JST)

for more details.
2005-12-24 09:35:36 +00:00
Tatsuo Ishii
dcc7da8d5e Fix for rearranging encoding id ISO-8859-5 to ISO-8859-8.
Also make the code more robust by searching for target encoding
in the internal charset map.

Problem reported by Sagi Bashari on 2005/12/21.
See "[BUGS] BUG #2120: Crash when doing UTF8<->ISO_8859_8 encoding conversion"
on pgsql-bugs list for more details.
2005-12-23 02:11:02 +00:00
Peter Eisentraut
a29c04a541 Allow installation into directories containing spaces in the name. 2005-12-09 21:19:36 +00:00
Bruce Momjian
436a2956d8 Re-run pgindent, fixing a problem where comment lines after a blank
comment line were output as too long, and update typedefs for /lib
directory.  Also fix case where identifiers were used as variable names
in the backend, but as typedefs in ecpg (favor the backend for
indenting).

Backpatch to 8.1.X.
2005-11-22 18:17:34 +00:00
Peter Eisentraut
07bb9f086b Message corrections 2005-10-29 00:31:52 +00:00
Bruce Momjian
1dc3498251 Standard pgindent run for 8.1. 2005-10-15 02:49:52 +00:00
Tom Lane
8889685555 Suppress signed-vs-unsigned-char warnings. 2005-09-24 17:53:28 +00:00
Tom Lane
d78397d301 Change typreceive function API so that receive functions get the same
optional arguments as text input functions, ie, typioparam OID and
atttypmod.  Make all the datatypes that use typmod enforce it the same
way in typreceive as they do in typinput.  This fixes a problem with
failure to enforce length restrictions during COPY FROM BINARY.
2005-07-10 21:14:00 +00:00
Tatsuo Ishii
e2d088de03 Allow direct conversion between EUC_JP and SJIS to improve
performance. Patches submitted by Atsushi Ogawa.
2005-06-24 13:56:39 +00:00
Bruce Momjian
5955945828 Support 3 and 4-byte unicode characters.
John Hansen
2005-06-15 00:15:08 +00:00
Tatsuo Ishii
b4cbd60fcf Fix bug in MIC -> EUC_JP conversion. Per Atsushi Ogawa. 2005-06-10 16:43:56 +00:00
Tom Lane
893b57c871 Alter the signature for encoding conversion functions to declare the
output area as INTERNAL not CSTRING.  This is to prevent people from
calling the functions by hand.  This is a permanent solution for the
back branches but I hope it is just a stopgap for HEAD.
2005-05-03 19:17:59 +00:00
Bruce Momjian
e7fb9f18bf Add support for Win1252 encoding.
Roland Volkmann
2005-03-14 18:31:25 +00:00
Bruce Momjian
41e2a80f57 Update comments for new encoding names. 2005-03-14 00:19:13 +00:00
Bruce Momjian
ee1bd33dd0 Document aliases for our supported encodings.
Add a few encodings that were not documented.
2005-03-13 01:26:30 +00:00
Neil Conway
4cd2fd66f8 Unbreak out-of-tree builds, by fixing a typo. 2005-03-07 23:18:06 +00:00
Bruce Momjian
e3d7de6b99 Rename canonical encodings, per Peter:
	UNICODE => UTF8
	ALT => WIN866
	WIN => WIN1251
	TCVN => WIN1258

The old codes continue to work.
2005-03-07 04:30:55 +00:00
Tom Lane
7e1c8ef4fc Some more missed copyright notices. Many of these look like they
should have been caught by the src/tools/copyright script ... why
weren't they?
2005-01-01 20:44:34 +00:00
PostgreSQL Daemon
2ff501590b Tag appropriate files for rc3
Also performed an initial run through of upgrading our Copyright date to
extend to 2005 ... first run here was very simple ... change everything
where: grep 1996-2004 && the word 'Copyright' ... scanned through the
generated list with 'less' first, and after, to make sure that I only
picked up the right entries ...
2004-12-31 22:04:05 +00:00
Bruce Momjian
e09567d850 Back out addition of Win1252 encoding. 2004-12-04 18:19:33 +00:00
Bruce Momjian
08e0b34bad Back out fix for Unicode characters above 0x10000 2004-12-03 01:20:33 +00:00
Bruce Momjian
4ea4f8bd06 Fix for Unicode characters above 0x10000.
John Hansen
2004-12-02 22:37:14 +00:00
Bruce Momjian
7af770d005 Add Charset WIN1252 support.
Roland Volkmann
2004-12-02 22:14:38 +00:00
Neil Conway
7069dbcc31 More minor cosmetic improvements:
- remove another senseless "extern" keyword that was applied to a
function definition
- change a few more function signatures from "some_type foo()" to
"some_type foo(void)"
- rewrite another K&R style function definition
- make the type of the "action" function pointer in the KeyWord struct
in src/backend/utils/adt/formatting.c more precise
2004-10-13 01:25:13 +00:00
Neil Conway
0e72b9d440 Cosmetic improvements/code cleanup:
- replace some function signatures of the form "some_type foo()" with
"some_type foo(void)"
- replace a few instances of a literal 0 being used as a NULL pointer;
there are more instances of this in the code, but I just fixed a few
- in src/backend/utils/mb/wstrncmp.c, replace K&R style function
declarations with ANSI style, remove use of 'register' keyword
- remove an "extern" modifier that was applied to a function definition
(rather than a declaration)
2004-10-10 23:37:45 +00:00
Bruce Momjian
e1c8b37afb Add new macro as shorthand for MS VC and Borland C++:
+ #if   defined(_MSC_VER) || defined(__BORLANDC__)
+ #define       WIN32_CLIENT_ONLY
+ #endif
2004-09-27 23:24:45 +00:00
Peter Eisentraut
152a101f2b Allow WIN1250 as server encoding. 2004-09-17 21:59:57 +00:00
Bruce Momjian
15d3f9f6b7 Another pgindent run with lib typedefs added. 2004-08-30 02:54:42 +00:00
Bruce Momjian
b6b71b85bc Pgindent run for 8.0. 2004-08-29 05:07:03 +00:00
Bruce Momjian
da9a8649d8 Update copyright to 2004. 2004-08-29 04:13:13 +00:00
Tatsuo Ishii
e8c3205037 Add PQmbdsplen() which returns the "display length" of a character.
Some work is still needed:
- UTF-8, MULE_INTERNAL always returns 1
2004-03-15 10:41:26 +00:00
Tom Lane
ecb156d484 If we don't have shared libraries, we don't have conversions. Make
conversion_create.sql be empty (except for a helpful comment) in this
case.  Allows initdb to succeed with --disable-shared.
2004-01-21 19:22:19 +00:00
Tom Lane
a4f8f124b7 Fix bit-rot in support for building with --disable-shared. This patch
gets us past 'make install', but initdb still fails for lack of conversion
libraries ...
2004-01-21 19:04:11 +00:00
PostgreSQL Daemon
55b113257c make sure the $Id tags are converted to $PostgreSQL as well ... 2003-11-29 22:41:33 +00:00
PostgreSQL Daemon
969685ad44 $Header: -> $PostgreSQL Changes ... 2003-11-29 19:52:15 +00:00
Peter Eisentraut
feb4f44d29 Message editing: remove gratuitous variations in message wording, standardize
terms, add some clarifications, fix some untranslatable attempts at dynamic
message building.
2003-09-25 06:58:07 +00:00
Tatsuo Ishii
0c9f978c0c Fix GB18030 to UTF-8 mapping table 2003-08-25 01:46:16 +00:00
Tatsuo Ishii
b4ab39ff05 Fix GB18030 to UTF-8 mapping table 2003-08-24 05:18:04 +00:00
Peter Eisentraut
200b7d11af Fix uninstall target. 2003-08-23 04:22:34 +00:00
Tom Lane
f65643771b Conversion functions must be STRICT to prevent them from getting null inputs. 2003-08-08 14:31:12 +00:00
Tom Lane
2f9c859ea1 Fix some copyright notices that weren't updated. Improve copyright tool
so it won't miss 'em again.
2003-08-04 23:59:41 +00:00
Bruce Momjian
f3c3deb7d0 Update copyrights to 2003. 2003-08-04 02:40:20 +00:00
Bruce Momjian
089003fb46 pgindent run. 2003-08-04 00:43:34 +00:00
Tom Lane
b6a1d25b0a Error message editing in utils/adt. Again thanks to Joe Conway for doing
the bulk of the heavy lifting ...
2003-07-27 04:53:12 +00:00
Tom Lane
689eb53e47 Error message editing in backend/utils (except /adt). 2003-07-25 20:18:01 +00:00
Bruce Momjian
b14295cfe4 Attached is the complete diff against current CVS.
Compiles on BCC 5.5 and VC++ 6.0 (with warnings).

Karl Waclawek
2003-06-12 08:15:29 +00:00
Bruce Momjian
dc4ee8a833 Back out patch that got bundled into another patch. 2003-06-12 08:11:07 +00:00
Bruce Momjian
a647e30ba3 New patch with corrected README attached.
Also quickly added mention that it may be a qualified schema name.

Rod Taylor
2003-06-12 08:02:57 +00:00
Bruce Momjian
12c9423832 Allow Win32 to compile under MinGW. Major changes are:
        Win32 port is now called 'win32' rather than 'win'
        add -lwsock32 on Win32
        make gethostname() be only used when kerberos4 is enabled
        use /port/getopt.c
        new /port/opendir.c routines
        disable GUC unix_socket_group on Win32
        convert some keywords.c symbols to KEYWORD_P to prevent conflict
        create new FCNTL_NONBLOCK macro to turn off socket blocking
        create new /include/port.h file that has /port prototypes, move
          out of c.h
        new /include/port/win32_include dir to hold missing include files
        work around ERROR being defined in Win32 includes
2003-05-15 16:35:30 +00:00
Tom Lane
351372e585 Department of second thoughts: probably still need an IsTransactionState
test in there...
2003-04-27 18:01:46 +00:00
Tom Lane
5f15fa8d06 Clean up some problems in SetClientEncoding: failed to honor doit flag
in all cases, leaked TopMemoryContext memory in others.  Make the
interaction between SetClientEncoding and InitializeClientEncoding
cleaner and better documented.  I suspect these changes should be
back-patched into 7.3, but will wait on Tatsuo's verification.
2003-04-27 17:31:25 +00:00
Tatsuo Ishii
35a0995992 Fix encoding conversion function bug.
See following posting for more details.

Subject: Re: [HACKERS] [BUGS] Bug #943: Server-Encoding from EUC_TW to UTF-8 doesn't
From: Tatsuo Ishii <t-ishii@sra.co.jp>
To: michael.enke@wincor-nixdorf.com, pgsql-bugs@postgresql.org
Cc: pgsql-hackers@postgresql.org
Date: Sat, 12 Apr 2003 10:51:45 +0900 (JST)
2003-04-12 07:53:57 +00:00
Tom Lane
1d650da2e5 This is a derived file and should never have been added to CVS. 2003-04-02 00:58:08 +00:00
Bruce Momjian
4b0b8dadd2 Add new files. 2003-03-27 16:53:15 +00:00
Tom Lane
e4704001ea This patch fixes a bunch of spelling mistakes in comments throughout the
PostgreSQL source code.

Neil Conway
2003-03-10 22:28:22 +00:00
Tatsuo Ishii
e2a618fe25 Fix for GUC client_encoding variable not being handled
correctly. See following thread for more details.

Subject: [HACKERS] client_encoding directive is ignored in postgresql.conf
From: Tatsuo Ishii <t-ishii@sra.co.jp>
Date: Wed, 29 Jan 2003 22:24:04 +0900 (JST)
2003-02-19 14:31:26 +00:00
Tom Lane
b8add56ed0 Fix array subscript overruns identified by Yichen Xie. 2003-01-29 01:01:05 +00:00
Tatsuo Ishii
38535f8e32 Fix typo in an error message 2003-01-11 06:55:11 +00:00
Peter Eisentraut
4ed6be54e2 Fix Latin9/Unicode conversion by selecting the right table. 2002-12-09 19:47:21 +00:00
Bruce Momjian
ceab6f7283 As far as I figured from the source code this function only deals with
cleaning up locale names and nothing else. Since all the locale names
are in plain ASCII I think it will be safe to use ASCII-only lower-case
conversion.

Nicolai Tufar
2002-12-05 23:21:07 +00:00
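
A minimal sketch of what an ASCII-only lower-case conversion could look like, assuming the input is a NUL-terminated locale name. The function names ascii_tolower_char and ascii_strlower are hypothetical and are not the routine this commit changed. Unlike the C library's tolower(), the result does not depend on the active locale, and bytes outside A-Z pass through unchanged.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: lower-case one byte, touching only 'A'..'Z'. */
char
ascii_tolower_char(char c)
{
    if (c >= 'A' && c <= 'Z')
        return (char) (c - 'A' + 'a');
    return c;
}

/* Hypothetical helper: lower-case a NUL-terminated string in place. */
void
ascii_strlower(char *s)
{
    assert(s != NULL);

    for (; *s != '\0'; s++)
        *s = ascii_tolower_char(*s);
}
```

For example, applying ascii_strlower to a writable buffer holding "En_US.UTF-8" leaves punctuation and digits alone and lower-cases only the ASCII letters.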
Tatsuo Ishii
ac47950238 Guard against 0 length string encoding conversion case. 2002-11-26 02:22:29 +00:00
Tatsuo Ishii
90a06dba16 Fix broken GB18030 <--> UTF-8 conversion map 2002-11-12 11:33:40 +00:00
Tom Lane
5123139210 Remove encoding lookups from grammar stage, push them back to places
where it's safe to do database access.  Along the way, fix core dump
for 'DEFAULT' parameters to CREATE DATABASE.  initdb forced due to
change in pg_proc entry.
2002-11-02 18:41:22 +00:00
Tom Lane
3518fbe86f Add missing semicolons to a few PG_FUNCTION_INFO_V1 calls. 2002-10-26 15:01:01 +00:00
Peter Eisentraut
8c3ab663ab Tweak conversion names to follow the established naming scheme, and
document that scheme.
2002-09-24 20:14:59 +00:00
Tatsuo Ishii
4b23f05c4f Fix bug in encoding conversion map. 2002-09-18 02:10:10 +00:00
Tatsuo Ishii
4c0bdd1ba8 Update Japanese README so that it reflects the changes made to the
conversion function interface.
2002-09-18 01:21:28 +00:00