Automatic Character Set Conversion Between Server and Client
- PostgreSQL supports automatic
- character set conversion between server and client for certain
- character set combinations. The conversion information is stored in the
- pg_conversion system catalog. PostgreSQL
- comes with some predefined conversions, as shown in . You can create a new
- conversion using the SQL command CREATE CONVERSION.
+ PostgreSQL supports automatic character
+ set conversion between server and client for many combinations of
+ character sets (
+ shows which ones).
-
-
To enable automatic character set conversion, you have to
tell PostgreSQL the character set
@@ -1905,6 +1604,970 @@ RESET client_encoding;
+
+ Available Character Set Conversions
+
+
+ PostgreSQL allows conversion between any
+ two character sets for which a conversion function is listed in the
+ pg_conversion
+ system catalog. PostgreSQL comes with
+ some predefined conversions, as summarized in
+ and shown in more
+ detail in . You can
+ create a new conversion using the SQL command
+ . (To be used for automatic
+ client/server conversions, a conversion must be marked
+ as default for its character set pair.)
+
+
+
+
+
Further Reading
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 72072e7545..6c4359dc7b 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -1455,8 +1455,7 @@
and text. Unless otherwise noted, all
of the functions listed below work on all of these types, but be
wary of potential effects of automatic space-padding when using the
- character type. Some functions also exist
- natively for the bit-string types.
+ character type.
@@ -1628,7 +1627,8 @@
text
- Extract substring
+ Extract substring (provide at least one of from
+ and for)
substring('Thomas' from 2 for 3)hom
@@ -1638,9 +1638,9 @@
substring(string from pattern)text
- Extract substring matching POSIX regular expression. See
+ Extract substring matching POSIX regular expression (see
for more information on pattern
- matching.
+ matching)
substring('Thomas' from '...$')mas
@@ -1650,9 +1650,9 @@
substring(string from pattern for escape)text
- Extract substring matching SQL regular expression.
- See for more information on
- pattern matching.
+ Extract substring matching SQL regular expression
+ (see for more information on
+ pattern matching)
substring('Thomas' from '%#"o_a#"_' for '#')oma
@@ -1682,9 +1682,8 @@
trim(leading | trailing
| bothfrom
- string
- , characters
- )
+ string,
+ characters)
text
@@ -1822,101 +1821,6 @@
-
- convert
-
- convert(stringbytea,
- src_encodingname,
- dest_encodingname)
-
- bytea
-
- Convert string to dest_encoding. The
- original encoding is specified by
- src_encoding. The
- string must be valid in this encoding.
- Conversions can be defined by CREATE CONVERSION.
- Also there are some predefined conversions. See for available conversions.
-
- convert('text_in_utf8', 'UTF8', 'LATIN1')
- text_in_utf8 represented in Latin-1
- encoding (ISO 8859-1)
-
-
-
-
-
- convert_from
-
- convert_from(stringbytea,
- src_encodingname)
-
- text
-
- Convert string to the database encoding. The original encoding
- is specified by src_encoding. The
- string must be valid in this encoding.
-
- convert_from('text_in_utf8', 'UTF8')
- text_in_utf8 represented in the current database encoding
-
-
-
-
-
- convert_to
-
- convert_to(stringtext,
- dest_encodingname)
-
- bytea
-
- Convert string to dest_encoding.
-
- convert_to('some text', 'UTF8')
- some text represented in the UTF8 encoding
-
-
-
-
-
- decode
-
- decode(stringtext,
- formattext)
-
- bytea
-
- Decode binary data from textual representation in string.
- Options for format are same as in encode.
-
- decode('MTIzAAE=', 'base64')
- \x3132330001
-
-
-
-
-
- encode
-
- encode(databytea,
- formattext)
-
- text
-
- Encode binary data into a textual representation. Supported
- formats are: base64, hex, escape.
- escape converts zero bytes and high-bit-set bytes to
- octal sequences (\nnn) and
- doubles backslashes.
-
- encode('123\000\001', 'base64')
- MTIzAAE=
-
-
-
- format
@@ -1955,13 +1859,14 @@
left
- left(strtext,
+ left(stringtext,
nint)text
- Return first n characters in the string. When n
- is negative, return all but last |n| characters.
+ Return first n characters in the
+ string, or when n is negative, return
+ all but last |n| characters
left('abcde', 2)ab
@@ -1982,19 +1887,6 @@
4
-
- length(stringbytea,
- encodingname )
- int
-
- Number of characters in string in the given
- encoding. The string
- must be valid in this encoding.
-
- length('jose', 'UTF8')
- 4
-
-
@@ -2006,7 +1898,7 @@
text
- Fill up the string to length
+ Extend the string to length
length by prepending the characters
fill (a space by default). If the
string is already longer than
@@ -2044,8 +1936,8 @@
text
- Calculates the MD5 hash of string,
- returning the result in hexadecimal
+ MD5 hash, with
+ the result written in hexadecimal
md5('abc')900150983cd24fb0 d6963f7d28e17f72
@@ -2182,8 +2074,8 @@
text[]
Return captured substring(s) resulting from the first match of a POSIX
- regular expression to the string. See
- for more information.
+ regular expression to the string (see
+ for more information)
regexp_match('foobarbequebaz', '(bar)(beque)'){bar,beque}
@@ -2199,8 +2091,8 @@
setof text[]
Return captured substring(s) resulting from matching a POSIX regular
- expression to the string. See
- for more information.
+ expression to the string (see
+ for more information)
regexp_matches('foobarbequebaz', 'ba.', 'g'){bar}{baz} (2 rows)
@@ -2215,8 +2107,8 @@
text
- Replace substring(s) matching a POSIX regular expression. See
- for more information.
+ Replace substring(s) matching a POSIX regular expression (see
+ for more information)
regexp_replace('Thomas', '.[mN]a.', 'M')ThM
@@ -2232,8 +2124,8 @@
text[]
Split string using a POSIX regular expression as
- the delimiter. See for more
- information.
+ the delimiter (see for more
+ information)
regexp_split_to_array('hello world', '\s+'){hello,world}
@@ -2249,8 +2141,8 @@
setof text
Split string using a POSIX regular expression as
- the delimiter. See for more
- information.
+ the delimiter (see for more
+ information)
regexp_split_to_table('hello world', '\s+')helloworld (2 rows)
@@ -2296,7 +2188,7 @@
text
- Return reversed string.
+ Reverse the order of the characters in stringreverse('abcde')edcba
@@ -2307,13 +2199,14 @@
right
- right(strtext,
+ right(stringtext,
nint)text
- Return last n characters in the string. When n
- is negative, return all but first |n| characters.
+ Return last n characters in the string,
+ or when n is negative, return all but
+ first |n| characters
right('abcde', 2)de
@@ -2330,7 +2223,7 @@
text
- Fill up the string to length
+ Extend the string to length
length by appending the characters
fill (a space by default). If the
string is already longer than
@@ -2418,7 +2311,8 @@
bool
- Returns true if string starts with prefix.
+ Return true if string starts
+ with prefixstarts_with('alphabet', 'alph')t
@@ -2497,781 +2391,12 @@
- See also the aggregate function string_agg in
- .
+ See also the aggregate function string_agg in
+ , and the functions for
+ converting between strings and the bytea type in
+ .
-
-
format
@@ -3494,7 +2619,9 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
This section describes functions and operators for examining and
- manipulating values of type bytea.
+ manipulating binary strings, that is values of type bytea.
+ Many of these are equivalent, in purpose and syntax, to the
+ text-string functions described in the previous section.
@@ -3512,6 +2639,8 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
The sample results shown on this page assume that the server parameter
bytea_output is set
to escape (the traditional PostgreSQL format).
+ The output will look different when using the default setting
+ (hex).
@@ -3530,11 +2659,11 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
- string||
- string
+ bytes||
+ bytesbytea
- String concatenation
+ Binary string concatenation
binary stringconcatenation
@@ -3544,12 +2673,25 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
\\Post'gres\000
+
+
+
+ bit_length
+
+ bit_length(bytes)
+
+ int
+ Number of bits in binary string
+ bit_length('jo\000se':bytea)
+ 40
+
+
octet_length
- octet_length(string)
+ octet_length(bytes)intNumber of bytes in binary string
@@ -3562,7 +2704,7 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
overlay
- overlay(string placing string from intfor int)
+ overlay(bytes placing bytes from intfor int)bytea
@@ -3577,7 +2719,7 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
position
- position(substring in string)
+ position(bytesubstring in bytes)intLocation of specified substring
@@ -3590,11 +2732,12 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
substring
- substring(stringfrom intfor int)
+ substring(bytesfrom intfor int)bytea
- Extract substring
+ Extract substring (provide at least one of from
+ and for)
substring('Th\000omas'::bytea from 2 for 3)h\000o
@@ -3606,18 +2749,32 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
trimtrim(both
- bytes from
- string)
+ bytesremoved from
+ bytes)
bytea
Remove the longest string containing only bytes appearing in
- bytes from the start
- and end of string
+ bytesremoved from the start
+ and end of bytestrim('\000\001'::bytea from '\000Tom\001'::bytea)Tom
+
+
+
+ trim(bothfrom
+ bytes,
+ bytesremoved)
+
+ bytea
+
+ Non-standard syntax for trim()
+
+ trim(both from 'yxTomxx'::bytea, 'xyz'::bytea)
+ Tom
+
@@ -3649,66 +2806,30 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
btrim
- btrim(string
- bytea, bytesbytea)
+ btrim(bytes
+ bytea, bytesremovedbytea)bytea
Remove the longest string containing only bytes appearing in
- bytes from the start and end of
- string
-
+ bytesremoved from the start and end of
+ bytes
+
btrim('\000trim\001'::bytea, '\000\001'::bytea)trim
-
-
-
- decode
-
- decode(stringtext,
- formattext)
-
- bytea
-
- Decode binary data from textual representation in string.
- Options for format are same as in encode.
-
- decode('123\000456', 'escape')
- 123\000456
-
-
-
-
-
- encode
-
- encode(databytea,
- formattext)
-
- text
-
- Encode binary data into a textual representation. Supported
- formats are: base64, hex, escape.
- escape converts zero bytes and high-bit-set bytes to
- octal sequences (\nnn) and
- doubles backslashes.
-
- encode('123\000456'::bytea, 'escape')
- 123\000456
-
-
get_bit
- get_bit(string, offset)
+ get_bit(bytesbytea, offsetint)int
- Extract bit from string
+ Extract n'th bit
+ from binary string
get_bit('Th\000omas'::bytea, 45)1
@@ -3719,67 +2840,82 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
get_byte
- get_byte(string, offset)
+ get_byte(bytesbytea, offsetint)int
- Extract byte from string
+ Extract n'th byte
+ from binary string
get_byte('Th\000omas'::bytea, 4)109
-
-
-
- length
-
- length(string)
-
- int
-
- Length of binary string
-
- binary string
- length
-
-
- length
- of a binary string
- binary strings, length
-
-
- length('jo\000se'::bytea)
- 5
-
+
+
+
+ length
+
+ length(bytesbytea)
+
+ int
+
+ Number of bytes in binary string
+
+ binary string
+ length
+
+
+ length
+ of a binary string
+ binary strings, length
+
+
+ length('jo\000se'::bytea)
+ 5
+
-
-
-
- md5
-
- md5(string)
-
- text
-
- Calculates the MD5 hash of string,
- returning the result in hexadecimal
-
- md5('Th\000omas'::bytea)
- 8ab2d3c9689aaf18b4958c334c82d8b1
-
+
+ length(bytesbytea,
+ encodingname)
+ int
+
+ Number of characters in bytes, assuming
+ that it is text in the given encoding
+
+ length('jose'::bytea, 'UTF8')
+ 4
+
+
+
+
+
+ md5
+
+ md5(bytesbytea)
+
+ text
+
+ MD5 hash, with
+ the result written in hexadecimal
+
+ md5('Th\000omas'::bytea)
+ 8ab2d3c9689aaf18b4958c334c82d8b1
+ set_bit
- set_bit(string,
- offset, newvalue)
+ set_bit(bytesbytea,
+ offsetint,
+ newvalueint)bytea
- Set bit in string
+ Set n'th bit in
+ binary string
set_bit('Th\000omas'::bytea, 45, 0)Th\000omAs
@@ -3790,12 +2926,14 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
set_byte
- set_byte(string,
- offset, newvalue)
+ set_byte(bytesbytea,
+ offsetint,
+ newvalueint)bytea
- Set byte in string
+ Set n'th byte in
+ binary string
set_byte('Th\000omas'::bytea, 4, 64)Th\000o@as
@@ -3806,13 +2944,13 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
sha224
- sha224(bytea)
+ sha224(bytesbytea)bytea
- SHA-224 hash
+ SHA-224 hash
- sha224('abc')
+ sha224('abc'::bytea)\x23097d223405d8228642a477bda255b32aadbce4bda0b3f7e36c9da7
@@ -3821,13 +2959,13 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
sha256
- sha256(bytea)
+ sha256(bytesbytea)bytea
- SHA-256 hash
+ SHA-256 hash
- sha256('abc')
+ sha256('abc'::bytea)\xba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
@@ -3836,13 +2974,13 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
sha384
- sha384(bytea)
+ sha384(bytesbytea)bytea
- SHA-384 hash
+ SHA-384 hash
- sha384('abc')
+ sha384('abc'::bytea)\xcb00753f45a35e8bb5a03d699ac65007272c32ab0eded1631a8b605a43ff5bed8086072ba1e7cc2358baeca134c825a7
@@ -3851,34 +2989,258 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
sha512
- sha512(bytea)
+ sha512(bytesbytea)bytea
- SHA-512 hash
+ SHA-512 hash
- sha512('abc')
+ sha512('abc'::bytea)\xddaf35a193617abacc417349ae20413112e6fa4e89a97ea20a9eeee64b55d39a2192992a274fc1a836ba3c23a3feebbd454d4423643ce80e2a9ac94fa54ca49f
+
+
+
+
+ substr
+
+ substr(bytesbytea, fromint, countint)
+
+ bytea
+
+ Extract substring (same as
+ substring(bytea from from for count))
+
+ substr('alphabet', 3, 2)
+ ph
+
-
- get_byte and set_byte number the first byte
- of a binary string as byte 0.
- get_bit and set_bit number bits from the
- right within each byte; for example bit 0 is the least significant bit of
- the first byte, and bit 15 is the most significant bit of the second byte.
+
+ Functions get_byte and set_byte
+ number the first byte of a binary string as byte 0.
+ Functions get_bit and set_bit
+ number bits from the right within each byte; for example bit 0 is the least
+ significant bit of the first byte, and bit 15 is the most significant bit
+ of the second byte.
+
+
+
+ For historical reasons, the function md5
+ returns a hex-encoded value of type text whereas the SHA-2
+ functions return type bytea. Use the functions
+ encode
+ and decode to
+ convert between the two. For example write encode(sha256('abc'),
+ 'hex') to get a hex-encoded text representation,
+ or decode(md5('abc'), 'hex') to get
+ a bytea value.
- Note that for historic reasons, the function md5
- returns a hex-encoded value of type text whereas the SHA-2
- functions return type bytea. Use the functions
- encode and decode to convert
- between the two, for example encode(sha256('abc'),
- 'hex') to get a hex-encoded text representation.
+
+ character string
+ converting to binary string
+
+
+ binary string
+ converting to character string
+
+ Functions for converting strings between different character sets
+ (encodings), and for representing arbitrary binary data in textual
+ form, are shown in
+ . For these
+ functions, an argument or result of type text is expressed
+ in the database's default encoding, while arguments or results of
+ type bytea are in an encoding named by another argument.
+
+
+
+ Text/Binary String Conversion Functions
+
+
+
+ Function
+ Return Type
+ Description
+ Example
+ Result
+
+
+
+
+
+
+
+ convert
+
+ convert(bytesbytea,
+ src_encodingname,
+ dest_encodingname)
+
+ bytea
+
+ Convert binary string representing text in
+ encoding src_encoding
+ to a binary string in encoding dest_encoding
+ (see for
+ available conversions)
+
+ convert('text_in_utf8', 'UTF8', 'LATIN1')
+ text_in_utf8 represented in Latin-1 encoding
+
+
+
+
+
+ convert_from
+
+ convert_from(bytesbytea,
+ src_encodingname)
+
+ text
+
+ Convert binary string representing text in
+ encoding src_encoding
+ to text in the database encoding
+ (see for
+ available conversions)
+
+ convert_from('text_in_utf8', 'UTF8')
+ text_in_utf8 represented in the
+ database encoding
+
+
+
+
+
+ convert_to
+
+ convert_to(stringtext,
+ dest_encodingname)
+
+ bytea
+
+ Convert text (in the database encoding) to a binary
+ string encoded in encoding dest_encoding
+ (see for
+ available conversions)
+
+ convert_to('some text', 'UTF8')
+ some text represented in UTF8 encoding
+
+
+
+
+
+ encode
+
+ encode(bytesbytea,
+ formattext)
+
+ text
+
+ Encode binary data into a textual representation; supported
+ format values are:
+ base64,
+ escape,
+ hex
+
+ encode('123\000\001', 'base64')
+ MTIzAAE=
+
+
+
+
+
+ decode
+
+ decode(stringtext,
+ formattext)
+
+ bytea
+
+ Decode binary data from a textual representation given
+ in string; allowed
+ format values are the same as
+ for encode
+
+ decode('MTIzAAE=', 'base64')
+ \x3132330001
+
+
+
+
+
+
+ The encode and decode
+ functions support the following textual formats:
+
+
+
+ base64
+
+ base64 format
+
+
+
+ The base64 format is that
+ of RFC
+ 2045 Section 6.8. As per the RFC, encoded lines are
+ broken at 76 characters. However instead of the MIME CRLF
+ end-of-line marker, only a newline is used for end-of-line.
+ The decode function ignores carriage-return,
+ newline, space, and tab characters. Otherwise, an error is
+ raised when decode is supplied invalid
+ base64 data — including when trailing padding is incorrect.
+
+
+
+
+
+ escape
+
+ escape format
+
+
+
+ The escape format converts zero bytes and
+ bytes with the high bit set into octal escape sequences
+ (\nnn), and it doubles
+ backslashes. Other byte values are represented literally.
+ The decode function will raise an error if a
+ backslash is not followed by either a second backslash or three
+ octal digits; it accepts other byte values unchanged.
+
+
+
+
+
+ hex
+
+ hex format
+
+
+
+ The hex format represents each 4 bits of
+ data as one hexadecimal digit, 0
+ through f, writing the higher-order digit of
+ each byte first. The encode function outputs
+ the a-f hex digits in lower
+ case. Because the smallest unit of data is 8 bits, there are
+ always an even number of characters returned
+ by encode.
+ The decode function
+ accepts the a-f characters in
+ either upper or lower case. An error is raised
+ when decode is given invalid hex data
+ — including when given an odd number of characters.
+
+
+
+
@@ -3976,22 +3338,20 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
- The following SQL-standard functions work on bit
- strings as well as character strings:
+ Some of the binary-string functions shown in
+ and
+ are also available
+ for bit strings, specifically:
length,
bit_length,
octet_length,
position,
substring,
- overlay.
-
-
-
- The following functions work on bit strings as well as binary
- strings:
+ overlay,
get_bit,
set_bit.
- When working with a bit string, these functions number the first
+ When working with a bit string, get_bit
+ and set_bit number the first
(leftmost) bit of the string as bit 0.