postgresql/src/backend
Alvaro Herrera fa2fa99552 Permit dump/reload of not-too-large >1GB tuples
Our documentation states that our maximum field size is 1 GB, and that
our maximum row size is 1.6 TB.  However, while this might be attainable
in theory with enough contortions, it is not workable in practice; for
starters, pg_dump fails to dump tables containing rows larger than 1 GB,
even if individual columns are well below the limit; and even if one
does manage to manufacture a dump file containing a row that large, the
server refuses to load it anyway.

This commit enables dumping and reloading of such tuples, provided two
conditions are met:

1. no single column is larger than 1 GB (in output size -- for bytea
   this includes the formatting overhead)
2. the whole row is not larger than 2 GB
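
As a rough illustration of the two conditions above, the following sketch
checks a row given the output size of each of its fields.  The function and
macro names are hypothetical and are not part of this commit, and the
limits are the round figures quoted above rather than the exact byte counts
the server enforces.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Round limits from the commit message; the names are hypothetical. */
#define MAX_FIELD_OUTPUT_BYTES ((uint64_t) 1 << 30)	/* 1 GB */
#define MAX_ROW_OUTPUT_BYTES   ((uint64_t) 1 << 31)	/* 2 GB */

/*
 * Return true if a row whose fields have the given output sizes (in bytes,
 * including any formatting overhead) satisfies both conditions.
 */
bool
row_is_dumpable(const uint64_t *field_output_sizes, size_t nfields)
{
	uint64_t	total = 0;

	for (size_t i = 0; i < nfields; i++)
	{
		/* Condition 1: no single column larger than 1 GB. */
		if (field_output_sizes[i] > MAX_FIELD_OUTPUT_BYTES)
			return false;
		total += field_output_sizes[i];
	}

	/* Condition 2: the whole row not larger than 2 GB. */
	return total <= MAX_ROW_OUTPUT_BYTES;
}
```

For example, two 700 MB fields pass both checks, while adding a third field
of 900 MB pushes the row past 2 GB even though every field is under 1 GB.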

There are three related changes to enable this:

a. StringInfo's API now has two additional functions that allow creating
a string that grows beyond the typical 1GB limit (a "long" string).
ABI compatibility is maintained.  We still limit these strings to 2 GB,
though, for reasons explained below.

b. COPY now uses long StringInfos, so that pg_dump doesn't choke
trying to emit rows longer than 1GB.

c. heap_form_tuple now uses the MCXT_ALLOW_HUGE flag in its allocation
for the input tuple, which means that large tuples are accepted on
input.  Note that at this point we do not apply any further limit to the
input tuple size.
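
The growth rule behind change (a) can be sketched in isolation.  This is a
simplified model and not the StringInfo code itself: next_capacity and the
two limit macros are hypothetical names, and the exact limits (1 GB - 1 for
ordinary strings, 2 GB - 1 for long ones) are assumptions based on the
description above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed caps; the macro names are hypothetical. */
#define NORMAL_LIMIT UINT64_C(0x3fffffff)	/* 1 GB - 1 */
#define LONG_LIMIT   UINT64_C(0x7fffffff)	/* 2 GB - 1 */

/*
 * Given the current string length and buffer capacity, return the capacity
 * needed to append "needed" more bytes plus a terminator, doubling as we
 * go.  Return 0 if the result would exceed the applicable limit.
 */
uint64_t
next_capacity(uint64_t len, uint64_t cap, uint64_t needed, bool long_ok)
{
	uint64_t	limit = long_ok ? LONG_LIMIT : NORMAL_LIMIT;
	uint64_t	required;

	/* Reject requests that cannot fit under the limit at all. */
	if (len > limit || needed > limit - len)
		return 0;
	required = len + needed + 1;	/* +1 for the trailing NUL */
	if (required > limit)
		return 0;

	/* Double the capacity until the request fits. */
	if (cap == 0)
		cap = 1;
	while (cap < required)
		cap *= 2;

	/* Doubling may overshoot; clamp to the limit. */
	if (cap > limit)
		cap = limit;
	return cap;
}
```

With long_ok set to false, a buffer that is already close to 1 GB cannot
grow any further; with long_ok set to true, the same request succeeds and
the capacity doubles to just under 2 GB.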

The main reason to limit to 2 GB is that the FE/BE protocol uses 32 bit
length words to describe each row; and because the documentation is
ambiguous on its signedness and libpq does consider it signed, we cannot
use the highest-order bit.  Additionally, the StringInfo API uses "int"
(which is 4 bytes wide on most platforms) in many places, so we'd need
to change that API too in order to raise the limit, which would have lots
of fallout.
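
To see why the highest-order bit is unusable, consider a reader that treats
the 32-bit length word as signed.  The helpers below are hypothetical and
are not taken from libpq; they only assume that the length travels as four
bytes in network (big-endian) byte order.

```c
#include <stdint.h>

/* Store a length as a 32-bit big-endian word, as a sender would. */
void
write_length_word(uint32_t len, unsigned char buf[4])
{
	buf[0] = (unsigned char) (len >> 24);
	buf[1] = (unsigned char) (len >> 16);
	buf[2] = (unsigned char) (len >> 8);
	buf[3] = (unsigned char) len;
}

/*
 * Decode the same four bytes the way a reader using a signed 32-bit
 * integer would see them (two's complement interpretation).
 */
int32_t
read_length_as_signed(const unsigned char buf[4])
{
	uint32_t	raw = ((uint32_t) buf[0] << 24) |
					  ((uint32_t) buf[1] << 16) |
					  ((uint32_t) buf[2] << 8) |
					  (uint32_t) buf[3];
	int64_t		value = (int64_t) raw;

	/* Values with the top bit set map to negative numbers. */
	if (raw > (uint32_t) INT32_MAX)
		value -= INT64_C(4294967296);
	return (int32_t) value;
}
```

A length of exactly 2 GB (0x80000000) round-trips to a negative number under
this interpretation, whereas 2 GB - 1 (0x7fffffff) is the largest length
that survives intact.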

Backpatch to 9.5, which is the oldest branch that has
MemoryContextAllocExtended, a necessary piece of infrastructure.  We
could apply to 9.4 with very minimal additional effort, but any further
than that would require backpatching "huge" allocations too.

This is the largest set of changes we could find that can be
back-patched without breaking compatibility with existing systems.
Fixing a bigger set of problems (for example, dumping tuples bigger than
2GB, or dumping fields bigger than 1GB) would require changing the FE/BE
protocol and/or changing the StringInfo API in an ABI-incompatible way,
neither of which would be back-patchable.

Authors: Daniel Vérité, Álvaro Herrera
Reviewed by: Tomas Vondra
Discussion: https://postgr.es/m/20160229183023.GA286012@alvherre.pgsql
2016-12-02 00:34:01 -03:00
access Permit dump/reload of not-too-large >1GB tuples 2016-12-02 00:34:01 -03:00
bootstrap Support condition variables. 2016-11-22 14:27:11 -05:00
catalog Fix test about ignoring extension dependencies during extension scripts. 2016-11-26 13:31:35 -05:00
commands Permit dump/reload of not-too-large >1GB tuples 2016-12-02 00:34:01 -03:00
executor User narrower representative tuples in the hash-agg hashtable. 2016-11-30 17:30:09 -08:00
foreign Remove GetUserMappingId() and GetUserMappingById(). 2016-07-22 11:32:23 -04:00
lib Permit dump/reload of not-too-large >1GB tuples 2016-12-02 00:34:01 -03:00
libpq Consistently mention 'SELECT pg_reload_conf()' in config files 2016-10-25 11:26:15 -04:00
main Remove barrier.h 2016-11-22 20:28:24 -05:00
nodes Implement syntax for transition tables in AFTER triggers. 2016-11-04 10:49:50 -05:00
optimizer Fix bogus handling of JOIN_UNIQUE_OUTER/INNER cases for parallel joins. 2016-11-29 19:32:35 -05:00
parser Add aggregate_with_argtypes and use it consistently 2016-12-01 17:38:49 -05:00
po Translation updates 2016-08-08 11:08:00 -04:00
port Try to find out the actual hugepage size when making a MAP_HUGETLB request. 2016-10-13 15:06:46 -04:00
postmaster Use latch instead of select() in walreceiver 2016-12-01 20:23:28 -05:00
regex Make locale-dependent regex character classes work for large char codes. 2016-09-05 17:06:29 -04:00
replication Refactor libpqwalreceiver 2016-12-01 20:23:28 -05:00
rewrite Cleanup of rewriter and planner handling of Query.hasRowSecurity flag. 2016-11-10 16:16:33 -05:00
snowball Update copyright for 2016 2016-01-02 13:33:40 -05:00
storage Remove barrier.h 2016-11-22 20:28:24 -05:00
tcop Remove or reduce verbosity of some debug messages. 2016-11-17 17:05:16 -05:00
tsearch Add macros to make AllocSetContextCreate() calls simpler and safer. 2016-08-27 17:50:38 -04:00
utils Improve hash index bucket split behavior. 2016-11-30 15:39:21 -05:00
.gitignore Add .gitignore entries for AIX-specific intermediate build artifacts. 2015-07-08 20:44:22 -04:00
Makefile Straighten out some whitespace 2016-11-29 15:08:14 -05:00
common.mk
nls.mk Remove trailing slashes from directories in find command 2015-09-18 22:06:54 -04:00