Avoid unnecessary out-of-memory errors during encoding conversion.

Encoding conversion uses the very simplistic rule that the output
can't be more than 4X as long as the input, and palloc's a buffer
of that size.  This results in failure to convert any string longer
than about 1/4 GB (MaxAllocSize / MAX_CONVERSION_GROWTH), which is
becoming an annoying limitation.

As a band-aid to improve matters, allow the allocated output buffer
size to exceed 1GB.  We still insist that the final result fit into
MaxAllocSize (1GB), though.  Perhaps it'd be safe to relax that
restriction, but it'd require close analysis of all callers, which
is daunting (not least because external modules might call these
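functions).  For the moment, this should allow a 2X to 4X improvement
in the longest string we can convert (roughly 4X when the converted
string is no longer than the input, 2X when conversion doubles its
length), which is a useful gain in return for quite a simple patch.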

Also, once we have successfully converted a long string, repalloc
the output down to the actual string length, returning the excess
to the malloc pool.  This seems worth doing since we take this path
only for inputs longer than 1MB, so we can usually expect to give
back several MB whenever we take it at all.

This still leaves much to be desired, most notably that the assumption
that MAX_CONVERSION_GROWTH == 4 is very fragile, and yet we have no
guard code verifying that the output buffer isn't overrun.  Fixing
that would require significant changes in the encoding conversion
APIs, so it'll have to wait for some other day.

The present patch seems safely back-patchable, so patch all supported
branches.

Alvaro Herrera and Tom Lane

Discussion: https://postgr.es/m/20190816181418.GA898@alvherre.pgsql
Discussion: https://postgr.es/m/3614.1569359690@sss.pgh.pa.us
Tom Lane 2019-10-03 17:34:25 -04:00
parent c477f3e449
commit 8e10405c74
1 changed file with 56 additions and 6 deletions


@@ -349,16 +349,24 @@ pg_do_encoding_conversion(unsigned char *src, int len,
 						pg_encoding_to_char(dest_encoding))));
 
 	/*
-	 * Allocate space for conversion result, being wary of integer overflow
+	 * Allocate space for conversion result, being wary of integer overflow.
+	 *
+	 * len * MAX_CONVERSION_GROWTH is typically a vast overestimate of the
+	 * required space, so it might exceed MaxAllocSize even though the result
+	 * would actually fit. We do not want to hand back a result string that
+	 * exceeds MaxAllocSize, because callers might not cope gracefully --- but
+	 * if we just allocate more than that, and don't use it, that's fine.
 	 */
-	if ((Size) len >= (MaxAllocSize / (Size) MAX_CONVERSION_GROWTH))
+	if ((Size) len >= (MaxAllocHugeSize / (Size) MAX_CONVERSION_GROWTH))
 		ereport(ERROR,
 				(errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
 				 errmsg("out of memory"),
 				 errdetail("String of %d bytes is too long for encoding conversion.",
 						   len)));
 
-	result = palloc(len * MAX_CONVERSION_GROWTH + 1);
+	result = (unsigned char *)
+		MemoryContextAllocHuge(CurrentMemoryContext,
+							   (Size) len * MAX_CONVERSION_GROWTH + 1);
 
 	OidFunctionCall5(proc,
 					 Int32GetDatum(src_encoding),
@@ -366,6 +374,26 @@ pg_do_encoding_conversion(unsigned char *src, int len,
 					 CStringGetDatum(src),
 					 CStringGetDatum(result),
 					 Int32GetDatum(len));
 
+	/*
+	 * If the result is large, it's worth repalloc'ing to release any extra
+	 * space we asked for. The cutoff here is somewhat arbitrary, but we
+	 * *must* check when len * MAX_CONVERSION_GROWTH exceeds MaxAllocSize.
+	 */
+	if (len > 1000000)
+	{
+		Size		resultlen = strlen((char *) result);
+
+		if (resultlen >= MaxAllocSize)
+			ereport(ERROR,
+					(errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
+					 errmsg("out of memory"),
+					 errdetail("String of %d bytes is too long for encoding conversion.",
+							   len)));
+
+		result = (unsigned char *) repalloc(result, resultlen + 1);
+	}
+
 	return result;
 }
@@ -682,16 +710,19 @@ perform_default_encoding_conversion(const char *src, int len,
 		return unconstify(char *, src);
 
 	/*
-	 * Allocate space for conversion result, being wary of integer overflow
+	 * Allocate space for conversion result, being wary of integer overflow.
+	 * See comments in pg_do_encoding_conversion.
 	 */
-	if ((Size) len >= (MaxAllocSize / (Size) MAX_CONVERSION_GROWTH))
+	if ((Size) len >= (MaxAllocHugeSize / (Size) MAX_CONVERSION_GROWTH))
 		ereport(ERROR,
 				(errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
 				 errmsg("out of memory"),
 				 errdetail("String of %d bytes is too long for encoding conversion.",
 						   len)));
 
-	result = palloc(len * MAX_CONVERSION_GROWTH + 1);
+	result = (char *)
+		MemoryContextAllocHuge(CurrentMemoryContext,
+							   (Size) len * MAX_CONVERSION_GROWTH + 1);
 
 	FunctionCall5(flinfo,
 				  Int32GetDatum(src_encoding),
@@ -699,6 +730,25 @@ perform_default_encoding_conversion(const char *src, int len,
 				  CStringGetDatum(src),
 				  CStringGetDatum(result),
 				  Int32GetDatum(len));
 
+	/*
+	 * Release extra space if there might be a lot --- see comments in
+	 * pg_do_encoding_conversion.
+	 */
+	if (len > 1000000)
+	{
+		Size		resultlen = strlen(result);
+
+		if (resultlen >= MaxAllocSize)
+			ereport(ERROR,
+					(errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
+					 errmsg("out of memory"),
+					 errdetail("String of %d bytes is too long for encoding conversion.",
+							   len)));
+
+		result = (char *) repalloc(result, resultlen + 1);
+	}
+
 	return result;
 }