Commit Graph

7 Commits

Author SHA1 Message Date
David Rowley b4dbf3e924 Fix various typos
This fixes many spelling mistakes in comments, but a few references to
invalid parameter names, function names and option names too in comments
and also some in string constants

Also, fix an #undef that was undefining the incorrect definition

Author: Alexander Lakhin
Reviewed-by: Justin Pryzby
Discussion: https://postgr.es/m/d5f68d19-c0fc-91a9-118d-7c6a5a3f5fad@gmail.com
2023-04-18 13:23:23 +12:00
Tomas Vondra 0070b66fef pg_dump: Use only LZ4 frame format for compression
After 0da243fed0 got committed, it was reported that in some cases the
compression ratio is rather poor - particularly for custom format with
narrow tables - due to writing the LZ4 header/footer for each row.

This commit switches to LZ4F (LZ4 frame format), eliminating most of the
overhead and greatly improving the compression ratio. This makes the
compressed size about the same for plain and custom formats (just like
for gzip, for example).

LZ4F is now used by both compression APIs, which allowed refactoring and
reusing more of the code. For consistency this also renames the LZ4File
struct to LZ4State, and a number of functions are now prefixed with
LZ4Stream_ (instead of LZ4File_).

Patch by Georgios Kokolatos, based on report and initial patch by Justin
Pryzby. Review and minor cleanups by me.

Author: Georgios Kokolatos, Justin Pryzby
Reported-by: Justin Pryzby
Reviewed-by: Tomas Vondra
Discussion: https://postgr.es/m/20230227044910.GO1653%40telsasoft.com
2023-04-01 00:54:50 +02:00
Tomas Vondra d0160ca11e Minor comment improvements for compress_lz4
Author: Tomas Vondra
Reviewed-by: Georgios Kokolatos, Justin Pryzby
Discussion: https://postgr.es/m/33496f7c-3449-1426-d568-63f6bca2ac1f@gmail.com
2023-03-23 17:55:52 +01:00
Tomas Vondra f081a48f9a Unify buffer sizes in pg_dump compression API
Prior to the introduction of the compression API in e9960732a9, pg_dump
would use the ZLIB_IN_SIZE/ZLIB_OUT_SIZE to size input/output buffers.
Commit 0da243fed0 introduced similar constants for LZ4, but while gzip
defined both buffers to be 4kB, LZ4 used 4kB and 16kB without any clear
reasoning why that's desirable.

Furthermore, parts of the code unaware of which compression is used
(e.g. pg_backup_directory.c) continued to use ZLIB_OUT_SIZE directly.

Simplify by replacing the various constants with DEFAULT_IO_BUFFER_SIZE,
set to 4kB. The compression implementations still have an option to use
a custom value, but considering 4kB was fine for 20+ years, I find that
unlikely (and we'd probably just increase the default buffer size).

Author: Georgios Kokolatos
Reviewed-by: Tomas Vondra, Justin Pryzby
Discussion: https://postgr.es/m/33496f7c-3449-1426-d568-63f6bca2ac1f@gmail.com
2023-03-23 17:55:52 +01:00
Tomas Vondra d3b57755e6 Improve type handling in pg_dump's compress file API
After 0da243fed0 got committed, we've received a report about a compiler
warning, related to the new LZ4File_gets() function:

  compress_lz4.c: In function 'LZ4File_gets':
  compress_lz4.c:492:19: warning: comparison of unsigned expression in
                                  '< 0' is always false [-Wtype-limits]
    492 |         if (dsize < 0)

The reason is very simple - dsize is declared as size_t, which is an
unsigned integer, and thus the check is pointless and we might fail to
notice an error in some cases (or fail in a strange way a bit later).

The warning could have been silenced by simply changing the type, but we
realized the API mostly assumes all the libraries use the same types and
report errors the same way (e.g. by returning 0 and/or negative value).

But we can't make this assumption - the gzip/lz4 libraries already
disagree on some of this, and even if they did a library added in the
future might not.

The right solution is to define what the API does, and translate the
library-specific behavior in consistent way (so that the internal errors
are not exposed to users of our compression API). So this adjusts the
data types in a couple places, so that we don't miss library errors, and
simplifies and unifies the error reporting to simply return true/false
(instead of e.g. size_t).

While at it, make sure LZ4File_open_write() does not clobber errno in
case open_func() fails.

Author: Georgios Kokolatos
Reported-by: Alexander Lakhin
Reviewed-by: Tomas Vondra, Justin Pryzby
Discussion: https://postgr.es/m/33496f7c-3449-1426-d568-63f6bca2ac1f@gmail.com
2023-03-23 17:55:17 +01:00
Tomas Vondra 6095069b40 Improve wording in pg_dump compression docs
A couple minor corrections in pg_dump comments and docs, related to the
recently introduced compression API.

Reported-by: Justin Pryzby
Discussion: https://postgr.es/m/20230227044910.GO1653@telsasoft.com
2023-03-01 16:11:38 +01:00
Tomas Vondra 0da243fed0 Add LZ4 compression to pg_dump
Expand pg_dump's compression streaming and file APIs to support the lz4
algorithm. The newly added compress_lz4.{c,h} files cover all the
functionality of the aforementioned APIs. Minor changes were necessary
in various pg_backup_* files, where code for the 'lz4' file suffix has
been added, as well as pg_dump's compression option parsing.

Author: Georgios Kokolatos
Reviewed-by: Michael Paquier, Rachel Heaton, Justin Pryzby, Shi Yu, Tomas Vondra
Discussion: https://postgr.es/m/faUNEOpts9vunEaLnmxmG-DldLSg_ql137OC3JYDmgrOMHm1RvvWY2IdBkv_CRxm5spCCb_OmKNk2T03TMm0fBEWveFF9wA1WizPuAgB7Ss%3D%40protonmail.com
2023-02-23 21:19:26 +01:00