Up to now, ACL checks for large objects happened at the level of
the SQL-callable functions, which led to CVE-2017-7548 because of a
missing check. Push them down to be enforced in inv_api.c as much
as possible, in hopes of preventing future bugs. This does have the
effect of moving read and write permission errors to happen at lo_open
time not loread or lowrite time, but that seems acceptable.
Michael Paquier and Tom Lane
Discussion: https://postgr.es/m/CAB7nPqRHmNOYbETnc_2EjsuzSM00Z+BWKv9sy6tnvSd5gWT_JA@mail.gmail.com
Change pg_bsd_indent to follow upstream rules for placement of comments
to the right of code, and remove pgindent hack that caused comments
following #endif to not obey the general rule.
Commit e3860ffa4d wasn't actually using
the published version of pg_bsd_indent, but a hacked-up version that
tried to minimize the amount of movement of comments to the right of
code. The situation of interest is where such a comment has to be
moved to the right of its default placement at column 33 because there's
code there. BSD indent has always moved right in units of tab stops
in such cases --- but in the previous incarnation, indent was working
in 8-space tab stops, while now it knows we use 4-space tabs. So the
net result is that in about half the cases, such comments are placed
one tab stop left of before. This is better all around: it leaves
more room on the line for comment text, and it means that in such
cases the comment uniformly starts at the next 4-space tab stop after
the code, rather than sometimes one and sometimes two tabs after.
Also, ensure that comments following #endif are indented the same
as comments following other preprocessor commands such as #else.
That inconsistency turns out to have been self-inflicted damage
from a poorly-thought-through post-indent "fixup" in pgindent.
This patch is much less interesting than the first round of indent
changes, but also bulkier, so I thought it best to separate the effects.
Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org
Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
It's critical that the backend's idea of LOBLKSIZE match the way data has
actually been divided up in pg_largeobject. While we don't provide any
direct way to adjust that value, doing so is a one-line source code change
and various people have expressed interest recently in changing it. So,
just as with TOAST_MAX_CHUNK_SIZE, it seems prudent to record the value in
pg_control and cross-check that the backend's compiled-in setting matches
the on-disk data.
Also tweak the code in inv_api.c so that fetches from pg_largeobject
explicitly verify that the length of the data field is not more than
LOBLKSIZE. Formerly we just had Asserts() for that, which is no protection
at all in production builds. In some of the call sites an overlength data
value would translate directly to a security-relevant stack clobber, so it
seems worth one extra runtime comparison to be sure.
In the back branches, we can't change the contents of pg_control; but we
can still make the extra checks in inv_api.c, which will offer some amount
of protection against running with the wrong value of LOBLKSIZE.
Do read/write permissions checks at most once per large object descriptor,
not once per lo_read or lo_write call as before. The repeated tests were
quite useless in the read case since the snapshot-based tests were
guaranteed to produce the same answer every time. In the write case,
the extra tests could in principle detect revocation of write privileges
after a series of writes has started --- but there's a race condition there
anyway, since we'd check privileges before performing and certainly before
committing the write. So there's no real advantage to checking every
single time, and we might as well redefine it as "only check the first
time".
On the same reasoning, remove the LargeObjectExists checks in inv_write
and inv_truncate. We already checked existence when the descriptor was
opened, and checking again doesn't provide any real increment of safety
that would justify the cost.
Get rid of the fundamentally indefensible assumption that "long long int"
exists and is exactly 64 bits wide on every platform Postgres runs on.
Instead let the configure script select the type to use for "pg_int64".
This is a bit of a pain in the rear since we do not want to pollute client
namespace with all the random symbols that pg_config.h defines; instead
we have to create a separate generated header file, "pg_config_ext.h".
But now that the infrastructure is there, we might have the ability to
add some other stuff that's long been wanting in this area.
4TB large objects (standard 8KB BLCKSZ case). For this purpose new
libpq API lo_lseek64, lo_tell64 and lo_truncate64 are added. Also
corresponding new backend functions lo_lseek64, lo_tell64 and
lo_truncate64 are added. inv_api.c is changed to handle 64-bit
offsets.
Patch contributed by Nozomi Anzai (backend side) and Yugo Nagata
(frontend side, docs, regression tests and example program). Reviewed
by Kohei Kaigai. Committed by Tatsuo Ishii with minor editings.
snapmgmt.c file for the former. The header files have also been reorganized
in three parts: the most basic snapshot definitions are now in a new file
snapshot.h, and the also new snapmgmt.h keeps the definitions for snapmgmt.c.
tqual.h has been reduced to the bare minimum.
This patch is just a first step towards managing live snapshots within a
transaction; there is no functionality change.
Per my proposal to pgsql-patches on 20080318191940.GB27458@alvh.no-ip.org and
subsequent discussion.
set to the large object context ("fscxt"), as this is inevitably a source of
transaction-duration memory leaks. Not sure why we'd not noticed it before;
maybe people weren't touching a whole lot of LOs in the same transaction
before the 8.1 pg_dump changes. Per report from Wayne Conrad.
Backpatched as far as 8.1, but the problem doubtless goes all the way back.
I'm disinclined to spend the time to try to verify that the older branches
would still work if patched, seeing that this code was significantly modified
for 8.0 and again for 8.1, and that we don't have any trouble reports before
8.1. (Maybe the leaks were smaller before?)
a descriptor that uses the current transaction snapshot, rather than
SnapshotNow as it did before (and still does if INV_WRITE is set).
This means pg_dump will now dump a consistent snapshot of large object
contents, as it never could do before. Also, add a lo_create() function
that is similar to lo_creat() but allows the desired OID of the large
object to be specified. This will simplify pg_restore considerably
(but I'll fix that in a separate commit).
Also performed an initial run through of upgrading our Copyright date to
extend to 2005 ... first run here was very simple ... change everything
where: grep 1996-2004 && the word 'Copyright' ... scanned through the
generated list with 'less' first, and after, to make sure that I only
picked up the right entries ...
as per recent discussions. Invent SubTransactionIds that are managed like
CommandIds (ie, counter is reset at start of each top transaction), and
use these instead of TransactionIds to keep track of subtransaction status
in those modules that need it. This means that a subtransaction does not
need an XID unless it actually inserts/modifies rows in the database.
Accordingly, don't assign it an XID nor take a lock on the XID until it
tries to do that. This saves a lot of overhead for subtransactions that
are only used for error recovery (eg plpgsql exceptions). Also, arrange
to release a subtransaction's XID lock as soon as the subtransaction
exits, in both the commit and abort cases. This avoids holding many
unique locks after a long series of subtransactions. The price is some
additional overhead in XactLockTableWait, but that seems acceptable.
Finally, restructure the state machine in xact.c to have a more orthogonal
set of states for subtransactions.
password/group files. Also allow read-only subtransactions of a read-write
parent, but not vice versa. These are the reasonably noncontroversial
parts of Alvaro's recent mop-up patch, plus further work on large objects
to minimize use of the TopTransactionResourceOwner.
as full as possible, seems better to use a tuple size around BLCKSZ/4
so that less space is wasted when a LO tuple is updated. Also, this
lets us use a logical page size that's an exact power of two, avoiding
partial-page writes when client is sending us stuff in power-of-2
buffer chunks.
kibitzing from Tom Lane. Large objects are now all stored in a single
system relation "pg_largeobject" --- no more xinv or xinx files, no more
relkind 'l'. This should offer substantial performance improvement for
large numbers of LOs, since there won't be directory bloat anymore.
It'll also fix problems like running out of locktable space when you
access thousands of LOs in one transaction.
Also clean up cruft in read/write routines. LOs with "holes" in them
(never-written byte ranges) now work just like Unix files with holes do:
a hole reads as zeroes but doesn't occupy storage space.
INITDB forced!
> Regression tests opr_sanity and sanity_check are now failing.
Um, Bruce, I've said several times that I didn't think Perchine's large
object changes should be applied until someone had actually reviewed
them.
I tested it restoring my database with > 100000 BLOBS, and dumping it out.
But unfortunatly I can not restore it back due to problems in pg_dump.
--
Sincerely Yours,
Denis Perchine
> this is patch v 0.4 to support transactions with BLOBs.
> All BLOBs are in one table. You need to make initdb.
>
> --
> Sincerely Yours,
> Denis Perchine