proposal for OUT parameter support. The columns don't actually *do*
anything yet; they are just left NULL. But I thought I'd commit this
part separately as a fairly pure example of the tasks needed when adding
a column to pg_proc or one of the other core system tables.
change saves a great deal of space in pg_proc and its primary index,
and it eliminates the former requirement that INDEX_MAX_KEYS and
FUNC_MAX_ARGS have the same value. INDEX_MAX_KEYS is still embedded
in the on-disk representation (because it affects index tuple header
size), but FUNC_MAX_ARGS is not. I believe it would now be possible
to increase FUNC_MAX_ARGS at little cost, but haven't experimented yet.
There are still a lot of vestigial references to FUNC_MAX_ARGS, which
I will clean up in a separate pass. However, getting rid of it
altogether would require changing the FunctionCallInfoData struct,
and I'm not sure I want to buy into that.
executing a statement that fires triggers. Formerly this time was
included in "Total runtime" but not otherwise accounted for.
As a side benefit, we avoid re-opening relations when firing non-deferred
AFTER triggers, because the trigger code can re-use the main executor's
ResultRelInfo data structure.
convention for isnull flags. Also, remove the useless InsertIndexResult
return struct from index AM aminsert calls --- there is no reason for
the caller to know where in the index the tuple was inserted, and we
were wasting a palloc cycle per insert to deliver this uninteresting
value (plus nontrivial complexity in some AMs).
I forced initdb because of the change in the signature of the aminsert
routines, even though nothing really looks at those pg_proc entries...
to write out data that we are about to tell the filesystem to drop.
smgr_internal_unlink already had a DropRelFileNodeBuffers call to
get rid of dead buffers without a write after it's no longer possible
to roll back the deleting transaction. Adding a similar call in
smgrtruncate simplifies callers and makes the overall division of
labor clearer. This patch removes the former behavior that VACUUM
would write all dirty buffers of a relation unconditionally.
of tuples when passing data up through multiple plan nodes. A slot can now
hold either a normal "physical" HeapTuple, or a "virtual" tuple consisting
of Datum/isnull arrays. Upper plan levels can usually just copy the Datum
arrays, avoiding heap_formtuple() and possible subsequent nocachegetattr()
calls to extract the data again. This work extends Atsushi Ogawa's earlier
patch, which provided the key idea of adding Datum arrays to TupleTableSlots.
(I believe however that something like this was foreseen way back in Berkeley
days --- see the old comment on ExecProject.) A test case involving many
levels of join of fairly wide tables (about 80 columns altogether) showed
about 3x overall speedup, though simple queries will probably not be
helped very much.
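A minimal sketch of why a "virtual" tuple is cheap to pass upward, assuming a much-simplified slot. SketchSlot, sketch_project and SKETCH_MAX_ATTRS are hypothetical names for illustration only; the real TupleTableSlot carries considerably more state. The point is that projection becomes a copy of Datum/isnull entries, with no tuple formed or deformed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uintptr_t Datum;            /* stand-in for PostgreSQL's Datum */

#define SKETCH_MAX_ATTRS 8          /* hypothetical fixed width for the sketch */

/* Hypothetical, simplified slot that holds only the "virtual" form. */
typedef struct SketchSlot
{
    int   natts;
    Datum values[SKETCH_MAX_ATTRS];
    bool  isnull[SKETCH_MAX_ATTRS];
} SketchSlot;

/*
 * Project columns from an input slot into an output slot by copying
 * Datum/isnull entries directly.  attmap[i] is the 0-based input column
 * that feeds output column i.
 */
static void
sketch_project(const SketchSlot *in, SketchSlot *out,
               const int *attmap, int nout)
{
    assert(nout >= 0 && nout <= SKETCH_MAX_ATTRS);
    for (int i = 0; i < nout; i++)
    {
        assert(attmap[i] >= 0 && attmap[i] < in->natts);
        out->values[i] = in->values[attmap[i]];
        out->isnull[i] = in->isnull[attmap[i]];
    }
    out->natts = nout;
}
```

An upper plan node would call sketch_project() once per row; only a node that really needs a physical tuple would pay for forming one.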
I have also duplicated some code in heaptuple.c in order to provide versions
of heap_formtuple and friends that use "bool" arrays to indicate null
attributes, instead of the old convention of "char" arrays containing either
'n' or ' '. This provides a better match to the convention used by
ExecEvalExpr. While I have not made a concerted effort to get rid of uses
of the old routines, I think they should be deprecated and eventually removed.
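For illustration, the two null-flag conventions differ only in encoding. The helper below is a hypothetical sketch (sketch_char_nulls_to_bool is not a routine in heaptuple.c) that translates the old 'n'/' ' char flags into the bool form.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Translate old-style null flags ('n' = null, ' ' = not null) into a
 * bool array (true = null), the convention used by ExecEvalExpr.
 */
static void
sketch_char_nulls_to_bool(const char *nulls, bool *isnull, int natts)
{
    assert(natts >= 0);
    for (int i = 0; i < natts; i++)
        isnull[i] = (nulls[i] == 'n');
}
```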
number of palloc calls. This has a salutary impact on plpgsql operations
with record variables (which create and destroy tupdescs constantly)
and probably helps a bit in some other cases too.
Too much space is allocated for the tablespace file path. I guess the
directory name used to be "pg_tablespaces" instead of "pg_tblspc" at
some point.
Heikki Linnakangas
the freelist, plus per-buffer spinlocks that protect access to individual
shared buffer headers. This requires abandoning a global freelist (since
the freelist is a global contention point), which shoots down ARC and 2Q
as well as plain LRU management. Adopt a clock sweep algorithm instead.
Preliminary results show substantial improvement in multi-backend situations.
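A rough single-threaded sketch of a clock sweep, to show why no global freelist is needed. The usage counter, its cap, and all Sketch* names are assumptions for illustration; the per-buffer spinlocks mentioned above are omitted entirely, so this is not the bufmgr code.

```c
#include <assert.h>

#define SKETCH_NBUFFERS  4
#define SKETCH_MAX_USAGE 5          /* hypothetical cap on the usage counter */

typedef struct SketchBuffer
{
    int refcount;                   /* pins held; pinned buffers are skipped */
    int usage_count;                /* bumped on access, decayed by the sweep */
} SketchBuffer;

typedef struct SketchPool
{
    SketchBuffer buffers[SKETCH_NBUFFERS];
    int          next_victim;       /* the clock hand */
} SketchPool;

/* Record an access to a buffer. */
static void
sketch_touch(SketchPool *pool, int buf)
{
    assert(buf >= 0 && buf < SKETCH_NBUFFERS);
    if (pool->buffers[buf].usage_count < SKETCH_MAX_USAGE)
        pool->buffers[buf].usage_count++;
}

/*
 * Advance the hand until an unpinned buffer with usage_count == 0 is
 * found, decrementing the count of each unpinned buffer passed over.
 * Returns the victim's index, or -1 if every buffer is pinned.
 */
static int
sketch_clock_sweep(SketchPool *pool)
{
    int pinned_seen = 0;

    for (;;)
    {
        int           cur = pool->next_victim;
        SketchBuffer *buf = &pool->buffers[cur];

        pool->next_victim = (cur + 1) % SKETCH_NBUFFERS;

        if (buf->refcount > 0)
        {
            /* A full lap of pinned buffers means there is no victim. */
            if (++pinned_seen >= SKETCH_NBUFFERS)
                return -1;
            continue;
        }
        pinned_seen = 0;

        if (buf->usage_count == 0)
            return cur;
        buf->usage_count--;
    }
}
```

Because the only shared state is the hand position, backends contend far less than they would on a list that must be relinked on every access.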
command. This is useful because we can allow truncation of tables
referenced by foreign keys, so long as the referencing table is
truncated in the same command.
Alvaro Herrera
is the minimum required fix. I want to look next at taking advantage of
it by simplifying the message semantics in the shared inval message queue,
but that part can be held over for 8.1 if it turns out too ugly.
Also performed an initial run-through of upgrading our Copyright date to
extend to 2005. This first run was very simple: change everything that
matches grep 1996-2004 and contains the word 'Copyright'. I scanned
through the generated list with 'less' both before and after, to make
sure that I only picked up the right entries.
> throughout to the spellings suggested by your book.
Great.
A follow-up patch for current CVS HEAD is attached, and available at
http://troels.arvin.dk/db/pgsql/conformance/pgsql-sql-conformance-followup.patch
The patch
- includes a core feature ID that had been left
out by mistake (C011)
- updates the sql_feature_packages.txt table to
reflect changes in SQL:2003 which were not
covered properly in my last patch
Troels Arvin
a relation's number of blocks, rather than the possibly-obsolete value
in pg_class.relpages. Scale the value in pg_class.reltuples correspondingly
to arrive at a hopefully more accurate number of rows. When pg_class
contains 0/0, estimate a tuple width from the column datatypes and divide
that into current file size to estimate number of rows. This improved
methodology allows us to jettison the ancient hacks that put bogus default
values into pg_class when a table is first created. Also, per a suggestion
from Simon, make VACUUM (but not VACUUM FULL or ANALYZE) adjust the value
it puts into pg_class.reltuples to try to represent the mean tuple density
instead of the minimal density that actually prevails just after VACUUM.
These changes alter the plans selected for certain regression tests, so
update the expected files accordingly. (I removed join_1.out because
it's not clear if it still applies; we can add back any variant versions
as they are shown to be needed.)
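The density-scaling arithmetic can be sketched as follows. This is a simplification under stated assumptions: SKETCH_BLCKSZ and sketch_estimate_rows are hypothetical, and the 0/0 branch ignores page and tuple header overhead, which a real width estimate would have to account for.

```c
#include <assert.h>

#define SKETCH_BLCKSZ 8192          /* assumed block size in bytes */

/*
 * Estimate the current row count from possibly-stale pg_class numbers.
 *   relpages, reltuples: values last stored in pg_class
 *   curpages:            the relation's actual current number of blocks
 *   tuple_width:         width estimated from the column datatypes; used
 *                        only when pg_class contains 0/0
 */
static double
sketch_estimate_rows(double relpages, double reltuples,
                     double curpages, int tuple_width)
{
    double density;                 /* tuples per page */

    assert(relpages >= 0 && reltuples >= 0 && curpages >= 0);
    assert(tuple_width > 0);

    if (relpages > 0)
        density = reltuples / relpages;
    else
        density = (double) SKETCH_BLCKSZ / tuple_width;

    return density * curpages;
}
```

For example, a table recorded as 100 pages and 10000 tuples that has since grown to 150 pages is estimated at 15000 rows.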
clause implicitly whenever one is not given explicitly. Remove concept
of a schema having an associated tablespace, and simplify the rules for
selecting a default tablespace for a table or index. It's now just
(a) explicit TABLESPACE clause; (b) default_tablespace if that's not an
empty string; (c) database's default. This will allow pg_dump to use
SET commands instead of tablespace clauses to determine object locations
(but I didn't actually make it do so). All per recent discussions.
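The three-step rule can be written down as a small function. This is only a sketch: sketch_choose_tablespace is a hypothetical name, and it works on tablespace names where the backend would deal in OIDs.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Pick a tablespace for a new table or index:
 *   (a) explicit TABLESPACE clause, if given (non-NULL);
 *   (b) default_tablespace, if it is not an empty string;
 *   (c) the database's default.
 */
static const char *
sketch_choose_tablespace(const char *explicit_clause,
                         const char *default_tablespace,
                         const char *database_default)
{
    assert(database_default != NULL);

    if (explicit_clause != NULL)
        return explicit_clause;
    if (default_tablespace != NULL && default_tablespace[0] != '\0')
        return default_tablespace;
    return database_default;
}
```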
http://archives.postgresql.org/pgsql-hackers/2004-10/msg00464.php.
This fix is intended to be permanent: it moves the responsibility for
calling SetBufferCommitInfoNeedsSave() into the tqual.c routines,
eliminating the requirement for callers to test whether t_infomask changed.
Also, tighten validity checking on buffer IDs in bufmgr.c --- several
routines were paranoid about out-of-range shared buffer numbers but not
about out-of-range local ones, which seems a tad pointless.
parent table's tablespace, as per gripe from Michael Kleiser. Choose
a more plausible column order for this view and pg_tables. Update
documentation of these views, which was missed in original patch.
columns. The returned tuple needs to have appropriate NULL columns
inserted so that it actually matches the declared rowtype. It seemed
convenient to use a JunkFilter for this, so I made some cleanups and
simplifications in the JunkFilter code to allow it to support this
additional functionality. (That in turn exposed a latent bug in
nodeAppend.c, which is that it was returning a tuple slot whose
descriptor didn't match its data.) Also, move check_sql_fn_retval
out of pg_proc.c and into functions.c, where it seems to more naturally
belong.
of locking used by REINDEX. REINDEX needs only ShareLock on the parent
table, same as CREATE INDEX, plus an exclusive lock on the specific index
being processed.
as per recent discussions. Invent SubTransactionIds that are managed like
CommandIds (ie, counter is reset at start of each top transaction), and
use these instead of TransactionIds to keep track of subtransaction status
in those modules that need it. This means that a subtransaction does not
need an XID unless it actually inserts/modifies rows in the database.
Accordingly, don't assign it an XID nor take a lock on the XID until it
tries to do that. This saves a lot of overhead for subtransactions that
are only used for error recovery (eg plpgsql exceptions). Also, arrange
to release a subtransaction's XID lock as soon as the subtransaction
exits, in both the commit and abort cases. This avoids holding many
unique locks after a long series of subtransactions. The price is some
additional overhead in XactLockTableWait, but that seems acceptable.
Finally, restructure the state machine in xact.c to have a more orthogonal
set of states for subtransactions.
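A sketch of the lazy-XID idea, with hypothetical types and a local counter standing in for the real XID allocator. Every subtransaction gets a cheap SubTransactionId up front; an XID is assigned only when it first needs to write.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t SketchXid;
typedef uint32_t SketchSubXid;

#define SKETCH_INVALID_XID 0

typedef struct SketchSubXact
{
    SketchSubXid subid;             /* always assigned, purely local */
    SketchXid    xid;               /* assigned lazily, on first write */
} SketchSubXact;

typedef struct SketchBackend
{
    SketchSubXid next_subid;        /* reset at start of each top transaction */
    SketchXid    next_xid;          /* stand-in for the global XID counter */
} SketchBackend;

static void
sketch_start_top_xact(SketchBackend *be)
{
    be->next_subid = 1;
}

/* Starting a subtransaction consumes no XID. */
static SketchSubXact
sketch_begin_subxact(SketchBackend *be)
{
    SketchSubXact s = { be->next_subid++, SKETCH_INVALID_XID };
    return s;
}

/* Called just before the subtransaction inserts or modifies a row. */
static SketchXid
sketch_xid_for_write(SketchBackend *be, SketchSubXact *s)
{
    assert(be->next_xid != SKETCH_INVALID_XID);
    if (s->xid == SKETCH_INVALID_XID)
        s->xid = be->next_xid++;
    return s->xid;
}
```

A subtransaction used only to trap an error never calls sketch_xid_for_write(), so it never consumes an XID or takes a lock on one.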
default tablespace --- they should always go in the database's default
tablespace. Adjust heap_create() API so that it is passed the relkind
to make this easier; should simplify any further tweaking of the same
sort.
so that we close and flush the doomed relation's relcache entry before
we start to delete the underlying catalog rows, rather than afterwards.
For a while yesterday I thought that an unexpected relcache entry rebuild
partway through this sequence might explain the infrequent parallel
regression failures we were chasing. It doesn't, mainly because there's
no CommandCounterIncrement in the sequence and so the deletions aren't
"really" done yet. But it sure seems like trouble waiting to happen.
presence of dropped columns. Document the already-presumed fact that
eref aliases in relation RTEs are supposed to have entries for dropped
columns; cause the user alias structs to have such entries too, so that
there's always a one-to-one mapping to the underlying physical attnums.
Adjust expandRTE() and related code to handle the case where a column
that is part of a JOIN has been dropped. Generalize expandRTE()'s API
so that it can be used in a couple of places that formerly rolled their
own implementation of the same logic. Fix ruleutils.c to suppress
display of aliases for columns that were dropped since the rule was made.
number of active subtransaction XIDs in each backend's PGPROC entry,
and use this to avoid expensive probes into pg_subtrans during
TransactionIdIsInProgress. Extend EOXactCallback API to allow add-on
modules to get control at subxact start/end. (This is deliberately
not compatible with the former API, since any uses of that API probably
need manual review anyway.) Add basic reference documentation for
SAVEPOINT and related commands. Minor other cleanups to check off some
of the open issues for subtransactions.
Alvaro Herrera and Tom Lane.
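A sketch of how a per-backend subxid cache can short-circuit the lookup. The cache size, the overflow flag, and all Sketch* names are assumptions for illustration, not the actual PGPROC layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t SketchXid;

#define SKETCH_MAX_CACHED_SUBXIDS 2 /* tiny on purpose, for the sketch */

typedef struct SketchProc
{
    SketchXid xid;                  /* top-level XID of this backend */
    int       nsubxids;
    bool      overflowed;           /* true once some subxid didn't fit */
    SketchXid subxids[SKETCH_MAX_CACHED_SUBXIDS];
} SketchProc;

typedef enum
{
    SKETCH_RUNNING,                 /* found in the cache: in progress */
    SKETCH_NOT_HERE,                /* cache is complete and lacks the XID */
    SKETCH_NEED_SUBTRANS            /* cache overflowed: must probe pg_subtrans */
} SketchProbe;

static void
sketch_add_subxid(SketchProc *proc, SketchXid xid)
{
    assert(proc->nsubxids >= 0);
    if (proc->nsubxids < SKETCH_MAX_CACHED_SUBXIDS)
        proc->subxids[proc->nsubxids++] = xid;
    else
        proc->overflowed = true;
}

static SketchProbe
sketch_probe(const SketchProc *proc, SketchXid xid)
{
    if (proc->xid == xid)
        return SKETCH_RUNNING;
    for (int i = 0; i < proc->nsubxids; i++)
    {
        if (proc->subxids[i] == xid)
            return SKETCH_RUNNING;
    }
    return proc->overflowed ? SKETCH_NEED_SUBTRANS : SKETCH_NOT_HERE;
}
```

Only the overflowed case falls back to the expensive path; a backend with few open subtransactions is answered from shared memory alone.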
password/group files. Also allow read-only subtransactions of a read-write
parent, but not vice versa. These are the reasonably noncontroversial
parts of Alvaro's recent mop-up patch, plus further work on large objects
to minimize use of the TopTransactionResourceOwner.
keep track of portal-related resources separately from transaction-related
resources. This allows cursors to work in a somewhat sane fashion with
nested transactions. For now, cursor behavior is non-subtransactional;
that is, a cursor's state does not roll back if you abort a subtransaction
that fetched from the cursor. We might want to change that later.
probably should have been to begin with; this is to cover cases like
needing to recreate the per-db directory during WAL replay.
Also, fix heap_create to force pg_class.reltablespace to be zero instead
of the database's default tablespace; this makes the world safe for
CREATE DATABASE to handle all tables in the default tablespace alike,
as per previous discussion. And force pg_class.reltablespace to zero
when creating a relation without physical storage (eg, a view); this
avoids possibly having dangling references in this column after a
subsequent DROP TABLESPACE.
creation of user-defined tablespaces with names starting with 'pg_', as
per suggestion of Chris K-L. Also install admin-guide tablespace
documentation from Gavin.
There are various things left to do: contrib dbsize and oid2name modules
need work, and so does the documentation. Also someone should think about
COMMENT ON TABLESPACE and maybe RENAME TABLESPACE. Also initlocation is
dead, it just doesn't know it yet.
Gavin Sherry and Tom Lane.
sequences, as per recent discussion. All these names are now of the
form table_column_type, with digits added if needed to make them unique.
Default constraint names are chosen to be unique across their whole schema,
not just within the parent object, so as to be more SQL-spec-compatible
and make the information schema views more useful.
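A sketch of the table_column_type naming scheme with a numeric suffix on collision. The helper names are hypothetical, the list of names already in use is passed in as a plain array, and the buffer is assumed large enough that truncation never occurs (the real code must also respect NAMEDATALEN).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Is this name already taken?  "existing" stands in for a catalog lookup. */
static bool
sketch_name_in_use(const char *name, const char *const *existing, int n)
{
    for (int i = 0; i < n; i++)
    {
        if (strcmp(name, existing[i]) == 0)
            return true;
    }
    return false;
}

/*
 * Build "table_column_label", appending 1, 2, 3, ... to the label until
 * the result does not collide with an existing name.
 */
static void
sketch_choose_name(char *buf, size_t buflen,
                   const char *table, const char *column, const char *label,
                   const char *const *existing, int nexisting)
{
    int len = snprintf(buf, buflen, "%s_%s_%s", table, column, label);

    assert(len >= 0 && (size_t) len < buflen);
    for (int pass = 1; sketch_name_in_use(buf, existing, nexisting); pass++)
    {
        len = snprintf(buf, buflen, "%s_%s_%s%d", table, column, label, pass);
        assert(len >= 0 && (size_t) len < buflen);
    }
}
```

With "orders_id_key" and "orders_id_key1" already taken, the next unique index on orders.id would come out as "orders_id_key2".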
Instead of prohibiting that, put code into ALTER TABLE to reject ALTERs
that would affect other tables' columns. Eventually we will probably
want to extend ALTER TABLE to actually do something useful here, but
in the meantime it seems wrong to forbid the feature completely just
because ALTER isn't fully baked.
when someone attempts to create a column of a composite datatype. For
now, just make sure we produce a reasonable error at the 'right place'.
Not sure if this will be made to work before 7.5, but make it act
reasonably in case nothing more gets done.
of bug report #1150. Also, arrange that the object owner's irrevocable
grant-option permissions are handled implicitly by the system rather than
being listed in the ACL as self-granted rights (which was wrong anyway).
I did not take the further step of showing these permissions in an
explicit 'granted by _SYSTEM' ACL entry, as that seemed more likely to
bollix up existing clients than to do anything really useful. It's still
a possible future direction, though.
rather than an error code, and does elog(ERROR) not elog(WARNING)
when it detects a problem. All callers were simply elog(ERROR)'ing on
failure return anyway, and I find it hard to envision a caller that would
not, so we may as well simplify the callers and produce the more useful
error message directly.
this is an aclmask function and does not have the same return convention
as aclcheck functions. Also adjust the behavior so that users without
CREATE TEMP permission still have USAGE permission on their session's
temp schema. This allows privileged code to create a temp table and
make it accessible to code that's not got the same privilege. (Since
the default permissions on a table are no-access, an explicit grant on
the table will still be needed; but I see no reason that the temp schema
itself should prohibit such access.)
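To illustrate why the two return conventions must not be confused, here is a sketch with hypothetical names and bit values. A mask-style function returns the granted subset of the requested bits (nonzero means some access), while a check-style function returns a result code in which zero means OK. The "any requested bit suffices" rule in the check function is an assumption of the sketch.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t SketchAclMode;

#define SKETCH_ACL_USAGE  (1u << 0) /* hypothetical bit assignments */
#define SKETCH_ACL_CREATE (1u << 1)

typedef enum
{
    SKETCH_ACLCHECK_OK = 0,
    SKETCH_ACLCHECK_NO_PRIV
} SketchAclResult;

/* Mask style: which of the wanted privileges are actually held? */
static SketchAclMode
sketch_aclmask(SketchAclMode held, SketchAclMode wanted)
{
    return held & wanted;
}

/* Check style: OK (zero) if any wanted privilege is held. */
static SketchAclResult
sketch_aclcheck(SketchAclMode held, SketchAclMode wanted)
{
    assert(wanted != 0);
    return sketch_aclmask(held, wanted) != 0
        ? SKETCH_ACLCHECK_OK
        : SKETCH_ACLCHECK_NO_PRIV;
}
```

Testing a mask result against the OK code would invert the answer, since zero means "nothing granted" in one convention and "permitted" in the other.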
the four functions.
> Also, please justify the temp-related changes. I was not aware that we
> had any breakage there.
patch-tmp-schema.txt contains the following bits:
*) Changes pg_namespace_aclmask() so that the superuser is always able
to create objects in the temp namespace.
*) Changes pg_namespace_aclmask() so that if this is a temp namespace,
objects are only allowed to be created in the temp namespace if the
user has TEMP privs on the database. This encompasses all object
creation, not just TEMP tables.
*) InitTempTableNamespace() checks to see if the current user, not the
session user, has access to create a temp namespace.
The first two changes are necessary to support the third change. Now
it's possible to revoke all temp table privs from non-superusers and
limit all creation of temp tables/schemas to a function that's
executed with elevated privs (security definer). Before this change,
it was not possible to have a setuid function create a temp
table/schema if the session user had no TEMP privs.
patch-area-path.txt contains:
*) Can now determine the area of a closed path.
patch-dfmgr.txt contains:
*) Small tweak to add the library path that's being expanded.
I was using $lib/foo.so and couldn't easily figure out what the error
message "invalid macro name in dynamic library path" meant without
looking through the source code. With the path in there, at least I
know where to start looking in my config file.
Sean Chittenden
In the past, we used a 'Lispy' linked list implementation: a "list" was
merely a pointer to the head node of the list. The problem with that
design is that it makes lappend() and length() linear time. This patch
fixes that problem (and others) by maintaining a count of the list
length and a pointer to the tail node along with each head node pointer.
A "list" is now a pointer to a structure containing some meta-data
about the list; the head and tail pointers in that structure refer
to ListCell structures that maintain the actual linked list of nodes.
The function names of the list API have also been changed to, I hope,
be more logically consistent. By default, the old function names are
still available; they will be disabled-by-default once the rest of
the tree has been updated to use the new API names.
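A minimal sketch of the described layout, using malloc instead of palloc and hypothetical Sketch* names rather than the real API. A NULL pointer plays the role of the empty list; the header keeps the length and tail so that append and length are constant time.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct SketchCell
{
    void              *ptr;
    struct SketchCell *next;
} SketchCell;

/* List header: metadata plus pointers to the first and last cells. */
typedef struct SketchList
{
    int         length;
    SketchCell *head;
    SketchCell *tail;
} SketchList;

/* Append in O(1) by following the tail pointer; NULL means empty list. */
static SketchList *
sketch_lappend(SketchList *list, void *datum)
{
    SketchCell *cell = malloc(sizeof *cell);

    if (cell == NULL)
        abort();
    cell->ptr = datum;
    cell->next = NULL;

    if (list == NULL)
    {
        list = malloc(sizeof *list);
        if (list == NULL)
            abort();
        list->length = 0;
        list->head = cell;
    }
    else
    {
        assert(list->tail != NULL && list->tail->next == NULL);
        list->tail->next = cell;
    }
    list->tail = cell;
    list->length++;
    return list;
}

/* Length in O(1): read the stored count instead of walking the cells. */
static int
sketch_list_length(const SketchList *list)
{
    return list ? list->length : 0;
}

static void
sketch_list_free(SketchList *list)
{
    SketchCell *cell = list ? list->head : NULL;

    while (cell != NULL)
    {
        SketchCell *next = cell->next;

        free(cell);
        cell = next;
    }
    free(list);
}
```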
permissions tests in about the same amount of code as before. Exactly what
the GRANT/REVOKE code ought to be doing is still up for debate, but this
should be helpful in any case, and it already solves an efficiency problem
in executor startup.
costing us lots more to maintain than it was worth. On shared tables
it was of exactly zero benefit because we couldn't trust it to be
up to date. On temp tables it sometimes saved an lseek, but not often
enough to be worth getting excited about. And the real problem was that
we forced an lseek on every relcache flush in order to update the field.
So all in all it seems best to lose the complexity.
in favor of using the REINDEX TABLE apparatus, which does the same thing
simpler and faster. Also, make TRUNCATE not use cluster.c at all, but
just assign a new relfilenode and REINDEX. This partially addresses
Hartmut Raschick's complaint from last December that 7.4's TRUNCATE is
an order of magnitude slower than prior releases. By getting rid of
a lot of unnecessary catalog updates, these changes buy back about a
factor of two (on my system). The remaining overhead seems associated
with creating and deleting storage files, which we may not be able to
do much about without abandoning transaction safety for TRUNCATE.
* ALTER ... ADD COLUMN with defaults and NOT NULL constraints works per SQL
spec. A default is implemented by rewriting the table with the new value
stored in each row.
* ALTER COLUMN TYPE. You can change a column's datatype to anything you
want, so long as you can specify how to convert the old value. Rewrites
the table. (Possible future improvement: optimize no-op conversions such
as varchar(N) to varchar(N+1).)
* Multiple ALTER actions in a single ALTER TABLE command. You can perform
any number of column additions, type changes, and constraint additions with
only one pass over the table contents.
Basic documentation provided in ALTER TABLE ref page, but some more docs
work is needed.
Original patch from Rod Taylor, additional work from Tom Lane.
'SELECT foo()' in a SQL function returning a rowtype, to simply pass
back the results of another function returning the same rowtype.
However, that hasn't actually worked in many years. Now it works again.
results with tuples as ordinary varlena Datums. This commit does not
in itself do much for us, except eliminate the horrid memory leak
associated with evaluation of whole-row variables. However, it lays the
groundwork for allowing composite types as table columns, and perhaps
some other useful features as well. Per my proposal of a few days ago.
remove separate implementation of ALTER TABLE SET WITHOUT OIDS in favor
of doing a regular DROP. Also, cause CREATE TABLE to account completely
correctly for the inheritance status of the OID column. This fixes
problems with dropping OID columns that have dependencies, as noted by
Christopher Kings-Lynne, as well as making sure that you can't drop an
OID column that was inherited from a parent.
message that is reporting a prechecking error in a SQL function.
This is to cue client-side code that the syntax error position,
if any, is with respect to the function body and not the outer command.
This commit teaches ANALYZE to store such stats in pg_statistic, but
nothing is done yet about teaching the planner to use 'em.
Also, repair longstanding oversight in separate ANALYZE command: it
updated the pg_class.relpages and reltuples counts for the table proper,
but not for indexes.
the relcache, and so the notion of 'blind write' is gone. This should
improve efficiency in bgwriter and background checkpoint processes.
Internal restructuring in md.c to remove the not-very-useful array of
MdfdVec objects --- might as well just use pointers.
Also remove the long-dead 'persistent main memory' storage manager (mm.c),
since it seems quite unlikely to ever get resurrected.
a series of numbers, optionally using an explicit step size other
than the default value (one). Use function in the information_schema
to replace hard-wired knowledge of INDEX_MAX_KEYS. initdb forced due
to pg_proc change. Documentation update still needed -- will be
committed separately.
whereToSendOutput instead because they are really inquiring about
the correct client communication protocol. Update some comments.
This is pointing towards supporting regular FE/BE client protocol
in a standalone backend, per discussion a month or so back.
should not be too eager to reject paths involving unknown schemas, since
it can't really tell whether the schemas exist in the target database.
(Also, when reading pg_dumpall output, it could be that the schemas
don't exist yet, but eventually will.) ALTER USER SET has a similar issue.
So, reduce the normal ERROR to a NOTICE when checking search_path values
for these commands. Supporting this requires changing the API for GUC
assign_hook functions, which causes the patch to touch a lot of places,
but the changes are conceptually trivial.
pointer type when it is not necessary to do so.
For future reference, casting NULL to a pointer type is only necessary
when (a) invoking a function AND either (b) the function has no prototype
OR (c) the function is a varargs function.
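A small illustration of the rule, using two hypothetical functions. With a prototype in scope, a bare NULL is converted to the parameter's type automatically; in the variable part of a varargs call no such conversion happens, so the cast is needed there.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>

/* Prototyped, fixed-argument function: a bare NULL is converted for us. */
static int
sketch_is_null(const char *p)
{
    return p == NULL;
}

/*
 * Varargs function: counts its string arguments up to a terminating
 * null pointer.  The callee reads each variable argument as
 * "const char *", so the caller must pass exactly that type.
 */
static int
sketch_count_strings(const char *first, ...)
{
    va_list ap;
    int     n = 0;

    va_start(ap, first);
    for (const char *s = first; s != NULL; s = va_arg(ap, const char *))
        n++;
    va_end(ap);
    return n;
}

static void
sketch_usage(void)
{
    /* No cast needed: the prototype supplies the pointer type. */
    assert(sketch_is_null(NULL));

    /* Cast needed: the terminator travels through the "..." part. */
    assert(sketch_count_strings("a", "b", (const char *) NULL) == 2);
}
```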
parameters to be declared with names. pg_proc has a column to store
names, and CREATE FUNCTION can insert data into it, but that's all as
yet. I need to do more work on the pg_dump and plpgsql portions of the
patch before committing those, but I thought I'd get the bulky changes
in before the tree drifts under me.
initdb forced due to pg_proc change.
- Update comment in IsReservedName() to the present day
- Improve some variable & function names in commands/vacuum.c. I
was planning to rewrite this to avoid lappend(), but since I
still intend to do the list rewrite, there's no need for that.
- Update some smgr comments which seemed to imply that we still
forced all dirty pages to disk at commit-time.
- Replace some #ifdef DIAGNOSTIC code with assertions.
- Make the distinction between OS-level file descriptors and
virtual file descriptors a little clearer in a few comments
- Other minor comment improvements in the smgr code
run the data through cpp, and we know of at least one platform where
unusual cpp behavior breaks the process. So remove the cpp step,
and make consequent simplifications.
showed that for common operator names such as '=', the pallocs done by
this routine occupied a surprisingly large fraction of the total time
for the parser to process an operator.
about whether it is applied before or after eval_const_expressions().
I believe there were some corner cases where the system would fail to
recognize that a partial index is applicable because of the previous
inconsistency. Store normal rather than 'implicit AND' representations
of constraints and index predicates in the catalogs.
initdb forced due to representation change of constraints/predicates.
to certain compile-time options (FUNC_MAX_ARGS, INDEX_MAX_KEYS,
NAMEDATALEN, BLCKSZ, HAVE_INT64_TIMESTAMP). Also added "category",
"short_desc", and "extra_desc" to the pg_settings view. Per recent
discussion here:
http://archives.postgresql.org/pgsql-patches/2003-11/msg00363.php
large objects. Dump all these in pg_dump; also add code to pg_dump to
dump user-defined conversions. Make psql's large object code rely on
the backend for inserting/deleting LOB comments, instead of trying to
hack pg_description directly. Documentation and regression tests added.
Christopher Kings-Lynne, code reviewed by Tom
This first part of the background writer does no syncing at all.
Its only purpose is to keep the LRU heads clean so that regular
backends seldom to never have to call write().
Jan
pghackers proposal of 8-Nov. All the existing cross-type comparison
operators (int2/int4/int8 and float4/float8) have appropriate support.
The original proposal of storing the right-hand-side datatype as part of
the primary key for pg_amop and pg_amproc got modified a bit in the event;
it is easier to store zero as the 'default' case and only store a nonzero
when the operator is actually cross-type. Along the way, remove the
long-since-defunct bigbox_ops operator class.