some concurrent changes Jan was making to the bufmgr. Here's an
updated version of the patch -- it should apply cleanly to CVS
HEAD and passes the regression tests.
This patch makes the following changes:
- remove the UnlockAndReleaseBuffer() and UnlockAndWriteBuffer()
macros, and replace uses of them with calls to the appropriate
functions.
- remove a bunch of #ifdef BMTRACE code: it is ugly & broken
(i.e. it doesn't compile)
- make BufferReplace() return a bool, not an int
- cleanup some logic in bufmgr.c; should be functionality
equivalent to the previous code, just cleaner now
- remove the BM_PRIVATE flag as it is unused
- improve a few comments, etc.
planning to modify them itself. Otherwise we end up with shared RTE
substructure, which breaks inheritance_planner because the rte->inh
flag needs to be independent in each copied subquery. Per bug report
from Chris Piker.
to certain compile-time options (FUNC_MAX_ARGS, INDEX_MAX_KEYS,
NAMEDATALEN, BLCKSZ, HAVE_INT64_TIMESTAMP). Also added "category",
"short_desc", and "extra_desc" to the pg_settings view. Per recent
discussion here:
http://archives.postgresql.org/pgsql-patches/2003-11/msg00363.php
and hash bucket-size estimation. Issue has been there awhile but is more
critical in 7.4 because it affects varchar columns. Per report from
Greg Stark.
on 64-bit Solaris. Use a non-system-dependent datatype for UsedShmemSegID,
namely unsigned long (which we were already assuming could hold a shmem
key anyway, cf RecordSharedMemoryInLockFile).
proposal for eventually deprecating OIDs on user tables that I posted
earlier to pgsql-hackers. pg_dump now always specifies WITH OIDS or
WITHOUT OIDS when dumping a table. The documentation has been updated.
Neil Conway
method control structure, or a table of control structures.
. Use type LOCKMASK where an int is not a counter.
. Get rid of INVALID_TABLEID, use INVALID_LOCKMETHOD instead.
. Use INVALID_LOCKMETHOD instead of (LOCKMETHOD) NULL, because
LOCKMETHOD is not a pointer.
. Define and use macro LockMethodIsValid.
. Rename LOCKMETHOD to LOCKMETHODID.
. Remove global variable LongTermTableId in lmgr.c, because it is
never used.
. Make LockTableId static in lmgr.c, because it is used nowhere else.
Why not remove it and use DEFAULT_LOCKMETHOD?
. Rename the lock method control structure from LOCKMETHODTABLE to
LockMethodData. Introduce a pointer type named LockMethod.
. Remove elog(FATAL) after InitLockTable() call in
CreateSharedMemoryAndSemaphores(), because if something goes wrong,
there is elog(FATAL) in LockMethodTableInit(), and if this doesn't
help, an elog(ERROR) in InitLockTable() is promoted to FATAL.
. Make InitLockTable() void, because its only caller does not use its
return value any more.
. Rename variables in lock.c to avoid statements like
LockMethodTable[NumLockMethods] = lockMethodTable;
lockMethodTable = LockMethodTable[lockmethod];
. Change LOCKMETHODID type to uint16 to fit into struct LOCKTAG.
. Remove static variables BITS_OFF and BITS_ON from lock.c, because
I agree to this doubt:
* XXX is a fetch from a static array really faster than a shift?
. Define and use macros LOCKBIT_ON/OFF.
Manfred Koizar
to note:
1) arttype is numeric. I thought this was the best way of allowing
arbitarily large factorials, even though factorial(2^63) is a large
number. Happy to change to integers if this is overkill.
2) since we're accepting numeric arguments, the patch tests for floats.
If a numeric is passed with non-zero decimal portion, an error is raised
since (from memory) they are undefined.
Gavin Sherry
Ward's report that it can still happen in RC2 forces me to realize that
this is not a can't-happen condition after all, and that the compaction
code had better cope rather than panicking.
Vars created to fill subplan args lists. This is an ancient error, going
back at least to 7.0, but is more easily triggered in 7.4 than before
because we no longer compare varlevelsup when deciding whether a Param
slot can be re-used. Fixes bug reported by Klint Gore.
the hashclauses field of the parent HashJoin. This avoids problems with
duplicated links to SubPlans in hash clauses, as per report from
Andrew Holm-Hansen.
tree for CYCLE option; don't assume zeros are invalid values for sequence
fields other than increment_by; don't reset cache_value when not told to;
simplify code for testing whether to apply defaults.
large objects. Dump all these in pg_dump; also add code to pg_dump
user-defined conversions. Make psql's large object code rely on
the backend for inserting/deleting LOB comments, instead of trying to
hack pg_description directly. Documentation and regression tests added.
Christopher Kings-Lynne, code reviewed by Tom
This first part of the background writer does no syncing at all.
It's only purpose is to keep the LRU heads clean so that regular
backends seldom to never have to call write().
Jan
which had been unintentionally broken by recent changes to tighten up the
DateStyle rules for all-numeric date input. Add documentation and
regression tests for this, too.
pghackers proposal of 8-Nov. All the existing cross-type comparison
operators (int2/int4/int8 and float4/float8) have appropriate support.
The original proposal of storing the right-hand-side datatype as part of
the primary key for pg_amop and pg_amproc got modified a bit in the event;
it is easier to store zero as the 'default' case and only store a nonzero
when the operator is actually cross-type. Along the way, remove the
long-since-defunct bigbox_ops operator class.
Remove the 'strategy map' code, which was a large amount of mechanism
that no longer had any use except reverse-mapping from procedure OID to
strategy number. Passing the strategy number to the index AM in the
first place is simpler and faster.
This is a preliminary step in planned support for cross-datatype index
operations. I'm committing it now since the ScanKeyEntryInitialize()
API change touches quite a lot of files, and I want to commit those
changes before the tree drifts under me.
regression=# select 1 from tenk1 ta cross join tenk1 tb for update;
ERROR: no relation entry for relid 3
7.3 said "SELECT FOR UPDATE cannot be applied to a join", which was better
but still wrong, considering that 7.2 took the query just fine. Fix by
making transformForUpdate() ignore JOIN and other special RTE types,
rather than trying to mark them FOR UPDATE. The actual error message now
only appears if you explicitly name the join in FOR UPDATE.
process the command as though it were issued by the object owner.
This prevents creating weird scenarios in which the same privileges
may appear to flow from different sources, and ensures that a superuser
can in fact revoke all privileges if he wants to. In particular this
means that the regression tests work when run by a superuser other than
the original bootstrap userid. Per report from Larry Rosenman.
rule split the query into one INSERT and one UPDATE where the UPDATE
then hit's the just created row without modifying the key fields again.
In this special case, the new key slipped in totally unchecked.
Jan
ACL array, and force languages to be treated as owned by the bootstrap
user ID. (pg_language should have a lanowner column, but until it does
this will have to do as a workaround.)
subquery that didn't previously have one. We have traditionally made
the caller of ResolveNew responsible for updating the hasSubLinks flag
of the outermost query, but this fails to account for hasSubLinks in
subqueries. Fix ResolveNew to handle this. We might later want to
change the calling convention of ResolveNew so that it can fix the
outer query too, simplifying callers. But I went with the localized
fix for now. Per bug report from J Smith, 20-Oct-03.
fully search-path-proof yet; also, element_types view did not work for
parameters and result types of functions, because it didn't generate
the object_name for the function the same way the data_type_privileges
view does. While at it, centralize dependencies on INDEX_MAX_KEYS/
FUNC_MAX_ARGS into a function returning setof int, so that it will be
easier to fix information_schema for nonstandard values of these
parameters.
Use pg_get_constraintdef instead of pg_constraint.consrc
Use UNION ALL instread of UNION
Make use of regclass type for getting OID of system catalogs
Add schema qualifications where necessary
Fix typos
memory say 'out of shared memory'; some were doing that and some just
said 'out of memory'. Also add a HINT about increasing max_locks_per_transaction
where relevant, per suggestion from Sean Chittenden. (The former change
does not break the strings freeze; the latter does, but I think it's
worth doing anyway.)
protocol, per report from Igor Shevchenko. NOTIFY thought it could
do its thing if transaction blockState is TBLOCK_DEFAULT, but in
reality it had better check the low-level transaction state is
TRANS_DEFAULT as well. Formerly it was not possible to wait for the
client in a state where the first is true and the second is not ...
but now we can have such a state. Minor cleanup in StartTransaction()
as well.
when the pg_class.relhassubclass value is already correct. This should
avoid most cases of the 'tuple concurrently updated' problem that
Robert Creager recently complained about. Also remove a bunch of dead
code in StoreCatalogInheritance() --- it was still computing the complete
list of direct and indirect inheritance ancestors, though that list has
not been needed since we got rid of the pg_ipl catalog.
one side of a binary operator is probably supposed to be the same type
as the other operand' will be applied for domain types. This worked
in 7.3 but was broken in 7.4 due to code rearrangements. Mea culpa.
a single LEFT JOIN query instead of firing the check trigger for each
row individually. Stephan Szabo, with some kibitzing from Tom Lane and
Jan Wieck.
before it is de-backslashed, not after. This allows the null string \N
to be reliably distinguished from the data value \N (which must be
represented as \\N). Per bug report from Manfred Koizar ... but it's
amazing this hasn't been reported before ...
Also, be consistent about encoding conversion for null string: the form
specified in the command is in the server encoding, but what is sent
to/from client must be in client encoding. This never worked quite
right before either.
with required outer parentheses. Breakage seems to be leftover from
domain-constraint patches. This could be smarter about suppressing
extra parens, but at this stage of the release cycle I want certainty
not cuteness.
of function bodies is done at CREATE FUNCTION time. This is normally
true but can be set false to avoid problems with forward references,
wrong schema search path, etc. This is just the backend patch, still
need to adjust pg_dump to make use of it.
to make them comparable to what UpdateStats does in the same situation.
I'm not certain two instances of vac_update_relstats could run in
parallel for the same relation, but parallel invocations of vac_update_dbstats
do seem possible.
in the schema search path. Otherwise pg_dump doesn't correctly dump
scenarios where a custom opclass is created in 'public' and then used
by indexes in other schemas.
discussion on pgsql-hackers: in READ COMMITTED mode we just have to force
a QuerySnapshot update in the trigger, but in SERIALIZABLE mode we have
to run the scan under a current snapshot and then complain if any rows
would be updated/deleted that are not visible in the transaction snapshot.
invalid (has the wrong magic number) until the build is entirely
complete. This turns out to cost no additional writes in the normal
case, since we were rewriting the metapage at the end of the process
anyway. In normal scenarios there's no real gain in security, because
a failed index build would roll back the transaction leaving an unused
index file, but for rebuilding shared system indexes this seems to add
some useful protection.
Before patch:
test=# select pg_get_constraintdef(oid) from pg_constraint;
pg_get_constraintdef
-------------------------------------------------------------------------------------------------
CHECK (VALUE >= 0)
CHECK ((((a)::text = 'asdf'::text) OR ((a)::text = 'fdsa'::text)) OR
((a)::text = 'dfd'::text))
PRIMARY KEY (b)
FOREIGN KEY (a) REFERENCES test2(b)
UNIQUE (b)
(5 rows)
test=# select pg_get_constraintdef(oid, true) from pg_constraint;
pg_get_constraintdef
-----------------------------------------------------------------------------------
CHECK VALUE >= 0
CHECK a::text = 'asdf'::text OR a::text = 'fdsa'::text OR a::text =
'dfd'::text
PRIMARY KEY (b)
FOREIGN KEY (a) REFERENCES test2(b)
UNIQUE (b)
(5 rows)
After patch:
test=# select pg_get_constraintdef(oid) from pg_constraint;
pg_get_constraintdef
-------------------------------------------------------------------------------------------------
CHECK (VALUE >= 0)
CHECK ((((a)::text = 'asdf'::text) OR ((a)::text = 'fdsa'::text)) OR
((a)::text = 'dfd'::text))
PRIMARY KEY (b)
FOREIGN KEY (a) REFERENCES test2(b)
UNIQUE (b)
(5 rows)
test=# select pg_get_constraintdef(oid, true) from pg_constraint;
pg_get_constraintdef
-----------------------------------------------------------------------------------
CHECK (VALUE >= 0)
` CHECK (a::text = 'asdf'::text OR a::text = 'fdsa'::text OR a::text =
'dfd'::text)
PRIMARY KEY (b)
FOREIGN KEY (a) REFERENCES test2(b)
UNIQUE (b)
(5 rows)
It's important that those brackets are there to (a) match all other
constraints and (b) so that people can just copy and paste them and it
will work as SQL.
Christopher Kings-Lynne
post-abort cleanup hooks. I'm surprised that we have not needed this
already, but I need it now to fix a plpgsql problem, and the usefulness
for other dynamically loaded modules seems obvious.
in the RI triggers for ON DELETE/UPDATE SET DEFAULT. The code depended
way too much on knowledge of plan structure, and yet still would fail
if the generated query got rewritten by rules.
every string, especially if some of the output should be fixed-format
machine-readable. This needs to be more carefully sorted out. Also, make
the help message generated by --help-config -h be more similar in style to
the others.
to allow es_snapshot to be set to SnapshotNow rather than a query snapshot.
This solves a bug reported by Wade Klaver, wherein triggers fired as a
result of RI cascade updates could misbehave.
now able to cope with assigning new relfilenode values to nailed-in-cache
indexes, so they can be reindexed using the fully crash-safe method. This
leaves only shared system indexes as special cases. Remove the 'index
deactivation' code, since it provides no useful protection in the shared-
index case. Require reindexing of shared indexes to be done in standalone
mode, but remove other restrictions on REINDEX. -P (IgnoreSystemIndexes)
now prevents using indexes for lookups, but does not disable index updates.
It is therefore safe to allow from PGOPTIONS. Upshot: reindexing system catalogs
can be done without a standalone backend for all cases except
shared catalogs.
since 7.3: 'select array_dims(histogram_bounds) from pg_stats' used to
work and still should. Problem was that code wouldn't take input of
declared type anyarray as matching an anyarray argument. Allow this
case as long as we don't need to determine an element type (which in
practice means as long as anyelement isn't used in the function signature).
difference between INSERT_IN_PROGRESS and DELETE_IN_PROGRESS for
tuples inserted and then deleted by a concurrent transaction.
Example of bug:
regression=# create table foo (f1 int);
CREATE TABLE
regression=# begin;
BEGIN
regression=# insert into foo values(1);
INSERT 195531 1
regression=# delete from foo;
DELETE 1
regression=# insert into foo values(1);
INSERT 195532 1
regression=# create unique index fooi on foo(f1);
ERROR: could not create unique index
DETAIL: Table contains duplicated values.
not just MAXALIGN boundaries. This makes a noticeable difference in
the speed of transfers to and from kernel space, at least on recent
Pentiums, and might help other CPUs too. We should look at making
this happen for local buffers and buffile.c too. Patch from Manfred Spraul.
Per recent discussion, this does not work because other backends can't
reliably see tuples in a temp table and so cannot run the RI checks
correctly. Seems better to disallow this case than go back to accessing
temp tables through shared buffers. Also, disallow FK references to
ON COMMIT DELETE ROWS tables. We already caught this problem for normal
TRUNCATE, but the path used by ON COMMIT didn't check.
reindexing system tables without ignoring system indexes, when the
other two varieties of REINDEX disallow it. Make all three act the same,
and simplify downstream code accordingly.
child tables --- all cases that will trip various sanity checks elsewhere
in the system, as well as cases that should not occur in the only intended
use of this feature, namely coping with ancient pg_dump representation
of views. Per bug report from Chris Pizzi.
really general fix might be difficult, I believe the only case where
AtCommit_Notify could see an uncommitted tuple is where the other guy
has just unlistened and not yet committed. The best solution seems to
be to just skip updating that tuple, on the assumption that the other
guy does not want to hear about the notification anyway. This is not
perfect --- if the other guy rolls back his unlisten instead of committing,
then he really should have gotten this notify. But to do that, we'd have
to wait to see if he commits or not, or make UNLISTEN hold exclusive lock
on pg_listener until commit. Either of these answers is deadlock-prone,
not to mention horrible for interactive performance. Do it this way
for now. (What happened to that project to do LISTEN/NOTIFY in memory
with no table, anyway?)
recent gripe, I discovered not one but two undocumented, undesirable
behaviors of glibc's mktime. So, stop using it entirely, and always
rely on inversion of localtime() to determine the local time zone.
It's not even very much slower, as it turns out that mktime (at least
in the glibc implementation) also does repeated reverse-conversions.
comments/examples in pg_hba.conf. This patch remedies that, adds a brief
explanation of the connection types, and adds a missing period in the
docs.
Jon Jensen
ps status as '[local]', not as 'localhost' as the code has been doing
recently. That's too easily confused with TCP loopback connections,
and there is no good reason to change the behavior anyway.
to create a TCP/IP socket from FATAL to LOG. This was unwise;
historically we have expected socket conflicts to abort postmaster
startup. Conflicts on port numbers with another postmaster can only
be detected reliably at the TCP socket level.
sequence every time it's called is bogus --- it interferes with user
control over the seed, and actually decreases randomness overall
(because a seed based on time(NULL) is pretty predictable). If you really
want a reproducible result from geqo, do 'set seed = 0' before planning
a query.
on either name or inode; otherwise load_external_function() won't do
anything. At least on Linux, it appears that recompiling a shlib leads
to a new file with a different inode, so the old code failed to detect
a match.
pghackers. This fixes the problem recently reported by Markus KrÌutner
(hash bucket split corrupts the state of scans being done concurrently),
and I believe it also fixes all the known problems with deadlocks in
hash index operations. Hash indexes are still not really ready for prime
time (since they aren't WAL-logged), but this is a step forward.
killed items; just skip to the next item immediately. Only check for
key equality when we reach a non-killed item or the end of the index
page. This saves key comparisons when there are lots of killed items,
as for example in a heavily-updated table that's not been vacuumed lately.
Seems to be a win for pgbench anyway.
config file if it exists. This was already discussed as being a good
idea, and now seems the cleanest way to deal with initdb-time failures
on machines with small SHMMAX. (The submitted patches instead modified
initdb.sh to pass the correct sizing parameters, but that would still
leave standalone backends prone to failure later. An admin who needs
to use a standalone backend has enough trouble already, he shouldn't
have to manually configure its shmem settings...)
layout; therefore, this change forces REINDEX of hash indexes (though
not a full initdb). Widen hashm_ntuples to double so that hash space
management doesn't get confused by more than 4G entries; enlarge the
allowed number of free-space-bitmap pages; replace the useless bshift
field with a useful bmshift field; eliminate 4 bytes of wasted space
in the per-page special area.
scheme. A pleasant side effect is that it is *much* faster when deleting
a large fraction of the indexed tuples, because of elimination of
redundant hash_step activity induced by hash_adjscans. Various other
continuing code cleanup.
libpq, talking to an old server, should assume SQL_ASCII as the default
client encoding, because that is what the server will actually use (not
the server encoding).
yet). Fix a couple of bugs that would only appear if multiple bitmap pages
are used, including a buffer reference leak and incorrect computation of bit
indexes. Get rid of 'overflow address' concept, which accomplished nothing
except obfuscating the code and creating a risk of failure due to limited
range of offset field. Rename some misleadingly-named fields and routines,
and improve documentation.
explanation of the remarkably confusing page addressing scheme.
The file also includes my planned-but-not-yet-implemented revision
of the hash index locking scheme.
SQLSTATE error codes required by SQL99 (invalid format, datetime field
overflow, interval field overflow, invalid time zone displacement value).
Also emit a HINT about DateStyle in cases where it seems appropriate.
Per recent gripes.
lumping them into ERRCODE_UNDEFINED_OBJECT/ERRCODE_DUPLICATE_OBJECT.
This seems reasonable since 'object' was meant to refer to 'object in the
database' and a file is outside the database. Per request from Dave
Cramer.
max_connections at initdb time. Get rid of DEF_NBUFFERS and DEF_MAXBACKENDS
macros, which aren't doing anything useful anymore, and put more likely
defaults into postgresql.conf.sample.
perform a timestamp-to-date coercion. Instead both routines share a
subroutine that delivers the parsing result as a struct tm. This avoids
problems with timezone dependency of to_date's result, and should be
at least marginally faster too.
handling many-way scans: instead of re-evaluating all prior indexscan
quals to see if a tuple has been fetched more than once, use a hash table
indexed by tuple CTID. But fall back to the old way if the hash table
grows to exceed SortMem.
as well as the hash function (formerly the comparison function was hardwired
as memcmp()). This makes it possible to eliminate the special-purpose
hashtable management code in execGrouping.c in favor of using dynahash to
manage tuple hashtables; which is a win because dynahash knows how to expand
a hashtable when the original size estimate was too small, whereas the
special-purpose code was too stupid to do that. (See recent gripe from
Stephan Szabo about poor performance when hash table size estimate is way
off.) Free side benefit: when using string_hash, the default comparison
function is now strncmp() instead of memcmp(). This should eliminate some
part of the overhead associated with larger NAMEDATALEN values.
be anything yielding an array of the proper kind, not only sub-ARRAY[]
constructs; do subscript checking at runtime not parse time. Also,
adjust array_cat to make array || array comply with the SQL99 spec.
Joe Conway
datatype by array_eq and array_cmp; use this to solve problems with memory
leaks in array indexing support. The parser's equality_oper and ordering_oper
routines also use the cache. Change the operator search algorithms to look
for appropriate btree or hash index opclasses, instead of assuming operators
named '<' or '=' have the right semantics. (ORDER BY ASC/DESC now also look
at opclasses, instead of assuming '<' and '>' are the right things.) Add
several more index opclasses so that there is no regression in functionality
for base datatypes. initdb forced due to catalog additions.
"syslog" option.)
By the way: The "virtual_host" parameter is a bad name for that
particular option, I think. "Virtual host" signals that PostgreSQL will
behave differently according to which IP address it's contacted (like
Apache's virtual host support which makes the web-server serve different
sites according to different criteria). A better word for the options
would be "tcpip_listen_addr" or something like that.
Troels Arvin
via extended query protocol, because it sends Sync right after Execute
without realizing that the command to be executed is COPY. There seems
to be no reasonable way for it to realize that, either, so the best fix
seems to be to make the backend ignore Sync during copy-in mode. Bit of
a wart on the protocol, but little alternative. Also, libpq must send
another Sync after terminating the COPY, if the command was issued via
Execute.
just before CommitTransactionCommand(). This is a more sensible place
to put it since commit discards a lot of contexts, and we'd not find
out about stomps affecting only transaction-local contexts.
target columns in INSERT and UPDATE targetlists. Don't rely on resname
to be accurate in ruleutils, either. This fixes bug reported by
Donald Fraser, in which renaming a column referenced in a rule did not
work very well.
yet, though). Avoid using nth() to fetch tlist entries; provide a
common routine get_tle_by_resno() to search a tlist for a particular
resno. This replaces a couple uses of nth() and a dozen hand-coded
search loops. Also, replace a few uses of nth(length-1, list) with
llast().
index pages: when _bt_getbuf asks the FSM for a free index page, it is
possible (and, in some cases, even moderately likely) that the answer
will be the same page that _bt_split is trying to split. _bt_getbuf
already knew that the returned page might not be free, but it wasn't
prepared for the possibility that even trying to lock the page could
be problematic. Fix by doing a conditional rather than unconditional
grab of the page lock.
encountered; per bug report from Christian van der Leeden 8/7/03.
Also, adjust larger/smaller routines (MAX/MIN) to share code with
comparisons for timestamp, interval, timetz.
subplan it starts with, as they may be needed at upper join levels.
See comments added to code for the non-obvious reason why. Per bug report
from Robert Creager.
writing one more value into return arrays than will fit. This is
potentially a stack smash, though I do not think it is a problem in
current uses of the routine, since a failure return causes elog anyway.
in HAVE_INT64_TIMESTAMP cases, including two potential stack smashes
when more than six fractional digits were supplied. Per bug report
from Philipp Reisner.
and send() very well at all; and in any case we can't use retval==0
for EOF due to race conditions. Make the same fixes in the backend as
are required in libpq.
that OS=hpux is the same as CPU=hppa. First steps at doing this.
With these patches, we still work on hppa with either gcc or HP's cc.
We might work on hpux/itanium with gcc, but I can't test it. Definitely
will not work on hpux/itanium with non-gcc compiler, for lack of spinlock
code.
float8smaller, float8larger (and thereby the MIN/MAX aggregates on these
datatypes) to agree with the datatypes' comparison operations as
regards NaN handling. In all these datatypes, NaN is arbitrarily
considered larger than any normal value ... but MIN/MAX had not gotten
the word. Per recent discussion on pgsql-sql.
error, if any input element is NULL. This is not what we ultimately want,
but until arrays can have NULL elements, it will have to do. Patch from
Joe Conway.
bottom. Otherwise we fail to moveright when the root page was split while
we were "in flight" to it. This is not a significant problem when the root
is above the leaf level, but if the root was also a leaf (ie, a single-page
index just got split) we may return the wrong leaf page to the caller,
resulting in failure to find a key that is in fact present. Bug has existed
at least since 7.1, probably forever.
CREATE TABLE (or ALTER TABLE SET DEFAULT), rather than postponing it to
the time that the default is inserted into an INSERT command by the
rewriter. This reverses an old decision that was intended to make the
world safe for writing
f1 timestamp default 'now'
but in fact merely made the failure modes subtle rather than obvious.
Per recent trouble report and followup discussion.
initdb forced since there is a chance that stored default expressions
will change.
heuristic determination of day vs month in date/time input. Add the
ability to specify that input is interpreted as yy-mm-dd order (which
formerly worked, but only for yy greater than 31). DateStyle's input
component now has the preferred spellings DMY, MDY, or YMD; the older
keywords European and US are now aliases for the first two of these.
Per recent discussions on pgsql-general.
materialized.
New items have been added to GucContext and GucSource enums, but of
course they were not added to the corresponding GucContextName[] and
GucSourceName[] arrays in the patch. Here's a new patch to fix the
resulting bugs.
Joe Conway
I'd have to disagree with regards to the memory leaks not being worth
a mention - any such leak can cause problems when the PostgreSQL
installation is either unattended, long-living andor has very high
connection levels. Half a kilobyte on start-up isn't negligible in
this light.
Regards, Lee.
Tom Lane writes:
> Lee Kindness <lkindness@csl.co.uk> writes:
> > Guys, attached is a patch to fix two memory leaks on start-up.
>
> I do not like the changes to miscinit.c. In the first place, it is not
> a "memory leak" to do a one-time allocation of state for a proc_exit
> function. A bigger complaint is that your proposed change introduces
> fragile coupling between CreateLockFile and its callers, in order to
> save no resources worth mentioning. More, it introduces an assumption
> that the globals directoryLockFile and socketLockFile don't change while
> the postmaster is running. UnlinkLockFile should unlink the file that
> it was originally told to unlink, regardless of what happens to those
> globals.
>
> If you are intent on spending code to free stuff just before the
> postmaster exits, a better fix would be for UnlinkLockFile to free its
> string argument after using it.
Lee Kindness
>>ISTM that "source" is worth knowing.
>
> Hm, possibly. Any other opinions?
This version has the seven fields I proposed, including "source". Here's
an example that shows why I think it's valuable:
regression=# \x
Expanded display is on.
regression=# select * from pg_settings where name = 'enable_seqscan';
-[ RECORD 1 ]-----------
name | enable_seqscan
setting | on
context | user
vartype | bool
source | default
min_val |
max_val |
regression=# update pg_settings set setting = 'off' where name =
'enable_seqscan';
-[ RECORD 1 ]---
set_config | off
regression=# select * from pg_settings where name = 'enable_seqscan';
-[ RECORD 1 ]-----------
name | enable_seqscan
setting | off
context | user
vartype | bool
source | session
min_val |
max_val |
regression=# alter user postgres set enable_seqscan to 'off';
ALTER USER
(log out and then back in again)
regression=# \x
Expanded display is on.
regression=# select * from pg_settings where name = 'enable_seqscan';
-[ RECORD 1 ]-----------
name | enable_seqscan
setting | off
context | user
vartype | bool
source | user
min_val |
max_val |
In the first case, enable_seqscan is set to its default value. After
setting it to off, it is obvious that the value has been changed for the
session only. In the third case, you can see that the value has been set
specifically for the user.
Joe Conway
modes (and replace the requiressl boolean). The four options were first
spelled out by Magnus Hagander <mha@sollentuna.net> on 2000-08-23 in email
to pgsql-hackers, archived here:
http://archives.postgresql.org/pgsql-hackers/2000-08/msg00639.php
My original less-flexible patch and the ensuing thread are archived at:
http://dbforums.com/t623845.html
Attached is a new patch, including documentation.
To sum up, there's a new client parameter "sslmode" and environment
variable "PGSSLMODE", with these options:
sslmode description
------- -----------
disable Unencrypted non-SSL only
allow Negotiate, prefer non-SSL
prefer Negotiate, prefer SSL (default)
require Require SSL
The only change to the server is a new pg_hba.conf line type,
"hostnossl", for specifying connections that are not allowed to use SSL
(for example, to prevent servers on a local network from accidentally
using SSL and wasting cycles). Thus the 3 pg_hba.conf line types are:
pg_hba.conf line types
----------------------
host applies to either SSL or regular connections
hostssl applies only to SSL connections
hostnossl applies only to regular connections
These client and server options, the postgresql.conf ssl = false option,
and finally the possibility of compiling with no SSL support at all,
make quite a range of combinations to test. I threw together a test
script to try many of them out. It's in a separate tarball with its
config files, a patch to psql so it'll announce SSL connections even in
absence of a tty, and the test output. The test is especially informative
when run on the same tty the postmaster was started on, so the FATAL:
errors during negotiation are interleaved with the psql client output.
I saw Tom write that new submissions for 7.4 have to be in before midnight
local time, and since I'm on the east coast in the US, this just makes it
in before the bell. :)
Jon Jensen
was modified for IPv6. Use a robust definition of struct sockaddr_storage,
do a proper configure test to see if ss_len exists, don't assume that
getnameinfo() will handle AF_UNIX sockets, don't trust getaddrinfo to
return the protocol we ask for, etc. This incorporates several outstanding
patches from Kurt Roeckx, but I'm to blame for anything that doesn't
work ...
database, emit a WARNING and do nothing, rather than raising ERROR.
Per recent discussion in which we concluded this is the best way to deal
with database dumps that are reloaded into a database of a new name.
fixed incorrect initial setting of StartUpID. The logic in XLogWrite()
expects that Write->curridx is advanced to the next page as soon as
LogwrtResult points to the end of the current page, but StartupXLOG()
failed to make that happen when the old WAL ended exactly on a page
boundary. Per trouble report from Hannu Krosing.
for the sign of timezone offsets, ie, positive is east from UTC. These
were previously out of step with other operations that accept or show
timezones, such as I/O of timestamptz values.
query node, since that won't work unless the planner is upgraded.
Someday we should try to support at least some cases of this, but for
now just plug the hole in the dike. Per discussion with Dmitry Tkach.
shared_buffers and max_connections values to use before we run the
bootstrap process. Without this, initdb would fail on platforms where
the hardwired default values are too large. (We could get around that
by making the hardwired defaults tiny, perhaps, but why slow down
bootstrap by starving it for buffers...)
and 100 respectively, if the platform will allow it. initdb selects
values that are not too large to allow the postmaster to start, and
places these values in the installed postgresql.conf file. This allows
us to continue to start up out-of-the-box on platforms with small SHMMAX,
while having somewhat-realistic default settings on platforms with
reasonable SHMMAX. Per recent pghackers discussion.
without needing a running backend. Reorder postgresql.conf.sample
to match new layout of runtime.sgml. This commit re-adds work lost
in Wednesday's crash.
instead of the former kluge whereby gram.y emitted already-transformed
expressions. This is needed so that Params appearing in these clauses
actually work correctly. I suppose some might claim that the side effect
of 'SELECT ... LIMIT 2+2' working is a new feature, but I say this is
a bug fix.
It also works to create a non-polymorphic aggregate from polymorphic
functions, should you want to do that. Regression test added, docs still
lacking. By Joe Conway, with some kibitzing from Tom Lane.
ANYELEMENT. The effect is to postpone typechecking of the function
body until runtime. Documentation is still lacking.
Original patch by Joe Conway, modified to postpone type checking
by Tom Lane.
node emits only those vars that are actually needed above it in the
plan tree. (There were comments in the code suggesting that this was
done at some point in the dim past, but for a long time we have just
made join nodes emit everything that either input emitted.) Aside from
being marginally more efficient, this fixes the problem noted by Peter
Eisentraut where a join above an IN-implemented-as-join might fail,
because the subplan targetlist constructed in the latter case didn't
meet the expectation of including everything.
Along the way, fix some places that were O(N^2) in the targetlist
length. This is not all the trouble spots for wide queries by any
means, but it's a step forward.
'scalar op ALL (array)', where the operator is applied between the
lefthand scalar and each element of the array. The operator must
yield boolean; the result of the construct is the OR or AND of the
per-element results, respectively.
Original coding by Joe Conway, after an idea of Peter's. Rewritten
by Tom to keep the implementation strictly separate from subqueries.
comparison functions), replacing the highly bogus bitwise array_eq. Create
a btree index opclass for ANYARRAY --- it is now possible to create indexes
on array columns.
Arrange to cache the results of catalog lookups across multiple array
operations, instead of repeating the lookups on every call.
Add string_to_array and array_to_string functions.
Remove singleton_array, array_accum, array_assign, and array_subscript
functions, since these were for proof-of-concept and not intended to become
supported functions.
Minor adjustments to behavior in some corner cases with empty or
zero-dimensional arrays.
Joe Conway (with some editorializing by Tom Lane).
HH:MM:SS.SSS... when there is a nonzero part-of-a-day field in an
interval value. The seconds part used to be suppressed if zero,
but there's no equivalent behavior for timestamp, and since we're
modeling this format on timestamp it's probably wrong. Per complaint
and patch from Larry Rosenman.
This is no longer necessary or appropriate since we don't use zero typeid
as a wildcard anymore, and it fixes a nasty performance problem with
functions with many parameters. Per recent example from Reuven Lerner.
The attached fixes select_common_type() to support the below case:
create table t1( c1 int);
create domain dom_c1 int;
create table t2(c1 dom_c1);
select * from t1 join t2 using( c1 );
I didn't see a need for maintaining the domain as the preferred type. A
simple getBaseType() call on all elements of the list seems to be
enough.
--
Rod Taylor <rbt@rbt.ca>
- LIKE <subtable> [ INCLUDING DEFAULTS | EXCLUDING DEFAULTS ]
- Quick cleanup of analyze.c function prototypes.
- New non-reserved keywords (INCLUDING, EXCLUDING, DEFAULTS), SQL 200X
Opted not to extend for check constraints at this time.
As per the definition that it's user defined columns, OIDs are NOT
inherited.
Doc and Source patches attached.
--
Rod Taylor <rbt@rbt.ca>
> http://developer.postgresql.org/cvsweb.cgi/pgsql-server/src/include/libpq/pqcomm.h.diff?r1=1.85&r2=1.86
>
> modified SockAddr, but no corresponding change was made here
> (fe-auth.c:612):
>
> case AUTH_REQ_KRB5:
> #ifdef KRB5
> if (pg_krb5_sendauth(PQerrormsg, conn->sock, &conn->laddr.in,
> &conn->raddr.in,
> hostname) != STATUS_OK)
>
> It's not obvious to me what the change ought to be though.
This patch should hopefully fix both kerberos 4 and 5.
Kurt Roeckx
>> actually having updated the tuple, [...] can we simply
>> set the HEAP_XMAX_INVALID hint bit of the tuple?
>
>AFAICS this is a reasonable thing to do.
Thanks for the confirmation. Here's a patch which also contains some
more noncritical changes to tqual.c:
. make code more readable by introducing local variables for xvac
. no longer two separate branches for aborted and crashed.
The actions were the same in all cases.
Manfred Koizar
restructures the deferred trigger queue. The fundamental change is to
put all the static variables to hold the deferred triggers in a single
structure.
Alvaro Herrera
Regression tests for IPv6 operations added.
Documentation updated to document IPv6 bits.
Stop treating IPv4 as an "unsigned int" and IPv6 as an array of
characters. Instead, always use the array of characters so we
can have one function fits all. This makes bitncmp(), addressOK(),
and several other functions "just work" on both address families.
add family() function which returns integer 4 or 6 for IPv4 or
IPv6. (See examples below) Note that to add this new function
you will need to dump/initdb/reload or find the correct magic
to add the function to the postgresql function catalogs.
IPv4 addresses always sort before IPv6.
On disk we use AF_INET for IPv4, and AF_INET+1 for IPv6 addresses.
This prevents the need for a dump and reload, but lets IPv6 parsing
work on machines without AF_INET6.
To select all IPv4 addresses from a table:
select * from foo where family(addr) = 4 ...
Order by and other bits should all work.
Michael Graff
specific hash functions used by hash indexes, rather than the old
not-datatype-aware ComputeHashFunc routine. This makes it safe to do
hash joining on several datatypes that previously couldn't use hashing.
The sets of datatypes that are hash indexable and hash joinable are now
exactly the same, whereas before each had some that weren't in the other.
character in identifiers. The first change eliminates the current need
to put spaces around parameter references, as in "x<=$2". The second
change improves compatibility with Oracle and some other RDBMSes. This
was discussed and agreed to back in January, but did not get done.
silently resolving them to type TEXT. This is comparable to what we
do when faced with UNKNOWN in CASE, UNION, and other contexts. It gets
rid of this and related annoyances:
select distinct f1, '' from int4_tbl;
ERROR: Unable to identify an ordering operator '<' for type unknown
This was discussed many moons ago, but no one got round to fixing it.
some cases of redundant clauses that were formerly not caught. We have
to special-case this because the clauses involved never get attached to
the same join restrictlist and so the existing logic does not notice
that they are redundant.
both clauses specify the same targets, rather than always using the
default ordering operator. This allows 'GROUP BY foo ORDER BY foo DESC'
to be done with only one sort step.
---------------------------------------------------------------------------
here is a patch that allows CIDR netmasks in pg_hba.conf. It allows two
address/mask forms:
. address/maskbits, or
. address netmask (as now)
If the patch is accepted I will submit a documentation patch to cover
it.
This is submitted by agreement with Kurt Roeckx, who has worked on a
patch that covers this and other IPv6 issues.
address/mask forms:
. address/maskbits, or
. address netmask (as now)
If the patch is accepted I will submit a documentation patch to cover
it.
This is submitted by agreement with Kurt Roeckx, who has worked on a
patch that covers this and other IPv6 issues.
Andrew Dunstan
free'd for every transaction or statement, respectively. This patch
puts these data structures into static memory, thus saving a few CPU
cycles and two malloc calls per transaction or (in isolation level
READ COMMITTED) per query.
Manfred Koizar
least-recently-used strategy from clog.c into slru.c. It doesn't
change any visible behaviour and passes all regression tests plus a
TruncateCLOG test done manually.
Apart from refactoring I made a little change to SlruRecentlyUsed,
formerly ClogRecentlyUsed: It now skips incrementing lru_counts, if
slotno is already the LRU slot, thus saving a few CPU cycles. To make
this work, lru_counts are initialised to 1 in SimpleLruInit.
SimpleLru will be used by pg_subtrans (part of the nested transactions
project), so the main purpose of this patch is to avoid future code
duplication.
Manfred Koizar
some reading on the subject.
1) PostgreSQL uses ephemeral keying, for its connections (good thing)
2) PostgreSQL doesn't set the cipher list that it allows (bad thing,
fixed)
3) PostgreSQL's renegotiation code wasn't text book correct (could be
bad, fixed)
4) The rate of renegotiating was insanely low (as Tom pointed out, set
to a more reasonable level)
I haven't checked around much to see if there are any other SSL bits
that need some review, but I'm doing some OpenSSL work right now
and'll send patches for improvements along the way (if I find them).
At the very least, the changes in this patch will make security folks
happier for sure. The constant renegotiation of sessions was likely a
boon to systems that had bad entropy gathering means (read: Slowaris
/dev/rand|/dev/urand != ANDIrand). The new limit for renegotiations
is 512MB which should be much more reasonable.
Sean Chittenden
I'd placed the check for newly created matching pk rows for on update no
action earlier than it needed to be so that it'd check even when the key
values hadn't changed. This patch moves it to after checking for NULLs
in the old row and comparing the values since the select's probably more
expensive.
Stephan Szabo
protocol 3, then falls back to 2 if postmaster rejects the startup packet
with an old-format error message. A side benefit of the rewrite is that
SSL-encrypted connections can now be made without blocking. (I think,
anyway, but do not have a good way to test.)
catalog lookups when not in a transaction. This prevents bizarre
failures if someone tries to set a value for session_authorization in
postgresql.conf. Per report from Fernando Nasser.
extensions to support our historical behavior. An aggregate belongs
to the closest query level of any of the variables in its argument,
or the current query level if there are no variables (e.g., COUNT(*)).
The implementation involves adding an agglevelsup field to Aggref,
and treating outer aggregates like outer variables at planning time.
when the plan is ReScanned, we don't have to rebuild the hash table
if there is no parameter change for its child node. This idea has
been used for a long time in Sort and Material nodes, but was not in
the hash code till now.
yy_fatal_error() call results in elog(ERROR) not exit(). This was
already fixed in the main lexer and plpgsql, but extend same technique
to all the other dot-l files. Also, on review of the possible calls
to yy_fatal_error(), it seems safe to use elog(ERROR) not elog(FATAL).
leave it for the kernel to do after the process dies. This allows clients
to wait for the backend to exit if they wish (after sending X message,
wait till EOF is detected on the socket).