Commit Graph

5664 Commits

Author SHA1 Message Date
Simon Riggs
9e95c9ad55 Create new errcode for recovery conflict caused by db drop on master.
Previously reported as ERRCODE_ADMIN_SHUTDOWN, this case is now
reported as ERRCODE_T_R_DATABASE_DROPPED. No message text change.
Unlikely to happen on most servers, so low impact change to allow
session poolers to correctly handle this situation.

Tatsuo Ishii, edits by me, review by Robert Haas
2011-02-01 00:20:53 +00:00
Heikki Linnakangas
997b48ed96 Support multiple concurrent pg_basebackup backups.
With this patch, pg_basebackup doesn't write a backup_label file in the
data directory, so it doesn't interfere with a pg_start/stop_backup() based
backup anymore. backup_label is still included in the backup, but it is
injected directly into the tar stream.

Heikki Linnakangas, reviewed by Fujii Masao and Magnus Hagander.
2011-01-31 18:25:39 +02:00
Andrew Dunstan
48c9de8028 Fix typo 2011-01-30 20:34:05 -05:00
Andrew Dunstan
91812df4ed Enable building with the Mingw64 compiler.
This can be used to build 64 bit Windows binaries, not only on 64 bit
Windows but on supported cross-compiling hosts including 32 bit Windows,
Cygwin, Darwin and Linux.
2011-01-30 19:56:46 -05:00
Magnus Hagander
507069de6d Add option to include WAL in base backup
When included, this makes the base backup a complete working
"clone" of the initial database, ready to have a postmaster
started against it without the need to set up any log archiving
or similar.

Magnus Hagander, reviewed by Fujii Masao and Heikki Linnakangas
2011-01-30 21:30:09 +01:00
Peter Eisentraut
6fe5e4e63e autoreconf
Synchronize pg_config.h.in with configure.in (someone must have
forgotten to run autoheader or autoreconf), and clean up some spurious
change in configure introduced by the last commit there.
2011-01-27 01:19:45 +02:00
Tom Lane
bd1ad1b019 Replace pg_class.relhasexclusion with pg_index.indisexclusion.
There isn't any need to track this state on a table-wide basis, and trying
to do so introduces undesirable semantic fuzziness.  Move the flag to
pg_index, where it clearly describes just a single index and can be
immutable after index creation.
2011-01-25 17:51:59 -05:00
Tom Lane
88452d5ba6 Implement ALTER TABLE ADD UNIQUE/PRIMARY KEY USING INDEX.
This feature allows a unique or pkey constraint to be created using an
already-existing unique index.  While the constraint isn't very
functionally different from the bare index, it's nice to be able to do that
for documentation purposes.  The main advantage over just issuing a plain
ALTER TABLE ADD UNIQUE/PRIMARY KEY is that the index can be created with
CREATE INDEX CONCURRENTLY, so that there is not a long interval where the
table is locked against updates.

On the way, refactor some of the code in DefineIndex() and index_create()
so that we don't have to pass through those functions in order to create
the index constraint's catalog entries.  Also, in parse_utilcmd.c, pass
around the ParseState pointer in struct CreateStmtContext to save on
notation, and add error location pointers to some error reports that didn't
have one before.

Gurjeet Singh, reviewed by Steve Singer and Tom Lane
2011-01-25 15:43:05 -05:00
Magnus Hagander
e5487f65fd Make walsender options order-independent
While doing this, also move base backup options into
a struct instead of increasing the number of parameters
to multiple functions for each new option.
2011-01-23 23:39:18 +01:00
Magnus Hagander
048d148fe6 Add pg_basebackup tool for streaming base backups
This tool makes it possible to do the pg_start_backup/
copy files/pg_stop_backup step in a single command.

There are still some steps to be done before this is a
complete backup solution, such as the ability to stream
the required WAL logs, but it's still usable, and
could do with some buildfarm coverage.

In passing, make the checkpoint request optionally
fast instead of hardcoding it.

Magnus Hagander, reviewed by Fujii Masao and Dimitri Fontaine
2011-01-23 12:21:23 +01:00
Robert Haas
6f59777c65 Code cleanup for assign_transaction_read_only.
As in commit fb4c5d2798 on 2011-01-21,
this avoids spurious debug messages and allows idempotent changes at
any time.  Along the way, make assign_XactIsoLevel allow idempotent
changes even when not within a subtransaction, to be consistent with
the new coding of assign_transaction_read_only and because there's
no compelling reason to do otherwise.

Kevin Grittner, with some adjustments.
2011-01-22 20:55:50 -05:00
Robert Haas
fb4c5d2798 Code cleanup for assign_XactIsoLevel.
The new coding avoids a spurious debug message when a transaction
that has changed the isolation level has been rolled back.  It also
allows the property to be freely changed to the current value within
a subtransaction.

Kevin Grittner, with one small change by me.
2011-01-21 21:49:19 -05:00
Robert Haas
8ceb245680 Make ALTER TABLE revalidate uniqueness and exclusion constraints.
Failure to do so can lead to constraint violations.  This was broken by
commit 1ddc2703a9 on 2010-02-07, so
back-patch to 9.0.

Noah Misch.  Regression test by me.
2011-01-20 22:44:10 -05:00
Tom Lane
6ca452ba7f Move a couple of declarations to reflect where the routines really are. 2011-01-15 16:09:05 -05:00
Heikki Linnakangas
8f5d65e916 Treat a WAL sender process that hasn't started streaming yet as a regular
backend, as far as the postmaster shutdown logic is concerned. That means,
fast shutdown will wait for WAL sender processes to exit before signaling
bgwriter to finish. This avoids race conditions between a base backup stopping
or starting, and bgwriter writing the shutdown checkpoint WAL record. We don't
want e.g the end-of-backup WAL record to be written after the shutdown
checkpoint.
2011-01-15 16:38:21 +02:00
Magnus Hagander
fcd810c69a Use a lexer and grammar for parsing walsender commands
Makes it easier to parse mainly the BASE_BACKUP command
with it's options, and avoids having to manually deal
with quoted identifiers in the label (previously broken),
and makes it easier to add new commands and options in
the future.

In passing, refactor the case statement in the walsender
to put each command in it's own function.
2011-01-14 16:30:33 +01:00
Magnus Hagander
688423d004 Exit from base backups when shutdown is requested
When the exit waits until the whole backup completes, it may take
a very long time.

In passing, add back an error check in the main loop so we detect
clients that disconnect much earlier if the backup is large.
2011-01-14 12:36:45 +01:00
Tom Lane
52948169bc Code review for postmaster.pid contents changes.
Fix broken test for pre-existing postmaster, caused by wrong code for
appending lines to the lockfile; don't write a failed listen_address
setting into the lockfile; don't arbitrarily change the location of the
data directory in the lockfile compared to previous releases; provide more
consistent and useful definitions of the socket path and listen_address
entries; avoid assuming that pg_ctl has the same DEFAULT_PGSOCKET_DIR as
the postmaster; assorted code style improvements.
2011-01-13 19:01:28 -05:00
Tom Lane
d487afbb81 Fix PlanRowMark/ExecRowMark structures to handle inheritance correctly.
In an inherited UPDATE/DELETE, each target table has its own subplan,
because it might have a column set different from other targets.  This
means that the resjunk columns we add to support EvalPlanQual might be
at different physical column numbers in each subplan.  The EvalPlanQual
rewrite I did for 9.0 failed to account for this, resulting in possible
misbehavior or even crashes during concurrent updates to the same row,
as seen in a recent report from Gordon Shannon.  Revise the data structure
so that we track resjunk column numbers separately for each subplan.

I also chose to move responsibility for identifying the physical column
numbers back to executor startup, instead of assuming that numbers derived
during preprocess_targetlist would stay valid throughout subsequent
massaging of the plan.  That's a bit slower, so we might want to consider
undoing it someday; but it would complicate the patch considerably and
didn't seem justifiable in a bug fix that has to be back-patched to 9.0.
2011-01-12 20:47:02 -05:00
Magnus Hagander
4c8e20f815 Track walsender state in shared memory and expose in pg_stat_replication 2011-01-11 21:25:28 +01:00
Magnus Hagander
0eb59c4591 Backend support for streaming base backups
Add BASE_BACKUP command to walsender, allowing it to stream a
base backup to the client (in tar format). The syntax is still
far from ideal, that will be fixed in the switch to use a proper
grammar for walsender.

No client included yet, will come as a separate commit.

Magnus Hagander and Heikki Linnakangas
2011-01-10 14:04:19 +01:00
Magnus Hagander
4448917d51 Split pg_start_backup() and pg_stop_backup() into two pieces
Move the actual functionality into a separate function that's
easier to call internally, and change the SQL-callable function
to be a wrapper calling this.

Also create a pg_abort_backup() function, only callable internally,
that does only the most vital parts of pg_stop_backup(), making it
safe(r) to call from error handlers.
2011-01-09 21:00:28 +01:00
Tom Lane
52fd2d65a3 Fix up core tsquery GIN support for new extractQuery API.
No need for the empty-prefix-match kluge to force a full scan anymore.
2011-01-09 14:34:50 -05:00
Magnus Hagander
db4d22d0ef Add pgreadlink() on Windows to read junction points
Add support for reading back information about the symbolic
links we've created with pgsymlink(), which are actually
Junction Points. Just like pgsymlink() can only create directory
symlinks, pgreadlink() can only read directory symlinks.
2011-01-09 15:09:19 +01:00
Tom Lane
adf328c0e1 Add array_contains_nulls() function in arrayfuncs.c.
This will support fixing contrib/intarray (and probably other places)
so that they don't have to fail on arrays that contain a null bitmap
but no live null entries.
2011-01-08 20:26:14 -05:00
Tom Lane
7e2f906201 Remove pg_am.amindexnulls.
The only use we have had for amindexnulls is in determining whether an
index is safe to cluster on; but since the addition of the amclusterable
flag, that usage is pretty redundant.

In passing, clean up assorted sloppiness from the last patch that touched
pg_am.h: Natts_pg_am was wrong, and ambuildempty was not documented.
2011-01-08 16:08:05 -05:00
Tom Lane
56a57473a9 Refactor GIN's handling of duplicate search entries.
The original coding could combine duplicate entries only when they
originated from the same qual condition.  In particular it could not
combine cases where multiple qual conditions all give rise to full-index
scan requests, which is an expensive case well worth optimizing.  Refactor
so that duplicates are recognized across all the quals.
2011-01-08 14:48:08 -05:00
Tom Lane
a032d50128 Fix the built-in GIN support procedure declarations in pg_proc.h.
Add more "internal" arguments so that these pg_proc entries reflect the
current preferred API.  This is purely a cosmetic change, since GIN doesn't
actually consult the pg_proc entry when calling a support function.
Accordingly, no catversion bump.
2011-01-07 20:40:48 -05:00
Tom Lane
73912e7fbd Fix GIN to support null keys, empty and null items, and full index scans.
Per my recent proposal(s).  Null key datums can now be returned by
extractValue and extractQuery functions, and will be stored in the index.
Also, placeholder entries are made for indexable items that are NULL or
contain no keys according to extractValue.  This means that the index is
now always complete, having at least one entry for every indexed heap TID,
and so we can get rid of the prohibition on full-index scans.  A full-index
scan is implemented much the same way as partial-match scans were already:
we build a bitmap representing all the TIDs found in the index, and then
drive the results off that.

Also, introduce a concept of a "search mode" that can be requested by
extractQuery when the operator requires matching to empty items (this is
just as cheap as matching to a single key) or requires a full index scan
(which is not so cheap, but it sure beats failing or giving wrong answers).
The behavior remains backward compatible for opclasses that don't return
any null keys or request a non-default search mode.

Using these features, we can now make the GIN index opclass for anyarray
behave in a way that matches the actual anyarray operators for &&, <@, @>,
and = ... which it failed to do before in assorted corner cases.

This commit fixes the core GIN code and ginarrayprocs.c, updates the
documentation, and adds some simple regression test cases for the new
behaviors using the array operators.  The tsearch and contrib GIN opclass
support functions still need to be looked over and probably fixed.

Another thing I intend to fix separately is that this is pretty inefficient
for cases where more than one scan condition needs a full-index search:
we'll run duplicate GinScanEntrys, each one of which builds a large bitmap.
There is some existing logic to merge duplicate GinScanEntrys but it needs
refactoring to make it work for entries belonging to different scan keys.

Note that most of gin.h has been split out into a new file gin_private.h,
so that gin.h doesn't export anything that's not supposed to be used by GIN
opclasses or the rest of the backend.  I did quite a bit of other code
beautification work as well, mostly fixing comments and choosing more
appropriate names for things.
2011-01-07 19:16:24 -05:00
Robert Haas
9b4271deb9 Document pg_stat_replication, bump catversion since that was overlooked.
Itagaki Takahiro, edited by me.
2011-01-07 11:06:55 -05:00
Itagaki Takahiro
a755ea33ae New system view pg_stat_replication displays activity of wal sender processes.
Itagaki Takahiro and Simon Riggs.
2011-01-07 20:35:38 +09:00
Magnus Hagander
66a8a0428d Give superusers REPLIACTION permission by default
This can be overriden by using NOREPLICATION on the CREATE ROLE
statement, but by default they will have it, making it backwards
compatible and "less surprising" (given that superusers normally
override all checks).
2011-01-05 14:24:17 +01:00
Robert Haas
7f60be72b0 Fix crash in ALTER OPERATOR CLASS/FAMILY .. SET SCHEMA.
In the previous coding, the parser emitted a List containing a C string,
which is no good, because copyObject() can't handle it.

Dimitri Fontaine
2011-01-03 22:08:55 -05:00
Magnus Hagander
77745cc7f1 Bump catversion, forgot in previous commit. 2011-01-03 12:50:30 +01:00
Magnus Hagander
40d9e94bd7 Add views and functions to monitor hot standby query conflicts
Add the view pg_stat_database_conflicts and a column to pg_stat_database,
and the underlying functions to provide the information.
2011-01-03 12:46:03 +01:00
Peter Eisentraut
39b8843296 Implement remaining fields of information_schema.sequences view
Add new function pg_sequence_parameters that returns a sequence's start,
minimum, maximum, increment, and cycle values, and use that in the view.
(bug #5662; design suggestion by Tom Lane)

Also slightly adjust the view's column order and permissions after review of
SQL standard.
2011-01-02 15:15:21 +02:00
Robert Haas
0d692a0dc9 Basic foreign table support.
Foreign tables are a core component of SQL/MED.  This commit does
not provide a working SQL/MED infrastructure, because foreign tables
cannot yet be queried.  Support for foreign table scans will need to
be added in a future patch.  However, this patch creates the necessary
system catalog structure, syntax support, and support for ancillary
operations such as COMMENT and SECURITY LABEL.

Shigeru Hanada, heavily revised by Robert Haas
2011-01-01 23:48:11 -05:00
Bruce Momjian
5d950e3b0c Stamp copyrights for year 2011. 2011-01-01 13:18:15 -05:00
Bruce Momjian
30aeda4394 Include the first valid listen address in pg_ctl to improve server start
"wait" detection and add postmaster start time to help determine if the
postmaster is actually using the specified data directory.
2010-12-31 17:25:02 -05:00
Tom Lane
7b46401557 Move symbols for ExecMergeJoin's state machine into nodeMergejoin.c.
There's no reason for these values to be known anywhere else.  After
doing this, executor/execdefs.h is vestigial and can be removed.
2010-12-30 22:12:40 -05:00
Tom Lane
f4e4b32743 Support RIGHT and FULL OUTER JOIN in hash joins.
This is advantageous first because it allows us to hash the smaller table
regardless of the outer-join type, and second because hash join can be more
flexible than merge join in dealing with arbitrary join quals in a FULL
join.  For merge join all the join quals have to be mergejoinable, but hash
join will work so long as there's at least one hashjoinable qual --- the
others can be any condition.  (This is true essentially because we don't
keep per-inner-tuple match flags in merge join, while hash join can do so.)

To do this, we need a has-it-been-matched flag for each tuple in the
hashtable, not just one for the current outer tuple.  The key idea that
makes this practical is that we can store the match flag in the tuple's
infomask, since there are lots of bits there that are of no interest for a
MinimalTuple.  So we aren't increasing the size of the hashtable at all for
the feature.

To write this without turning the hash code into even more of a pile of
spaghetti than it already was, I rewrote ExecHashJoin in a state-machine
style, similar to ExecMergeJoin.  Other than that decision, it was pretty
straightforward.
2010-12-30 20:26:08 -05:00
Alvaro Herrera
55573990ca Avoid unnecessary public struct declaration in slru.h
Instead, declare a public wrapper of the sole function using it for
external callers, so that they don't have to always pass a NULL
argument.

Author: Kevin Grittner
2010-12-30 12:09:17 -03:00
Robert Haas
d2bc1c9907 Bump XLOG_PAGE_MAGIC.
The unlogged tables patch (commit 53dbc27c62,
2010-12-29) should have done this, since it changes the format of an
XLOG_SMGR_CREATE record.
2010-12-29 07:19:21 -05:00
Robert Haas
53dbc27c62 Support unlogged tables.
The contents of an unlogged table are WAL-logged; thus, they are not
available on standby servers and are truncated whenever the database
system enters recovery.  Indexes on unlogged tables are also unlogged.
Unlogged GiST indexes are not currently supported.
2010-12-29 06:48:53 -05:00
Magnus Hagander
9b8aff8c19 Add REPLICATION privilege for ROLEs
This privilege is required to do Streaming Replication, instead of
superuser, making it possible to set up a SR slave that doesn't
have write permissions on the master.

Superuser privileges do NOT override this check, so in order to
use the default superuser account for replication it must be
explicitly granted the REPLICATION permissions. This is backwards
incompatible change, in the interest of higher default security.
2010-12-29 11:05:03 +01:00
Tom Lane
f2ba1e994c Avoid unexpected conversion overflow in planner for distant date values.
The "date" type supports a wider range of dates than int64 timestamps do.
However, there is pre-int64-timestamp code in the planner that assumes that
all date values can be converted to timestamp with impunity.  Fortunately,
what we really need out of the conversion is always a double (float8)
value; so even when the date is out of timestamp's range it's possible to
produce a sane answer.  All we need is a code path that doesn't try to
force the result into int64.  Per trouble report from David Rericha.

Back-patch to all supported versions.  Although this is surely a corner
case, there's not much point in advertising a date range wider than
timestamp's if we will choke on such values in unexpected places.
2010-12-28 22:49:57 -05:00
Tom Lane
84fc571395 Rename the C functions bitand(), bitor() to bit_and(), bit_or().
This is to avoid use of the C++ keywords "bitand" and "bitor" in
the header file utils/varbit.h.  Note the functions' SQL-level
names are not changed, only their C-level names.

In passing, make some comments in varbit.c conform to project-standard
layout.
2010-12-27 14:57:41 -05:00
Tom Lane
37b61a69f3 Fix failure of executor/hashjoin.h to compile standalone.
Noted while experimenting with cpluspluscheck.
2010-12-27 12:20:09 -05:00
Tom Lane
275411912d Fix ill-chosen use of "private" as an argument and struct field name.
"private" is a keyword in C++, so this breaks the poorly-enforced policy
that header files should be include-able in C++ code.  Per report from
Craig Ringer and some investigation with cpluspluscheck.
2010-12-27 11:26:19 -05:00
Robert Haas
63676ebff4 Corrections to patch adding SQL/MED error codes.
My previous commit, 85cff3ce7f on
2010-12-25, failed to update errcodes.sgml or plerrcodes.h.  This patch
corrects that oversight, per a gripe from Tom Lane, and also corrects
a typographical error.
2010-12-26 21:35:25 -05:00
Andrew Dunstan
a534728afb Only build in crashdump support on Windows if there's a working dbghelp.h. 2010-12-26 10:34:47 -05:00
Robert Haas
85cff3ce7f Add foreign data wrapper error code values for SQL/MED.
Extracted from a much larger patch by Shigeru Hanada.
2010-12-25 13:57:39 -05:00
Heikki Linnakangas
9de3aa65f0 Rewrite the GiST insertion logic so that we don't need the post-recovery
cleanup stage to finish incomplete inserts or splits anymore. There was two
reasons for the cleanup step:

1. When a new tuple was inserted to a leaf page, the downlink in the parent
needed to be updated to contain (ie. to be consistent with) the new key.
Updating the parent in turn might require recursively updating the parent of
the parent. We now handle that by updating the parent while traversing down
the tree, so that when we insert the leaf tuple, all the parents are already
consistent with the new key, and the tree is consistent at every step.

2. When a page is split, we need to insert the downlink for the new right
page(s), and update the downlink for the original page to not include keys
that moved to the right page(s). We now handle that by setting a new flag,
F_FOLLOW_RIGHT, on the non-rightmost pages in the split. When that flag is
set, scans always follow the rightlink, regardless of the NSN mechanism used
to detect concurrent page splits. That way the tree is consistent right after
split, even though the downlink is still missing. This is very similar to the
way B-tree splits are handled. When the downlink is inserted in the parent,
the flag is cleared. To keep the insertion algorithm simple, when an
insertion sees an incomplete split, indicated by the F_FOLLOW_RIGHT flag, it
finishes the split before doing anything else.

These changes allow removing the whole "invalid tuple" mechanism, but I
retained the scan code to still follow invalid tuples correctly. While we
don't create any such tuples anymore, we want to handle them gracefully in
case you pg_upgrade a GiST index that has them. If we encounter any on an
insert, though, we just throw an error saying that you need to REINDEX.

The issue that got me into doing this is that if you did a checkpoint while
an insert or split was in progress, and the checkpoint finishes quickly so
that there is no WAL record related to the insert between RedoRecPtr and the
checkpoint record, recovery from that checkpoint would not know to finish
the incomplete insert. IOW, we have the same issue we solved with the
rm_safe_restartpoint mechanism during normal operation too. It's highly
unlikely to happen in practice, and this fix is far too large to backpatch,
so we're just going to live with in previous versions, but this refactoring
fixes it going forward.

With this patch, you don't get the annoying
'index "FOO" needs VACUUM or REINDEX to finish crash recovery' notices
anymore if you crash at an unfortunate moment.
2010-12-23 16:21:47 +02:00
Magnus Hagander
dcb09b595f Support for collecting crash dumps on Windows
Add support for collecting "minidump" style crash dumps on
Windows, by setting up an exception handling filter. Crash
dumps will be generated in PGDATA/crashdumps if the directory
is created (the existance of the directory is used as on/off
switch for the generation of the dumps).

Craig Ringer and Magnus Hagander
2010-12-19 16:45:28 +01:00
Tom Lane
61b53695fb Remove optreset from src/port/ implementations of getopt and getopt_long.
We don't actually need optreset, because we can easily fix the code to
ensure that it's cleanly restartable after having completed a scan over the
argv array; which is the only case we need to restart in.  Getting rid of
it avoids a class of interactions with the system libraries and allows
reversion of my change of yesterday in postmaster.c and postgres.c.

Back-patch to 8.4.  Before that the getopt code was a bit different anyway.
2010-12-16 16:23:05 -05:00
Itagaki Takahiro
03db44eae3 Add pg_read_binary_file() and whole-file-at-once versions of pg_read_file().
One of the usages of the binary version is to read files in a different
encoding from the server encoding.

Dimitri Fontaine and Itagaki Takahiro.
2010-12-16 06:56:28 +09:00
Robert Haas
34c70c7ac4 Instrument checkpoint sync calls.
Greg Smith, reviewed by Jeff Janes
2010-12-14 09:26:19 -05:00
Robert Haas
d368e1a2a7 Allow plugins to suppress inlining and hook function entry/exit/abort.
This is intended as infrastructure to allow an eventual SE-Linux plugin to
support trusted procedures.

KaiGai Kohei
2010-12-13 19:15:53 -05:00
Robert Haas
5f7b58fad8 Generalize concept of temporary relations to "relation persistence".
This commit replaces pg_class.relistemp with pg_class.relpersistence;
and also modifies the RangeVar node type to carry relpersistence rather
than istemp.  It also removes removes rd_istemp from RelationData and
instead performs the correct computation based on relpersistence.

For clarity, we add three new macros: RelationNeedsWAL(),
RelationUsesLocalBuffers(), and RelationUsesTempNamespace(), so that we
can clarify the purpose of each check that previous depended on
rd_istemp.

This is intended as infrastructure for the upcoming unlogged tables
patch, as well as for future possible work on global temporary tables.
2010-12-13 12:34:26 -05:00
Tom Lane
5132ad8bdf Make S_IRGRP etc available in mingw builds as well as MSVC.
(Hm, I wonder whether BCC defines them either...)

Also label dangling endifs a bit better in this area.
2010-12-12 13:43:44 -05:00
Tom Lane
1319002e2e Provide a complete set of file-permission-bit macros in win32.h.
My previous patch exposed the fact that we didn't have these.  Those
hard-wired octal constants were actually wrong on Windows, not just
inconsistent.
2010-12-11 13:11:18 -05:00
Robert Haas
d3d414696f Allow bidirectional copy messages in streaming replication mode.
Fujii Masao.  Review by Alvaro Herrera, Tom Lane, and myself.
2010-12-11 09:27:37 -05:00
Tom Lane
671199929d Move a couple of initdb's subroutines into src/port/.
mkdir_p and check_data_dir will be useful in CREATE TABLESPACE, since we
have agreed that that command should handle subdirectory creation just like
initdb creates the PGDATA directory.  Push them into src/port/ so that they
are available to both initdb and the backend.  Rename to pg_mkdir_p and
pg_check_dir, just to be on the safe side.  Add FreeBSD's copyright notice
to pgmkdirp.c, since that's where the code came from originally (this
really should have been in initdb.c).  Very marginal code/comment cleanup.
2010-12-10 19:42:44 -05:00
Tom Lane
576477e73c Force default wal_sync_method to be fdatasync on Linux.
Recent versions of the Linux system header files cause xlogdefs.h to
believe that open_datasync should be the default sync method, whereas
formerly fdatasync was the default on Linux.  open_datasync is a bad
choice, first because it doesn't actually outperform fdatasync (in fact
the reverse), and second because we try to use O_DIRECT with it, causing
failures on certain filesystems (e.g., ext4 with data=journal option).
This part of the patch is largely per a proposal from Marti Raudsepp.
More extensive changes are likely to follow in HEAD, but this is as much
change as we want to back-patch.

Also clean up confusing code and incorrect documentation surrounding the
fsync_writethrough option.  Those changes shouldn't result in any actual
behavioral change, but I chose to back-patch them anyway to keep the
branches looking similar in this area.

In 9.0 and HEAD, also do some copy-editing on the WAL Reliability
documentation section.

Back-patch to all supported branches, since any of them might get used
on modern Linux versions.
2010-12-08 20:01:09 -05:00
Simon Riggs
e620ee35b2 Optimize commit_siblings in two ways to improve group commit.
First, avoid scanning the whole ProcArray once we know there
are at least commit_siblings active; second, skip the check
altogether if commit_siblings = 0.

Greg Smith
2010-12-08 18:48:03 +00:00
Heikki Linnakangas
5a031a5556 Fix bugs in the hot standby known-assigned-xids tracking logic. If there's
an old transaction running in the master, and a lot of transactions have
started and finished since, and a WAL-record is written in the gap between
the creating the running-xacts snapshot and WAL-logging it, recovery will fail
with "too many KnownAssignedXids" error. This bug was reported by
Joachim Wieland on Nov 19th.

In the same scenario, when fewer transactions have started so that all the
xids fit in KnownAssignedXids despite the first bug, a more serious bug
arises. We incorrectly initialize the clog code with the oldest still running
transaction, and when we see the WAL record belonging to a transaction with
an XID larger than one that committed already before the checkpoint we're
recovering from, we zero the clog page containing the already committed
transaction, leading to data loss.

In hindsight, trying to track xids in the known-assigned-xids array before
seeing the running-xacts record was too complicated. To fix that, hold
XidGenLock while the running-xacts snapshot is taken and WAL-logged. That
ensures that no transaction can begin or end in that gap, so that in recvoery
we know that the snapshot contains all transactions running at that point in
WAL.
2010-12-07 09:23:30 +01:00
Tom Lane
e194a942f9 Update comment to match later code changes. 2010-12-04 03:21:49 -05:00
Tom Lane
554506871b KNNGIST, otherwise known as order-by-operator support for GIST.
This commit represents a rather heavily editorialized version of
Teodor's builtin_knngist_itself-0.8.2 and builtin_knngist_proc-0.8.1
patches.  I redid the opclass API to add a separate Distance method
instead of turning the Consistent method into an illogical mess,
fixed some bit-rot in the rbtree interfaces, and generally worked over
the code style and comments.

There's still no non-code documentation to speak of, but I'll work on
that separately.  Some contrib-module changes are also yet to come
(right now, point <-> point is the only KNN-ified operator).

Teodor Sigaev and Tom Lane
2010-12-03 20:53:29 -05:00
Robert Haas
970a18687f Use GUC lexer for recovery.conf parsing.
This eliminates some crufty, special-purpose code and, as a non-trivial
side benefit, allows recovery.conf parameters to be unquoted.

Dimitri Fontaine, with review and cleanup by Alvaro Herrera, Itagaki
Takahiro, and me.
2010-12-03 08:56:44 -05:00
Tom Lane
d583f10b7e Create core infrastructure for KNNGIST.
This is a heavily revised version of builtin_knngist_core-0.9.  The
ordering operators are no longer mixed in with actual quals, which would
have confused not only humans but significant parts of the planner.
Instead, ordering operators are carried separately throughout planning and
execution.

Since the API for ambeginscan and amrescan functions had to be changed
anyway, this commit takes the opportunity to rationalize that a bit.
RelationGetIndexScan no longer forces a premature index_rescan call;
instead, callers of index_beginscan must call index_rescan too.  Aside from
making the AM-side initialization logic a bit less peculiar, this has the
advantage that we do not make a useless extra am_rescan call when there are
runtime key values.  AMs formerly could not assume that the key values
passed to amrescan were actually valid; now they can.

Teodor Sigaev and Tom Lane
2010-12-02 20:51:37 -05:00
Tom Lane
c0b5fac701 Simplify and speed up mapping of index opfamilies to pathkeys.
Formerly we looked up the operators associated with each index (caching
them in relcache) and then the planner looked up the btree opfamily
containing such operators in order to build the btree-centric pathkey
representation that describes the index's sort order.  This is quite
pointless for btree indexes: we might as well just use the index's opfamily
information directly.  That saves syscache lookup cycles during planning,
and furthermore allows us to eliminate the relcache's caching of operators
altogether, which may help in reducing backend startup time.

I added code to plancat.c to perform the same type of double lookup
on-the-fly if it's ever faced with a non-btree amcanorder index AM.
If such a thing actually becomes interesting for production, we should
replace that logic with some more-direct method for identifying the
corresponding btree opfamily; but it's not worth spending effort on now.

There is considerably more to do pursuant to my recent proposal to get rid
of sort-operator-based representations of sort orderings, but this patch
grabs some of the low-hanging fruit.  I'll look at the remainder of that
work after the current commitfest.
2010-11-29 12:30:43 -05:00
Simon Riggs
ed78384acd Move call to GetTopTransactionId() earlier in LockAcquire(),
removing an infrequently occurring race condition in Hot Standby.
An xid must be assigned before a lock appears in shared memory,
rather than immediately after, else GetRunningTransactionLocks()
may see InvalidTransactionId, causing assertion failures during
lock processing on standby.

Bug report and diagnosis by Fujii Masao, fix by me.
2010-11-29 01:08:02 +00:00
Robert Haas
55109313f9 Add more ALTER <object> .. SET SCHEMA commands.
This adds support for changing the schema of a conversion, operator,
operator class, operator family, text search configuration, text search
dictionary, text search parser, or text search template.

Dimitri Fontaine, with assorted corrections and other kibitzing.
2010-11-26 17:31:54 -05:00
Robert Haas
cc1ed40d57 Object access hook framework, with post-creation hook.
After a SQL object is created, we provide an opportunity for security
or logging plugins to get control; for example, a security label provider
could use this to assign an initial security label to newly created
objects.  The basic infrastructure is (hopefully) reusable for other types
of events that might require similar treatment.

KaiGai Kohei, with minor adjustments.
2010-11-25 11:50:13 -05:00
Bruce Momjian
ba11258ccb When reporting the server as not responding, if the hostname was
supplied, also print the IP address.  This allows IPv4 and IPv6 failures
to be distinguished.  Also useful when a hostname resolves to multiple
IP addresses.

Also, remove use of inet_ntoa() and use our own inet_net_ntop() in all
places, including in libpq, because it is thread-safe.
2010-11-24 17:04:19 -05:00
Tom Lane
725d52d0c2 Create the system catalog infrastructure needed for KNNGIST.
This commit adds columns amoppurpose and amopsortfamily to pg_amop, and
column amcanorderbyop to pg_am.  For the moment all the entries in
amcanorderbyop are "false", since the underlying support isn't there yet.

Also, extend the CREATE OPERATOR CLASS/ALTER OPERATOR FAMILY commands with
[ FOR SEARCH | FOR ORDER BY sort_operator_family ] clauses to allow the new
columns of pg_amop to be populated, and create pg_dump support for dumping
that information.

I also added some documentation, although it's perhaps a bit premature
given that the feature doesn't do anything useful yet.

Teodor Sigaev, Robert Haas, Tom Lane
2010-11-24 14:22:17 -05:00
Peter Eisentraut
f2a4278330 Propagate ALTER TYPE operations to typed tables
This adds RESTRICT/CASCADE flags to ALTER TYPE ... ADD/DROP/ALTER/
RENAME ATTRIBUTE to control whether to alter typed tables as well.
2010-11-23 22:50:17 +02:00
Peter Eisentraut
fc946c39ae Remove useless whitespace at end of lines 2010-11-23 22:34:55 +02:00
Robert Haas
44475e782f Centralize some ALTER <whatever> .. SET SCHEMA checks.
Any flavor of ALTER <whatever> .. SET SCHEMA fails if (1) the object
is already in the new schema, (2) either the old or new schema is
a temp schema, or (3) either the old or new schema is the TOAST schema.

Extraced from a patch by Dimitri Fontaine, with additional hacking by me.
2010-11-22 19:53:34 -05:00
Robert Haas
506070be34 Bump catversion. Should have done this as part of format(text) patch. 2010-11-21 06:34:42 -05:00
Robert Haas
7504870778 Add new SQL function, format(text).
Currently, three conversion format specifiers are supported: %s for a
string, %L for an SQL literal, and %I for an SQL identifier.  The latter
two are deliberately designed not to overlap with what sprintf() already
supports, in case we want to add more of sprintf()'s functionality here
later.

Patch by Pavel Stehule, heavily revised by me.  Reviewed by Jeff Janes
and, in earlier versions, by Itagaki Takahiro and Tom Lane.
2010-11-20 22:33:27 -05:00
Tom Lane
89a368418c Further cleanup of indxpath logic related to IndexOptInfo.opfamily array.
We no longer need the terminating zero entry in opfamily[], so get rid of
it.  Also replace assorted ad-hoc looping logic with simple for and foreach
constructs.  This code is now noticeably more readable than it was an hour
ago; credit to Robert for seeing that it could be simplified.
2010-11-20 15:07:16 -05:00
Robert Haas
4343c0e546 Expose quote_literal_cstr() from core.
This eliminates the need for inefficient implementions of this
functionality in both contrib/dblink and contrib/tablefunc, so remove
them.  The upcoming patch implementing an in-core format() function
will also require this functionality.

In passing, add some regression tests.
2010-11-20 10:04:48 -05:00
Robert Haas
4fc115b2e9 Speed up conversion of signed integers to C strings.
A hand-coded implementation turns out to be much faster than calling
printf().  In passing, add a few more regresion tests.

Andres Freund, with assorted, mostly cosmetic changes.
2010-11-19 22:13:11 -05:00
Tom Lane
0f61d4dd1b Improve relation width estimation for subqueries.
As per the ancient comment for set_rel_width, it really wasn't much good
for relations that aren't plain tables: it would never find any stats and
would always fall back on datatype-based estimates, which are often pretty
silly.  Fix that by copying up width estimates from the subquery planning
process.

At some point we might want to do this for CTEs too, but that would be a
significantly more invasive patch because the sub-PlannerInfo is no longer
accessible by the time it's needed.  I refrained from doing anything about
that, partly for fear of breaking the unmerged CTE-related patches.

In passing, also generate less bogus width estimates for whole-row Vars.

Per a gripe from Jon Nelson.
2010-11-19 17:31:50 -05:00
Alvaro Herrera
6cc2deb86e Add pg_describe_object function
This function is useful to obtain textual descriptions of objects as
stored in pg_depend.
2010-11-18 17:06:19 -03:00
Tom Lane
6fbc323c80 Further fallout from the MergeAppend patch.
Fix things so that top-N sorting can be used in child Sort nodes of a
MergeAppend node, when there is a LIMIT and no intervening joins or
grouping.  Actually doing this on the executor side isn't too bad,
but it's a bit messier to get the planner to cost it properly.
Per gripe from Robert Haas.

In passing, fix an oversight in the original top-N-sorting patch:
query_planner should not assume that a LIMIT can be used to make an
explicit sort cheaper when there will be grouping or aggregation in
between.  Possibly this should be back-patched, but I'm not sure the
mistake is serious enough to be a real problem in practice.
2010-11-18 00:30:10 -05:00
Tom Lane
511e902b51 Make TRUNCATE ... RESTART IDENTITY restart sequences transactionally.
In the previous coding, we simply issued ALTER SEQUENCE RESTART commands,
which do not roll back on error.  This meant that an error between
truncating and committing left the sequences out of sync with the table
contents, with potentially bad consequences as were noted in a Warning on
the TRUNCATE man page.

To fix, create a new storage file (relfilenode) for a sequence that is to
be reset due to RESTART IDENTITY.  If the transaction aborts, we'll
automatically revert to the old storage file.  This acts just like a
rewriting ALTER TABLE operation.  A penalty is that we have to take
exclusive lock on the sequence, but since we've already got exclusive lock
on its owning table, that seems unlikely to be much of a problem.

The interaction of this with usual nontransactional behaviors of sequence
operations is a bit weird, but it's hard to see what would be completely
consistent.  Our choice is to discard cached-but-unissued sequence values
both when the RESTART is executed, and at rollback if any; but to not touch
the currval() state either time.

In passing, move the sequence reset operations to happen before not after
any AFTER TRUNCATE triggers are fired.  The previous ordering was not
logically sensible, but was forced by the need to minimize inconsistency
if the triggers caused an error.  Transactional rollback is a much better
solution to that.

Patch by Steve Singer, rather heavily adjusted by me.
2010-11-17 16:42:18 -05:00
Heikki Linnakangas
2edc5cd493 The GiST scan algorithm uses LSNs to detect concurrent pages splits, but
temporary indexes are not WAL-logged. We used a constant LSN for temporary
indexes, on the assumption that we don't need to worry about concurrent page
splits in temporary indexes because they're only visible to the current
session. But that assumption is wrong, it's possible to insert rows and
split pages in the same session, while a scan is in progress. For example,
by opening a cursor and fetching some rows, and INSERTing new rows before
fetching some more.

Fix by generating fake increasing LSNs, used in place of real LSNs in
temporary GiST indexes.
2010-11-16 11:32:21 +02:00
Robert Haas
3134d8863e Add new buffers_backend_fsync field to pg_stat_bgwriter.
This new field counts the number of times that a backend which writes a
buffer out to the OS must also fsync() it.  This happens when the
bgwriter fsync request queue is full, and is generally detrimental to
performance, so it's good to know when it's happening.  Along the way,
log a new message at level DEBUG1 whenever we fail to hand off an fsync,
so that the problem can also be seen in examination of log files
(if the logging level is cranked up high enough).

Greg Smith, with minor tweaks by me.
2010-11-15 12:42:59 -05:00
Robert Haas
20cf8ae478 Fix copy-and-pasteo a little more completely.
copydir.c is no longer in src/port
2010-11-15 10:10:58 -05:00
Alvaro Herrera
ae4b17edee Fix copy-and-pasteo. 2010-11-15 11:52:56 -03:00
Robert Haas
11e482c350 Move copydir() prototype into its own header file.
Having this in src/include/port.h makes no sense, now that copydir.c lives
in src/backend/strorage rather than src/port.  Along the way, remove an
obsolete comment from contrib/pg_upgrade that makes reference to the old
location.
2010-11-12 16:39:53 -05:00
Robert Haas
7ba6e4f0e0 Add monitoring function pg_last_xact_replay_timestamp.
Fujii Masao, with a little wordsmithing by me.
2010-11-09 22:52:19 -05:00
Itagaki Takahiro
844ed5dc97 Don't use __declspec (dllimport) for PGDLLEXPORT to reduce warnings
by gcc version 4 on mingw and cygwin. We don't use dllexport here
because dllexport and dllwrap don't work well together.
2010-11-10 12:17:43 +09:00
Tom Lane
947d0c862c Use appendrel planning logic for top-level UNION ALL structures.
Formerly, we could convert a UNION ALL structure inside a subquery-in-FROM
into an appendrel, as a side effect of pulling up the subquery into its
parent; but top-level UNION ALL always caused use of plan_set_operations().
That didn't matter too much because you got an Append-based plan either
way.  However, now that the appendrel code can do things with MergeAppend,
it's worthwhile to hack up the top-level case so it also uses appendrels.

This is a bit of a stopgap; but going much further than this will require
a major rewrite of the planner's set-operations support, which I'm not
prepared to undertake now.  For the moment let's grab the low-hanging fruit.
2010-11-08 15:15:02 -05:00
Tom Lane
034967bdcb Reimplement planner's handling of MIN/MAX aggregate optimization.
Per my recent proposal, get rid of all the direct inspection of indexes
and manual generation of paths in planagg.c.  Instead, set up
EquivalenceClasses for the aggregate argument expressions, and let the
regular path generation logic deal with creating paths that can satisfy
those sort orders.  This makes planagg.c a bit more visible to the rest
of the planner than it was originally, but the approach is basically a lot
cleaner than before.  A major advantage of doing it this way is that we get
MIN/MAX optimization on inheritance trees (using MergeAppend of indexscans)
practically for free, whereas in the old way we'd have had to add a whole
lot more duplicative logic.

One small disadvantage of this approach is that MIN/MAX aggregates can no
longer exploit partial indexes having an "x IS NOT NULL" predicate, unless
that restriction or something that implies it is specified in the query.
The previous implementation was able to use the added "x IS NOT NULL"
condition as an extra predicate proof condition, but in this version we
rely entirely on indexes that are considered usable by the main planning
process.  That seems a fair tradeoff for the simplicity and functionality
gained.
2010-11-04 12:01:17 -04:00
Tom Lane
0811ff2063 Avoid using a local FunctionCallInfoData struct in ExecMakeFunctionResult
and related routines.

We already had a redundant FunctionCallInfoData struct in FuncExprState,
but were using that copy only in set-returning-function cases, to avoid
keeping function evaluation state in the expression tree for the benefit
of plpgsql's "simple expression" logic.  But of course that didn't work
anyway.  Given the recent fixes in plpgsql there is no need to have two
separate behaviors here.  Getting rid of the local FunctionCallInfoData
structs should make things a little faster (because we don't need to do
InitFunctionCallInfoData each time), and it also makes for a noticeable
reduction in stack space consumption during recursive calls.
2010-11-01 13:54:21 -04:00
Tom Lane
186cbbda8f Provide hashing support for arrays.
The core of this patch is hash_array() and associated typcache
infrastructure, which works just about exactly like the existing support
for array comparison.

In addition I did some work to ensure that the planner won't think that an
array type is hashable unless its element type is hashable, and similarly
for sorting.  This includes adding a datatype parameter to op_hashjoinable
and op_mergejoinable, and adding an explicit "hashable" flag to
SortGroupClause.  The lack of a cross-check on the element type was a
pre-existing bug in mergejoin support --- but it didn't matter so much
before, because if you couldn't sort the element type there wasn't any good
alternative to failing anyhow.  Now that we have the alternative of hashing
the array type, there are cases where we can avoid a failure by being picky
at the planner stage, so it's time to be picky.

The issue of exactly how to combine the per-element hash values to produce
an array hash is still open for discussion, but the rest of this is pretty
solid, so I'll commit it as-is.
2010-10-30 21:56:11 -04:00
Tom Lane
14231a41a9 Avoid creation of useless EquivalenceClasses during planning.
Zoltan Boszormenyi exhibited a test case in which planning time was
dominated by construction of EquivalenceClasses and PathKeys that had no
actual relevance to the query (and in fact got discarded immediately).
This happened because we generated PathKeys describing the sort ordering of
every index on every table in the query, and only after that checked to see
if the sort ordering was relevant.  The EC/PK construction code is O(N^2)
in the number of ECs, which is all right for the intended number of such
objects, but it gets out of hand if there are ECs for lots of irrelevant
indexes.

To fix, twiddle the handling of mergeclauses a little bit to ensure that
every interesting EC is created before we begin path generation.  (This
doesn't cost anything --- in fact I think it's a bit cheaper than before
--- since we always eventually created those ECs anyway.)  Then, if an
index column can't be found in any pre-existing EC, we know that that sort
ordering is irrelevant for the query.  Instead of creating a useless EC,
we can just not build a pathkey for the index column in the first place.
The index will still be considered if it's useful for non-order-related
reasons, but we will think of its output as unsorted.
2010-10-29 11:52:50 -04:00
Robert Haas
20709f8136 Add a client authentication hook.
KaiGai Kohei, with minor cleanup of the comments by me.
2010-10-26 21:20:38 -04:00
Itagaki Takahiro
bf76ad07fe Fix typos "are are". 2010-10-26 17:15:17 +09:00
Peter Eisentraut
35670340f5 Refactor typenameTypeId()
Split the old typenameTypeId() into two functions: A new typenameTypeId() that
returns only a type OID, and typenameTypeIdAndMod() that returns type OID and
typmod.  This isolates call sites better that actually care about the typmod.
2010-10-25 21:44:49 +03:00
Tom Lane
84c123be1d Allow new values to be added to an existing enum type.
After much expenditure of effort, we've got this to the point where the
performance penalty is pretty minimal in typical cases.

Andrew Dunstan, reviewed by Brendan Jurd, Dean Rasheed, and Tom Lane
2010-10-24 23:05:41 -04:00
Heikki Linnakangas
5c84fe4607 Make OFF keyword unreserved. It's not hard to imagine wanting to use 'off'
as a variable or column name, and it's not reserved in recent versions of
the SQL spec either. This became particularly annoying in 9.0, before that
PL/pgSQL replaced variable names in queries with parameter markers, so
it was possible to use OFF and many other backend parser keywords as
variable names. Because of that, backpatch to 9.0.
2010-10-22 17:44:50 +03:00
Tom Lane
529cb267a6 Improve handling of domains over arrays.
This patch eliminates various bizarre behaviors caused by sloppy thinking
about the difference between a domain type and its underlying array type.
In particular, the operation of updating one element of such an array
has to be considered as yielding a value of the underlying array type,
*not* a value of the domain, because there's no assurance that the
domain's CHECK constraints are still satisfied.  If we're intending to
store the result back into a domain column, we have to re-cast to the
domain type so that constraints are re-checked.

For similar reasons, such a domain can't be blindly matched to an ANYARRAY
polymorphic parameter, because the polymorphic function is likely to apply
array-ish operations that could invalidate the domain constraints.  For the
moment, we just forbid such matching.  We might later wish to insert an
automatic downcast to the underlying array type, but such a change should
also change matching of domains to ANYELEMENT for consistency.

To ensure that all such logic is rechecked, this patch removes the original
hack of setting a domain's pg_type.typelem field to match its base type;
the typelem will always be zero instead.  In those places where it's really
okay to look through the domain type with no other logic changes, use the
newly added get_base_element_type function in place of get_element_type.
catversion bumped due to change in pg_type contents.

Per bug #5717 from Richard Huxton and subsequent discussion.
2010-10-21 16:07:17 -04:00
Tom Lane
6e74a91b2b Fix incorrect generation of whole-row variables in planner.
A couple of places in the planner need to generate whole-row Vars, and were
cutting corners by setting vartype = RECORDOID in the Vars, even in cases
where there's an identifiable named composite type for the RTE being
referenced.  While we mostly got away with this, it failed when there was
also a parser-generated whole-row reference to the same RTE, because the
two Vars weren't equal() due to the difference in vartype.  Fix by
providing a subroutine the planner can call to generate whole-row Vars
the same way the parser does.

Per bug #5716 from Andrew Tipton.  Back-patch to 9.0 where one of the bogus
calls was introduced (the other one is new in HEAD).
2010-10-19 15:09:23 -04:00
Peter Eisentraut
bc8624b15d Support key word 'all' in host column of pg_hba.conf 2010-10-18 22:15:44 +03:00
Tom Lane
419d2374bf Fix a passel of inappropriately-named global functions in GIN.
The GIN code has absolutely no business exporting GIN-specific functions
with names as generic as compareItemPointers() or newScanKey(); that's
just trouble waiting to happen.  I got annoyed about this again just now
and decided to fix it.  This commit ensures that all global symbols
defined in access/gin/ have names including "gin" or "Gin".  There were a
couple of cases, like names involving "PostingItem", where arguably the
names were already sufficiently nongeneric; but I figured as long as I was
risking creating merge problems for unapplied GIN patches I might as well
impose a uniform policy.

I didn't touch any static symbol names.  There might be some places
where it'd be appropriate to rename some static functions to match
siblings that are exported, but I'll leave that for another time.
2010-10-17 21:43:26 -04:00
Tom Lane
48c7d9f6ff Improve GIN indexscan cost estimation.
The better estimate requires more statistics than we previously stored:
in particular, counts of "entry" versus "data" pages within the index,
as well as knowledge of the number of distinct key values.  We collect
this information during initial index build and update it during VACUUM,
storing the info in new fields on the index metapage.  No initdb is
required because these fields will read as zeroes in a pre-existing
index, and the new gincostestimate code is coded to behave (reasonably)
sanely if they are zeroes.

Teodor Sigaev, reviewed by Jan Urbanski, Tom Lane, and Itagaki Takahiro.
2010-10-17 20:52:32 -04:00
Tom Lane
07f1264dda Allow WITH clauses to be attached to INSERT, UPDATE, DELETE statements.
This is not the hoped-for facility of using INSERT/UPDATE/DELETE inside
a WITH, but rather the other way around.  It seems useful in its own
right anyway.

Note: catversion bumped because, although the contents of stored rules
might look compatible, there's actually a subtle semantic change.
A single Query containing a WITH and INSERT...VALUES now represents
writing the WITH before the INSERT, not before the VALUES.  While it's
not clear that that matters to anyone, it seems like a good idea to
have it cited in the git history for catversion.h.

Original patch by Marko Tiikkaja, with updating and cleanup by
Hitoshi Harada.
2010-10-15 19:55:25 -04:00
Peter Eisentraut
6ab42ae367 Support host names in pg_hba.conf
Peter Eisentraut, reviewed by KaiGai Kohei and Tom Lane
2010-10-15 22:56:18 +03:00
Tom Lane
11cad29c91 Support MergeAppend plans, to allow sorted output from append relations.
This patch eliminates the former need to sort the output of an Append scan
when an ordered scan of an inheritance tree is wanted.  This should be
particularly useful for fast-start cases such as queries with LIMIT.

Original patch by Greg Stark, with further hacking by Hans-Jurgen Schonig,
Robert Haas, and Tom Lane.
2010-10-14 16:57:57 -04:00
Peter Eisentraut
1a996d6c29 Remove executable permission from files where it doesn't belong 2010-10-13 22:30:25 +03:00
Tom Lane
f4d242ef94 Remove some unnecessary tests of pgstat_track_counts.
We may as well make pgstat_count_heap_scan() and related macros just count
whenever rel->pgstat_info isn't null.  Testing pgstat_track_counts buys
nothing at all in the normal case where that flag is ON; and when it's OFF,
the pgstat_info link will be null, so it's still a useless test.

This change is unlikely to buy any noticeable performance improvement,
but a cycle shaved is a cycle earned; and my investigations earlier today
convinced me that we're down to the point where individual instructions in
the inner execution loops are starting to matter.
2010-10-12 14:44:25 -04:00
Tom Lane
220e45bf32 Improve the planner's simplification of NOT constructs.
This patch merges the responsibility for NOT-flattening into
eval_const_expressions' processing.  It wasn't done that way originally
because prepqual.c is far older than eval_const_expressions.  But putting
this work into eval_const_expressions saves one pass over the qual trees,
and in fact saves even more than that because we can exploit the knowledge
that the subexpressions have already been recursively simplified.  Doing it
this way also lets us do it uniformly over all expressions, whereas
prepqual.c formerly just did it at top level to save cycles.  That should
improve the planner's ability to recognize logically-equivalent constructs.

While at it, also add the ability to fold a NOT into BooleanTest and
NullTest constructs (the latter only for the scalar-datatype case).

Per discussion of bug #5702.
2010-10-10 23:19:50 -04:00
Tom Lane
2ec993a7cb Support triggers on views.
This patch adds the SQL-standard concept of an INSTEAD OF trigger, which
is fired instead of performing a physical insert/update/delete.  The
trigger function is passed the entire old and/or new rows of the view,
and must figure out what to do to the underlying tables to implement
the update.  So this feature can be used to implement updatable views
using trigger programming style rather than rule hacking.

In passing, this patch corrects the names of some columns in the
information_schema.triggers view.  It seems the SQL committee renamed
them somewhere between SQL:99 and SQL:2003.

Dean Rasheed, reviewed by Bernd Helmle; some additional hacking by me.
2010-10-10 13:45:07 -04:00
Tom Lane
9cc8c84e73 Improve logging in VACUUM FULL VERBOSE and CLUSTER VERBOSE.
This patch resurrects some of the information that could be logged by the
old, now-dead implementation of VACUUM FULL, in particular counts of live
and dead tuples and the time taken for the table rebuild proper.  There's
still no logging about the ensuing index rebuilds, though.

Itagaki Takahiro
2010-10-07 21:46:46 -04:00
Tom Lane
3ba11d3df2 Teach CLUSTER to use seqscan-and-sort when it's faster than indexscan.
... or at least, when the planner's cost estimates say it will be faster.

Leonardo Francalanci, reviewed by Itagaki Takahiro and Tom Lane
2010-10-07 20:00:28 -04:00
Tom Lane
3e5f9412d0 Reduce the memory requirement for large ispell dictionaries.
This patch eliminates per-chunk palloc overhead for most small allocations
needed in the representation of an ispell dictionary.  This saves close to
a factor of 2 on the current Czech ispell data.  While it doesn't cover
every last small allocation in the ispell code, we are at the point of
diminishing returns, because about 95% of the allocations are covered
already.

Pavel Stehule, rather heavily revised by Tom
2010-10-06 19:31:05 -04:00
Tom Lane
9b910def24 Clean up temporary-memory management during ispell dictionary loading.
Add explicit initialization and cleanup functions to spell.c, and keep
all working state in the already-existing ISpellDict struct.  This lets us
get rid of a static variable along with some extremely shaky assumptions
about usage of child memory contexts.

This commit is just code beautification and has no impact on functionality
or performance, but it opens the way to a less-grotty implementation of
Pavel's memory-saving hack, which will follow shortly.
2010-10-06 15:15:15 -04:00
Tom Lane
eb22950510 Fix PlaceHolderVar mechanism's interaction with outer joins.
The point of a PlaceHolderVar is to allow a non-strict expression to be
evaluated below an outer join, after which its value bubbles up like a Var
and can be forced to NULL when the outer join's semantics require that.
However, there was a serious design oversight in that, namely that we
didn't ensure that there was actually a correct place in the plan tree
to evaluate the placeholder :-(.  It may be necessary to delay evaluation
of an outer join to ensure that a placeholder that should be evaluated
below the join can be evaluated there.  Per recent bug report from Kirill
Simonov.

Back-patch to 8.4 where the PlaceHolderVar mechanism was introduced.
2010-09-28 14:19:00 -04:00
Robert Haas
eacb22ec47 Fix duplicate OIDs introduced by SECURITY LABEL patch.
Report by Shigeru Hanada.
2010-09-28 07:07:03 -04:00
Robert Haas
4d355a8336 Add a SECURITY LABEL command.
This is intended as infrastructure to support integration with label-based
mandatory access control systems such as SE-Linux. Further changes (mostly
hooks) will be needed, but this is a big chunk of it.

KaiGai Kohei and Robert Haas
2010-09-27 20:55:27 -04:00
Peter Eisentraut
e440e12c56 Add ALTER TYPE ... ADD/DROP/ALTER/RENAME ATTRIBUTE
Like with tables, this also requires allowing the existence of
composite types with zero attributes.

reviewed by KaiGai Kohei
2010-09-26 14:41:03 +03:00
Magnus Hagander
fe9b36fd59 Convert cvsignore to gitignore, and add .gitignore for build targets. 2010-09-22 12:57:04 +02:00
Magnus Hagander
9f2e211386 Remove cvs keywords from all files. 2010-09-20 22:08:53 +02:00
Tom Lane
9eef3318a2 Fix several broken $PostgreSQL$ keywords. Noted while experimenting
with Magnus's script to remove these.
2010-09-19 16:17:45 +00:00
Heikki Linnakangas
723d0184e2 Use a latch to make startup process wake up and replay immediately when
new WAL arrives via streaming replication. This reduces the latency, and
also allows us to use a longer polling interval, which is good for energy
efficiency.

We still need to poll to check for the appearance of a trigger file, but
the interval is now 5 seconds (instead of 100ms), like when waiting for
a new WAL segment to appear in WAL archive.
2010-09-15 10:35:05 +00:00
Heikki Linnakangas
236b6bc29e Simplify Windows implementation of latches. There's no need to keep a
dynamic pool of event handles, we can permanently assign one for each
shared latch. Thanks to that, we no longer need a separate shared memory
block for latches, and we don't need to know in advance how many shared
latches there is, so you no longer need to remember to update
NumSharedLatches when you introduce a new latch to the system.
2010-09-15 10:06:21 +00:00
Tom Lane
4e97631e6a Fix join-removal logic for pseudoconstant and outerjoin-delayed quals.
In these cases a qual can get marked with the removable rel in its
required_relids, but this is just to schedule its evaluation correctly, not
because it really depends on the rel.  We were assuming that, in effect,
we could throw away *all* quals so marked, which is nonsense.  Tighten up
the logic to be a little more paranoid about which quals belong to the
outer join being considered for removal, and arrange for all quals that
don't belong to be updated so they will still get evaluated correctly.

Also fix another problem that happened to be exposed by this test case,
which was that make_join_rel() was failing to notice some cases where
a constant-false qual could be used to prove a join relation empty.  If it's
a pushed-down constant false, then the relation is empty even if it's an
outer join, because the qual applies after the outer join expansion.

Per report from Nathan Grange.  Back-patch into 9.0.
2010-09-14 23:15:29 +00:00
Heikki Linnakangas
d1c33ccf62 Remove prototype for non-existent function from walreceiver.h. Tidy up by
separating prototypes for functions in walreceiver.c and walreceiverfuncs.c
with comments.
2010-09-13 10:14:25 +00:00
Joe Conway
5eb15c9942 SERIALIZABLE transactions are actually implemented beneath the covers with
transaction snapshots, i.e. a snapshot registered at the beginning of
a transaction. Change variable naming and comments to reflect this reality
in preparation for a future, truly serializable mode, e.g.
Serializable Snapshot Isolation (SSI).

For the moment transaction snapshots are still used to implement
SERIALIZABLE, but hopefully not for too much longer. Patch by Kevin
Grittner and Dan Ports with review and some minor wording changes by me.
2010-09-11 18:38:58 +00:00
Heikki Linnakangas
2746e5f21d Introduce latches. A latch is a boolean variable, with the capability to
wait until it is set. Latches can be used to reliably wait until a signal
arrives, which is hard otherwise because signals don't interrupt select()
on some platforms, and even when they do, there's race conditions.

On Unix, latches use the so called self-pipe trick under the covers to
implement the sleep until the latch is set, without race conditions. On
Windows, Windows events are used.

Use the new latch abstraction to sleep in walsender, so that as soon as
a transaction finishes, walsender is woken up to immediately send the WAL
to the standby. This reduces the latency between master and standby, which
is good.

Preliminary work by Fujii Masao. The latch implementation is by me, with
helpful comments from many people.
2010-09-11 15:48:04 +00:00
Tom Lane
303696c3b4 Install a data-type-based solution for protecting pg_get_expr().
Since the code underlying pg_get_expr() is not secure against malformed
input, and can't practically be made so, we need to prevent miscreants
from feeding arbitrary data to it.  We can do this securely by declaring
pg_get_expr() to take a new datatype "pg_node_tree" and declaring the
system catalog columns that hold nodeToString output to be of that type.
There is no way at SQL level to create a non-null value of type pg_node_tree.
Since the backend-internal operations that fill those catalog columns
operate below the SQL level, they are oblivious to the datatype relabeling
and don't need any changes.
2010-09-03 01:34:55 +00:00
Tom Lane
8ab6a6b456 In HEAD only, revert kluge solution for preventing misuse of pg_get_expr().
A data-type-based solution, which is much cleaner and more bulletproof,
will follow shortly.  It seemed best to make this a separate commit though.
2010-09-03 01:26:52 +00:00
Tom Lane
9513918c6c Fix up flushing of composite-type typcache entries to be driven directly by
SI invalidation events, rather than indirectly through the relcache.

In the previous coding, we had to flush a composite-type typcache entry
whenever we discarded the corresponding relcache entry.  This caused problems
at least when testing with RELCACHE_FORCE_RELEASE, as shown in recent report
from Jeff Davis, and might result in real-world problems given the kind of
unexpected relcache flush that that test mechanism is intended to model.

The new coding decouples relcache and typcache management, which is a good
thing anyway from a structural perspective.  The cost is that we have to
search the typcache linearly to find entries that need to be flushed.  There
are a couple of ways we could avoid that, but at the moment it's not clear
it's worth any extra trouble, because the typcache contains very few entries
in typical operation.

Back-patch to 8.2, the same as some other recent fixes in this general area.
The patch could be carried back to 8.0 with some additional work, but given
that it's only hypothetical whether we're fixing any problem observable in
the field, it doesn't seem worth the work now.
2010-09-02 03:16:46 +00:00
Peter Eisentraut
2355b69b1e Small refactoring of makeVar() from a TargetEntry 2010-08-27 20:30:08 +00:00
Robert Haas
c10575ff00 Rewrite comment code for better modularity, and add necessary locking.
Review by Alvaro Herrera, KaiGai Kohei, and Tom Lane.
2010-08-27 11:47:41 +00:00
Itagaki Takahiro
49b27ab551 Add string functions: concat(), concat_ws(), left(), right(), and reverse().
Pavel Stehule, reviewed by me.
2010-08-24 06:30:44 +00:00
Tom Lane
b9defe0405 Marginal code cleanup for streaming replication.
There is no reason that proc.c should have to get involved in this dirty hack
for letting the postmaster know which children are walsenders.  Revert that
file to the way it was, and confine the kluge to pmsignal.c and postmaster.c.
2010-08-23 17:20:01 +00:00
Magnus Hagander
946045f04d Add vacuum and analyze counters to pg_stat_*_tables views. 2010-08-21 10:59:17 +00:00
Peter Eisentraut
3f11971916 Remove extra newlines at end and beginning of files, add missing newlines
at end of files.
2010-08-19 05:57:36 +00:00
Tom Lane
2d8314bd43 Rename utf2ucs() to utf8_to_unicode(), and export it so it can be used
elsewhere.

Similarly rename the version in mbprint.c, not because this affects anything
but just to keep the two copies in exact sync.  There was some discussion of
having only one copy in src/port/ instead, but this function is so small
and unlikely to change that that seems like overkill.

Slightly editorialized version of a patch by Joseph Adams.  (The bug-fix
aspect of his patch was applied separately, and back-patched.)
2010-08-18 19:54:01 +00:00
Tom Lane
b5565bca11 Fix failure of "ALTER TABLE t ADD COLUMN c serial" when done by non-owner.
The implicitly created sequence was created as owned by the current user,
who could be different from the table owner, eg if current user is a
superuser or some member of the table's owning role.  This caused sanity
checks in the SEQUENCE OWNED BY code to spit up.  Although possibly we
don't need those sanity checks, the safest fix seems to be to make sure
the implicit sequence is assigned the same owner role as the table has.
(We still do all permissions checks as the current user, however.)
Per report from Josh Berkus.

Back-patch to 9.0.  The bug goes back to the invention of SEQUENCE OWNED BY
in 8.2, but the fix requires an API change for DefineRelation(), which seems
to have potential for breaking third-party code if done in a minor release.
Given the lack of prior complaints, it's probably not worth fixing in the
stable branches.
2010-08-18 18:35:21 +00:00
Tom Lane
946b078044 MyBackendId now needs to be PGDLLIMPORT, so that contrib modules can
access it on Windows.  Per buildfarm.
2010-08-14 13:37:21 +00:00
Robert Haas
debcec7dc3 Include the backend ID in the relpath of temporary relations.
This allows us to reliably remove all leftover temporary relation
files on cluster startup without reference to system catalogs or WAL;
therefore, we no longer include temporary relations in XLOG_XACT_COMMIT
and XLOG_XACT_ABORT WAL records.

Since these changes require including a backend ID in each
SharedInvalSmgrMsg, the size of the SharedInvalidationMessage.id
field has been reduced from two bytes to one, and the maximum number
of connections has been reduced from INT_MAX / 4 to 2^23-1.  It would
be possible to remove these restrictions by increasing the size of
SharedInvalidationMessage by 4 bytes, but right now that doesn't seem
like a good trade-off.

Review by Jaime Casanova and Tom Lane.
2010-08-13 20:10:54 +00:00
Tom Lane
a0b7b717a4 Add xml_is_well_formed, xml_is_well_formed_document, xml_is_well_formed_content
functions to the core XML code.  Per discussion, the former depends on
XMLOPTION while the others do not.  These supersede a version previously
offered by contrib/xml2.

Mike Fowler, reviewed by Pavel Stehule
2010-08-13 18:36:26 +00:00
Robert Haas
30c22eb8fc Correct sundry errors in Hot Standby-related comments.
Fujii Masao
2010-08-12 23:24:54 +00:00
Tom Lane
33f43725fb Add three-parameter forms of array_to_string and string_to_array, to allow
better handling of NULL elements within the arrays.  The third parameter
is a string that should be used to represent a NULL element, or should
be translated into a NULL element, respectively.  If the third parameter
is NULL it behaves the same as the two-parameter form.

There are two incompatible changes in the behavior of the two-parameter form
of string_to_array.  First, it will return an empty (zero-element) array
rather than NULL when the input string is of zero length.  Second, if the
field separator is NULL, the function splits the string into individual
characters, rather than returning NULL as before.  These two changes make
this form fully compatible with the behavior of the new three-parameter form.

Pavel Stehule, reviewed by Brendan Jurd
2010-08-10 21:51:00 +00:00
Tom Lane
4dfc457854 Add an xpath_exists() function. This is equivalent to XMLEXISTS except that
it offers support for namespace mapping.

Mike Fowler, reviewed by David Fetter
2010-08-08 19:15:27 +00:00
Tom Lane
46aa77c7bd Add stats functions and views to provide access to a transaction's own
statistics counts.  These numbers are being accumulated but haven't yet been
transmitted to the collector (and won't be, until the transaction ends).
For some purposes, though, it's handy to be able to look at them.

Joel Jacobson, reviewed by Itagaki Takahiro
2010-08-08 16:27:06 +00:00
Tom Lane
e49ae8d3bc Recognize functional dependency on primary keys. This allows a table's
other columns to be referenced without listing them in GROUP BY, so long as
the primary key column(s) are listed in GROUP BY.

Eventually we should also allow functional dependency on a UNIQUE constraint
when the columns are marked NOT NULL, but that has to wait until NOT NULL
constraints are represented in pg_constraint, because we need to have
pg_constraint OIDs for all the conditions needed to ensure functional
dependency.

Peter Eisentraut, reviewed by Alex Hunsaker and Tom Lane
2010-08-07 02:44:09 +00:00
Tom Lane
b0c451e145 Remove the single-argument form of string_agg(). It added nothing much in
functionality, while creating an ambiguity in usage with ORDER BY that at
least two people have already gotten seriously confused by.  Also, add an
opr_sanity test to check that we don't in future violate the newly minted
policy of not having built-in aggregates with the same name and different
numbers of parameters.  Per discussion of a complaint from Thom Brown.
2010-08-05 18:21:19 +00:00
Robert Haas
fd1843ff89 Standardize get_whatever_oid functions for other object types.
- Rename TSParserGetPrsid to get_ts_parser_oid.
- Rename TSDictionaryGetDictid to get_ts_dict_oid.
- Rename TSTemplateGetTmplid to get_ts_template_oid.
- Rename TSConfigGetCfgid to get_ts_config_oid.
- Rename FindConversionByName to get_conversion_oid.
- Rename GetConstraintName to get_constraint_oid.
- Add new functions get_opclass_oid, get_opfamily_oid, get_rewrite_oid,
  get_rewrite_oid_without_relid, get_trigger_oid, and get_cast_oid.

The name of each function matches the corresponding catalog.

Thanks to KaiGai Kohei for the review.
2010-08-05 15:25:36 +00:00
Robert Haas
2a6ef3445c Standardize get_whatever_oid functions for object types with
unqualified names.

- Add a missing_ok parameter to get_tablespace_oid.
- Avoid duplicating get_tablespace_od guts in objectNamesToOids.
- Add a missing_ok parameter to get_database_oid.
- Replace get_roleid and get_role_checked with get_role_oid.
- Add get_namespace_oid, get_language_oid, get_am_oid.
- Refactor existing code to use new interfaces.

Thanks to KaiGai Kohei for the review.
2010-08-05 14:45:09 +00:00
Peter Eisentraut
641459f269 Add xmlexists function
by Mike Fowler, reviewed by Peter Eisentraut
2010-08-05 04:21:54 +00:00
Robert Haas
26e47efb66 Fix declared argument name for numeric_maximum_size.
The previous commit changed the function to say 'typmod' rather than
'typemod', but I forgot to update the header file.
2010-08-04 17:35:59 +00:00
Tom Lane
db04f2b322 Replace the naive HYPOT() macro with a standards-conformant hypotenuse
function.  This avoids unnecessary overflows and probably gives a more
accurate result as well.

Paul Matthews, reviewed by Andrew Geery
2010-08-03 21:21:03 +00:00
Tom Lane
67becf8d41 Fix ANALYZE's ancient deficiency of not trying to collect stats for expression
indexes when the index column type (the opclass opckeytype) is different from
the expression's datatype.  When coded, this limitation wasn't worth worrying
about because we had no intelligence to speak of in stats collection for the
datatypes used by such opclasses.  However, now that there's non-toy
estimation capability for tsvector queries, it amounts to a bug that ANALYZE
fails to do this.

The fix changes struct VacAttrStats, and therefore constitutes an API break
for custom typanalyze functions.  Therefore we can't back-patch it into
released branches, but it was agreed that 9.0 isn't yet frozen hard enough
to make such a change unacceptable.  Ergo, back-patch to 9.0 but no further.
The API break had better be mentioned in 9.0 release notes.
2010-08-01 22:38:11 +00:00
Tom Lane
d4fe61b083 Fix an additional set of problems in GIN's handling of lossy page pointers.
Although the key-combining code claimed to work correctly if its input
contained both lossy and exact pointers for a single page in a single TID
stream, in fact this did not work, and could not work without pretty
fundamental redesign.  Modify keyGetItem so that it will not return such a
stream, by handling lossy-pointer cases a bit more explicitly than we did
before.

Per followup investigation of a gripe from Artur Dabrowski.
An example of a query that failed given his data set is
select count(*) from search_tab where
(to_tsvector('german', keywords ) @@ to_tsquery('german', 'ee:* | dd:*')) and
(to_tsvector('german', keywords ) @@ to_tsquery('german', 'aa:*'));

Back-patch to 8.4 where the lossy pointer code was introduced.
2010-08-01 19:16:39 +00:00
Tom Lane
0454f13161 Rewrite the rbtree routines so that an RBNode is the first field of the
struct representing a tree entry, rather than being a separately allocated
piece of storage.  This API is at least as clean as the old one (if not
more so --- there were some bizarre choices in there) and it permits a
very substantial memory savings, on the order of 2X in ginbulk.c's usage.

Also, fix minor memory leaks in code called by ginEntryInsert, in
particular in ginInsertValue and entryFillRoot, as well as ginEntryInsert
itself.  These leaks resulted in the GIN index build context continuing
to bloat even after we'd filled it to maintenance_work_mem and started
to dump data out to the index.

In combination these fixes restore the GIN index build code to honoring
the maintenance_work_mem limit about as well as it did in 8.4.  Speed
seems on par with 8.4 too, maybe even a bit faster, for a non-pathological
case in which HEAD was formerly slower.

Back-patch to 9.0 so we don't have a performance regression from 8.4.
2010-08-01 02:12:42 +00:00
Tom Lane
2ab57e089b Rewrite the key-combination logic in GIN's keyGetItem() and scanGetItem()
routines to make them behave better in the presence of "lossy" index pointers.
The previous coding was outright incorrect for some cases, as recently
reported by Artur Dabrowski: scanGetItem would fail to return index entries in
cases where one index key had multiple exact pointers on the same page as
another key had a lossy pointer.  Also, keyGetItem was extremely inefficient
for cases where a single index key generates multiple "entry" streams, such as
an @@ operator with a multiple-clause tsquery.  The presence of a lossy page
pointer in any one stream defeated its ability to use the opclass
consistentFn, resulting in probing many heap pages that didn't really need to
be visited.  In Artur's example case, a query like
	WHERE tsvector @@ to_tsquery('a & b')
was about 50X slower than the theoretically equivalent
	WHERE tsvector @@ to_tsquery('a') AND tsvector @@ to_tsquery('b')
The way that I chose to fix this was to have GIN call the consistentFn
twice with both TRUE and FALSE values for the in-doubt entry stream,
returning a hit if either call produces TRUE, but not if they both return
FALSE.  The code handles this for the case of a single in-doubt entry stream,
but punts (falling back to the stupid behavior) if there's more than one lossy
reference to the same page.  The idea could be scaled up to deal with multiple
lossy references, but I think that would probably be wasted complexity.  At
least to judge by Artur's example, such cases don't occur often enough to be
worth trying to optimize.

Back-patch to 8.4.  8.3 did not have lossy GIN index pointers, so not
subject to these problems.
2010-07-31 00:30:54 +00:00
Robert Haas
8a4dc94ca0 Make details of the Numeric representation private to numeric.c.
Review by Tom Lane.
2010-07-30 04:30:23 +00:00
Tom Lane
f223bb7a41 Improved version of patch to protect pg_get_expr() against misuse:
look through join alias Vars to avoid breaking join queries, and
move the test to someplace where it will catch more possible ways
of calling a function.  We still ought to throw away the whole thing
in favor of a data-type-based solution, but that's not feasible in
the back branches.

This needs to be back-patched further than 9.0, but I don't have time
to do so today.  Committing now so that the fix gets into 9.0beta4.
2010-07-29 23:16:33 +00:00
Simon Riggs
5b8bd0529e Rename asyncCommitLSN to asyncXactLSN to reflect changed role in 9.0.
Transaction aborts now record their LSN to avoid corner case
behaviour in SR/HS, hence change of name of variables and functions.
As pointed out by Fujii Masao. Cosmetic changes only.
2010-07-29 22:27:27 +00:00
Tom Lane
aab353a60b Clean up some inconsistencies in the volatility marking of various I/O
related functions.  Per today's discussion, we will henceforth assume
that datatype I/O functions are either stable or immutable, never volatile.
(This implies in particular that domain CHECK constraint expressions shouldn't
be volatile, since domain_in executes them.)  In turn, functions that execute
the I/O functions of arbitrary datatypes should always be labeled stable.
This affects the labeling of array_to_string, which was unsafely marked
immutable, and record_in, record_out, record_recv, record_send,
domain_in, domain_recv, which were over-conservatively marked volatile.
The array I/O functions were already marked stable, which is correct
per this policy but would have been wrong if we maintained domain_in
as volatile.

Back-patch to 9.0, along with an earlier fix to correctly mark cash_in
and cash_out as stable not immutable (since they depend on lc_monetary).

No catversion bump --- the implications of this are not currently
severe enough to justify a forced initdb.
2010-07-29 20:09:25 +00:00
Simon Riggs
04e17bae50 Add explicit regression tests for ALTER TABLE lock levels.
Use this to catch a couple of lock level assignments that slipped
through manual testing, per Peter Eisentraut.
2010-07-29 11:06:34 +00:00
Simon Riggs
2dbbda02e7 Reduce lock levels of CREATE TRIGGER and some ALTER TABLE, CREATE RULE actions.
Avoid hard-coding lockmode used for many altering DDL commands, allowing easier
future changes of lock levels. Implementation of initial analysis on DDL
sub-commands, so that many lock levels are now at ShareUpdateExclusiveLock or
ShareRowExclusiveLock, allowing certain DDL not to block reads/writes.
First of number of planned changes in this area; additional docs required
when full project complete.
2010-07-28 05:22:24 +00:00
Tom Lane
133924e13e Fix potential failure when hashing the output of a subplan that produces
a pass-by-reference datatype with a nontrivial projection step.
We were using the same memory context for the projection operation as for
the temporary context used by the hashtable routines in execGrouping.c.
However, the hashtable routines feel free to reset their temp context at
any time, which'd lead to destroying input data that was still needed.
Report and diagnosis by Tao Ma.

Back-patch to 8.1, where the problem was introduced by the changes that
allowed us to work with "virtual" tuples instead of materializing intermediate
tuple values everywhere.  The earlier code looks quite similar, but it doesn't
suffer the problem because the data gets copied into another context as a
result of having to materialize ExecProject's output tuple.
2010-07-28 04:50:50 +00:00
Robert Haas
a3b012b560 CREATE TABLE IF NOT EXISTS.
Reviewed by Bernd Helmle.
2010-07-25 23:21:22 +00:00
Robert Haas
ce68df468a Add options to force quoting of all identifiers.
I've added a quote_all_identifiers GUC which affects the behavior
of the backend, and a --quote-all-identifiers argument to pg_dump
and pg_dumpall which sets the GUC and also affects the quoting done
internally by those applications.

Design by Tom Lane; review by Alex Hunsaker; in response to bug #5488
filed by Hartmut Goebel.
2010-07-22 01:22:35 +00:00
Robert Haas
b8c6c71d1c Centralize DML permissions-checking logic.
Remove bespoke code in DoCopy and RI_Initial_Check, which now instead
fabricate call ExecCheckRTPerms with a manufactured RangeTblEntry.
This is intended to make it feasible for an enhanced security provider
to actually make use of ExecutorCheckPerms_hook, but also has the
advantage that RI_Initial_Check can allow use of the fast-path when
column-level but not table-level permissions are present.

KaiGai Kohei.  Reviewed (in an earlier version) by Stephen Frost, and by me.
Some further changes to the comments by me.
2010-07-22 00:47:59 +00:00
Robert Haas
5ffaa9005c Add restart_after_crash GUC.
Normally, we automatically restart after a backend crash, but in some
cases when PostgreSQL is invoked by clusterware it may be desirable to
suppress this behavior, so we provide an option which does this.
Since no existing GUC group quite fits, create a new group called
"error handling options" for this and the previously undocumented GUC
exit_on_error, which is now documented.

Review by Fujii Masao.
2010-07-20 00:47:53 +00:00
Tom Lane
3ec694e17b Add a log_file_mode GUC that allows control of the file permissions set on
log files created by the syslogger process.

In passing, make unix_file_permissions display its value in octal, same
as log_file_mode now does.

Martin Pihlak
2010-07-16 22:25:51 +00:00
Tom Lane
7590ddb3eb Add support for dividing money by money (yielding a float8 result) and for
casting between money and numeric.

Andy Balholm, reviewed by Kevin Grittner
2010-07-16 02:15:56 +00:00
Tom Lane
1cc29fe7c6 Teach EXPLAIN to print PARAM_EXEC Params as the referenced expressions,
rather than just $N.  This brings the display of nestloop-inner-indexscan
plans back to where it's been, and incidentally improves the display of
SubPlan parameters as well.  In passing, simplify the EXPLAIN code by
having it deal primarily in the PlanState tree rather than separately
searching Plan and PlanState trees.  This is noticeably cleaner for
subplans, and about a wash elsewhere.

One small difference from previous behavior is that EXPLAIN will no longer
qualify local variable references in inner-indexscan plan nodes, since it
no longer sees such nodes as possibly referencing multiple tables.  Vars
referenced through PARAM_EXEC Params are still forcibly qualified, though,
so I don't think the display is any more confusing than before.  Adjust a
couple of examples in the documentation to match this behavior.
2010-07-13 20:57:19 +00:00
Tom Lane
53e757689c Make NestLoop plan nodes pass outer-relation variables into their inner
relation using the general PARAM_EXEC executor parameter mechanism, rather
than the ad-hoc kluge of passing the outer tuple down through ExecReScan.
The previous method was hard to understand and could never be extended to
handle parameters coming from multiple join levels.  This patch doesn't
change the set of possible plans nor have any significant performance effect,
but it's necessary infrastructure for future generalization of the concept
of an inner indexscan plan.

ExecReScan's second parameter is now unused, so it's removed.
2010-07-12 17:01:06 +00:00
Robert Haas
f4122a8d50 Add a hook in ExecCheckRTPerms().
This hook allows a loadable module to gain control when table permissions
are checked.  It is expected to be used by an eventual SE-PostgreSQL
implementation, but there are other possible applications as well.  A
sample contrib module can be found in the archives at:

http://archives.postgresql.org/pgsql-hackers/2010-05/msg01095.php

Robert Haas and Stephen Frost
2010-07-09 14:06:01 +00:00
Tom Lane
b40466c337 Stamp HEAD as 9.1devel.
(And there was much rejoicing.)
2010-07-09 04:10:58 +00:00
Marc G. Fournier
1084f31770 tag beta3 2010-07-09 02:43:12 +00:00
Bruce Momjian
239d769e7e pgindent run for 9.0, second run 2010-07-06 19:19:02 +00:00
Heikki Linnakangas
eb81b6509f The previous fix in CVS HEAD and 8.4 for handling the case where a cursor
being used in a PL/pgSQL FOR loop is closed was inadequate, as Tom Lane
pointed out. The bug affects FOR statement variants too, because you can
close an implicitly created cursor too by guessing the "<unnamed portal X>"
name created for it.

To fix that, "pin" the portal to prevent it from being dropped while it's
being used in a PL/pgSQL FOR loop. Backpatch all the way to 7.4 which is
the oldest supported version.
2010-07-05 09:27:18 +00:00
Tom Lane
e76c1a0f4d Replace max_standby_delay with two parameters, max_standby_archive_delay and
max_standby_streaming_delay, and revise the implementation to avoid assuming
that timestamps found in WAL records can meaningfully be compared to clock
time on the standby server.  Instead, the delay limits are compared to the
elapsed time since we last obtained a new WAL segment from archive or since
we were last "caught up" to WAL data arriving via streaming replication.
This avoids problems with clock skew between primary and standby, as well
as other corner cases that the original coding would misbehave in, such
as the primary server having significant idle time between transactions.
Per my complaint some time ago and considerable ensuing discussion.

Do some desultory editing on the hot standby documentation, too.
2010-07-03 20:43:58 +00:00
Robert Haas
b3b7d603fb Allow REASSIGNED OWNED to handle opclasses and opfamilies.
Backpatch to 8.3, which is as far back as we have opfamilies.
The opclass portion could probably be backpatched to 8.2, when
REASSIGN OWNED was added, but for now I have not done that.

Asko Tiidumaa, with minor adjustments by me.
2010-07-03 13:53:13 +00:00
Peter Eisentraut
a3401bea9c Use different function names for plpython3 handlers, to avoid clashes in
pg_pltemplate

This should have a catversion bump, but it's still being debated whether
it's worth it during beta.
2010-06-29 00:18:11 +00:00
Tom Lane
78e8f0025e Clean up some randomness associated with trace_recovery_messages: don't
put the variable declaration in the middle of a bunch of externs,
and do use extern where it should be used.
2010-06-17 17:44:40 +00:00
Tom Lane
07e8b6aabc Don't allow walsender to send WAL data until it's been safely fsync'd on the
master.  Otherwise a subsequent crash could cause the master to lose WAL that
has already been applied on the slave, resulting in the slave being out of
sync and soon corrupt.  Per recent discussion and an example from Robert Haas.

Fujii Masao
2010-06-17 16:41:25 +00:00
Itagaki Takahiro
147c665d01 Remove prototype of GetOldestWALSendPointer(), that is marked as NOT_USED. 2010-06-17 00:06:34 +00:00
Itagaki Takahiro
41f302b52a Add new GUC categories corresponding to sections in docs, and move
description for vacuum_defer_cleanup_age to the correct category.
Sections in postgresql.conf are also sorted in the same order with docs.

Per gripe by Fujii Masao, suggestion by Heikki Linnakangas, and patch by me.
2010-06-15 07:52:11 +00:00
Robert Haas
26b7abfa32 Fix ALTER LARGE OBJECT and GRANT ... ON LARGE OBJECT for large OIDs.
The previous coding failed for OIDs too large to be represented by
a signed integer.
2010-06-13 17:43:13 +00:00
Heikki Linnakangas
0a7cb85531 Make TriggerFile variable static. It's not used outside xlog.c.
Fujii Masao
2010-06-10 07:49:23 +00:00
Marc G. Fournier
dcd52a64bd tag 9.0beta2 2010-06-04 07:28:30 +00:00
Tom Lane
0cc59cc1f3 Add current WAL end (as seen by walsender, ie, GetWriteRecPtr() result)
and current server clock time to SR data messages.  These are not currently
used on the slave side but seem likely to be useful in future, and it'd be
better not to change the SR protocol after release.  Per discussion.
Also do some minor code review and cleanup on walsender.c, and improve the
protocol documentation.
2010-06-03 22:17:32 +00:00
Alvaro Herrera
667440162c Add comments about definitions that may affect PG_CONTROL_VERSION,
per recent unintended-initdb-forcing fiasco
2010-06-03 20:37:13 +00:00
Tom Lane
34e543763c Bump PG_CONTROL_VERSION to account for the incompatible change committed earlier. 2010-06-03 14:50:30 +00:00
Robert Haas
d561430b66 On clean shutdown during recovery, don't warn about possible corruption.
Fujii Masao.  Review by Heikki Linnakangas and myself.
2010-06-03 03:20:00 +00:00
Itagaki Takahiro
e54b0cba96 PGDLLEXPORT is __declspec (dllexport) only on MSVC,
but is __declspec (dllimport) on other compilers
because cygwin and mingw don't like dllexport.
2010-05-28 16:34:15 +00:00
Tom Lane
c82d931dd1 Fix the volatility marking of textanycat() and anytextcat(): they were marked
immutable, but that is wrong in general because the cast from the polymorphic
argument to text could be stable or even volatile.  Mark them volatile for
safety.  In the typical case where the cast isn't volatile, the planner will
deduce the correct expression volatility after inlining the function, so
performance is not lost.  The just-committed fix in CREATE INDEX also ensures
this won't break any indexing cases that ought to be allowed.

Per discussion, I'm not bumping catversion for this change, as it doesn't
seem critical enough to force an initdb on beta testers.
2010-05-27 16:20:11 +00:00
Itagaki Takahiro
77e50a61ff Mark PG_MODULE_MAGIC and PG_FUNCTION_INFO_V1 with PGDLLEXPORT
independently from BUILDING_DLL. It is always __declspec(dllexport).
2010-05-27 07:59:48 +00:00
Simon Riggs
f9dbac9476 HS Defer buffer pin deadlock check until deadlock_timeout has expired.
During Hot Standby we need to check for buffer pin deadlocks when the
Startup process begins to wait, in case it never wakes up again. We
previously made the deadlock check immediately on the basis it was
cheap, though clearer thinking and prima facie evidence shows that
was too simple. Refactor existing code to make it easy to add in
deferral of deadlock check until deadlock_timeout allowing a good
reduction in deadlock checks since far few buffer pins are held for
that duration. It's worth doing anyway, though major goal is to
prevent further reports of context switching with high numbers of
users on occasional tests.
2010-05-26 19:52:52 +00:00
Michael Meskes
29259531c7 Replace self written 'long long int' configure test by standard 'AC_TYPE_LONG_LONG_INT' macro call. 2010-05-25 17:28:20 +00:00
Michael Meskes
555a02f910 Added a configure test for "long long" datatypes. So far this is only used in ecpg and replaces the old test that was kind of hackish. 2010-05-25 14:32:55 +00:00
Robert Haas
ea9968c331 Rename PM_RECOVERY_CONSISTENT and PMSIGNAL_RECOVERY_CONSISTENT.
The new names PM_HOT_STANDBY and PMSIGNAL_BEGIN_HOT_STANDBY more accurately
reflect their actual function.
2010-05-15 20:01:32 +00:00
Tom Lane
c453569f0d Spell __NetBSD__ the same way everywhere. Per Giles Lean. 2010-05-15 14:44:13 +00:00
Bruce Momjian
5b79fdadda Use __bsdi__ consistently. 2010-05-15 10:14:20 +00:00
Tom Lane
382ff21203 Fix up lame idea of not using autoconf to determine if platform has scandir().
Should fix buildfarm failures.
2010-05-13 22:07:43 +00:00
Simon Riggs
8431e296ea Cleanup initialization of Hot Standby. Clarify working with reanalysis
of requirements and documentation on LogStandbySnapshot(). Fixes
two minor bugs reported by Tom Lane that would lead to an incorrect
snapshot after transaction wraparound. Also fix two other problems
discovered that would give incorrect snapshots in certain cases.
ProcArrayApplyRecoveryInfo() substantially rewritten. Some minor
refactoring of xact_redo_apply() and ExpireTreeKnownAssignedTransactionIds().
2010-05-13 11:15:38 +00:00
Robert Haas
dd6fcd35e3 Change typedef for rb_appendator to avoid conflict with C++ reserved words.
Fixes a complaint from src/tools/pginclude/cpluspluscheck reported by
Peter Eisentraut.
2010-05-11 18:14:01 +00:00
Marc G. Fournier
f9d9b2b34a tag for 9.0beta1 2010-04-30 03:16:58 +00:00
Tom Lane
f0488bd57c Rename the parameter recovery_connections to hot_standby, to reduce possible
confusion with streaming-replication settings.  Also, change its default
value to "off", because of concern about executing new and poorly-tested
code during ordinary non-replicating operation.  Per discussion.

In passing do some minor editing of related documentation.
2010-04-29 21:36:19 +00:00
Heikki Linnakangas
9b8a73326e Introduce wal_level GUC to explicitly control if information needed for
archival or hot standby should be WAL-logged, instead of deducing that from
other options like archive_mode. This replaces recovery_connections GUC in
the primary, where it now has no effect, but it's still used in the standby
to enable/disable hot standby.

Remove the WAL-logging of "unlogged operations", like creating an index
without WAL-logging and fsyncing it at the end. Instead, we keep a copy of
the wal_mode setting and the settings that affect how much shared memory a
hot standby server needs to track master transactions (max_connections,
max_prepared_xacts, max_locks_per_xact) in pg_control. Whenever the settings
change, at server restart, write a WAL record noting the new settings and
update pg_control. This allows us to notice the change in those settings in
the standby at the right moment, they used to be included in checkpoint
records, but that meant that a changed value was not reflected in the
standby until the first checkpoint after the change.

Bump PG_CONTROL_VERSION and XLOG_PAGE_MAGIC. Whack XLOG_PAGE_MAGIC back to
the sequence it used to follow, before hot standby and subsequent patches
changed it to 0x9003.
2010-04-28 16:10:43 +00:00
Bruce Momjian
75c5738177 Reorder pg_stat_activity columns to be more consistent, using layout
suggested by Tom Lane.

Catalog version bumped due to system view change.
2010-04-26 14:22:37 +00:00
Simon Riggs
90e04bab39 Patch revoked because of objections. 2010-04-24 16:20:32 +00:00
Robert Haas
33980a0640 Fix various instances of "the the".
Two of these were pointed out by Erik Rijkers; the rest I found.
2010-04-23 23:21:44 +00:00
Simon Riggs
473af39737 Add missing optimizer hooks for function cost and number of rows.
Closely follow design of other optimizer hooks: if hook exists
retrieve value from plugin; if still not set then get from cache.
2010-04-23 22:23:39 +00:00
Simon Riggs
491d1ea5b3 Previous patch revoked following objections. 2010-04-23 20:21:31 +00:00
Simon Riggs
6ca23b1a29 Make CheckRequiredParameterValues() depend upon correct combination
of parameters. Fix bug report by Robert Haas that error message and
hint was incorrect if wrong mode parameters specified on master.
Internal changes only. Proposals for parameter simplification on
master/primary still under way.
2010-04-23 19:57:19 +00:00
Simon Riggs
bc2b85d904 Fix oversight in collecting values for cleanup_info records.
vacuum_log_cleanup_info() now generates log records with a valid
latestRemovedXid set in all cases. Also be careful not to zero the
value when we do a round of vacuuming part-way through lazy_scan_heap().
Incidentally, this reduces frequency of conflicts in Hot Standby.
2010-04-21 17:20:56 +00:00
Tom Lane
ea46000a40 Arrange for client authentication to occur before we select a specific
database to connect to. This is necessary for the walsender code to work
properly (it was previously using an untenable assumption that template1 would
always be available to connect to).  This also gets rid of a small security
shortcoming that was introduced in the original patch to eliminate the flat
authentication files: before, you could find out whether or not the requested
database existed even if you couldn't pass the authentication checks.

The changes needed to support this are mainly just to treat pg_authid and
pg_auth_members as nailed relations, so that we can read them without having
to be able to locate real pg_class entries for them.  This mechanism was
already debugged for pg_database, but we hadn't recognized the value of
applying it to those catalogs too.

Since the current code doesn't have support for accessing toast tables before
we've brought up all of the relcache, remove pg_authid's toast table to ensure
that no one can store an out-of-line toasted value of rolpassword.  The case
seems quite unlikely to occur in practice, and was effectively unsupported
anyway in the old "flatfiles" implementation.

Update genbki.pl to actually implement the same rules as bootstrap.c does for
not-nullability of catalog columns.  The previous coding was a bit cheesy but
worked all right for the previous set of bootstrap catalogs.  It does not work
for pg_authid, where rolvaliduntil needs to be nullable.

Initdb forced due to minor catalog changes (mainly the toast table removal).
2010-04-20 23:48:47 +00:00
Robert Haas
481cb5d9b5 Rename standby_keep_segments to wal_keep_segments.
Also, make the name of the GUC and the name of the backing variable match.
Alnong the way, clean up a couple of slight typographical errors in the
related docs.
2010-04-20 11:15:06 +00:00
Simon Riggs
cfac702223 Add new message for explicit rejection by pg_hba.conf. Implicit
rejection retains same message as before.
2010-04-19 19:02:18 +00:00
Robert Haas
5b89ef384c Add an 'enable_material' GUC.
The logic for determining whether to materialize has been significantly
overhauled for 9.0.  In case there should be any doubt about whether
materialization is a win in any particular case, this should provide a
convenient way of seeing what happens without it; but even with enable_material
turned off, we still materialize in cases where it is required for
correctness.

Thanks to Tom Lane for the review.
2010-04-19 00:55:26 +00:00
Simon Riggs
2847de9df2 Remove some additional changes in previous commit that belong elsewhere. 2010-04-18 18:17:12 +00:00
Simon Riggs
21d6a6a128 Tune GetSnapshotData() during Hot Standby by avoiding loop
through normal backends. Makes code clearer also, since we
avoid various Assert()s. Performance of snapshots taken
during recovery no longer depends upon number of read-only
backends.
2010-04-18 18:06:07 +00:00
Heikki Linnakangas
361bd1662e Allow Hot Standby to begin from a shutdown checkpoint.
Patch by Simon Riggs & me
2010-04-13 14:17:46 +00:00
Heikki Linnakangas
30556568f5 Update the location of last removed WAL segment in shared memory only
after actually removing one, so that if we can't remove segments because
WAL archiving is lagging behind, we don't unnecessarily forbid streaming
the old not-yet-archived segments that are still perfectly valid. Per
suggestion from Fujii Masao.
2010-04-12 10:40:43 +00:00
Heikki Linnakangas
e57cd7f0a1 Change the logic to decide when to delete old WAL segments, so that it
doesn't take into account how far the WAL senders are. This way a hung
WAL sender doesn't prevent old WAL segments from being recycled/removed
in the primary, ultimately causing the disk to fill up. Instead add
standby_keep_segments setting to control how many old WAL segments are
kept in the primary. This also makes it more reliable to use streaming
replication without WAL archiving, assuming that you set
standby_keep_segments high enough.
2010-04-12 09:52:29 +00:00
Tom Lane
9029df17c4 Fix updateAclDependencies() to not assume that ACL role dependencies can only
be added during GRANT and can only be removed during REVOKE; and fix its
callers to not lie to it about the existing set of dependencies when
instantiating a formerly-default ACL.  The previous coding accidentally failed
to malfunction so long as default ACLs contain only references to the object's
owning role, because that role is ignored by updateAclDependencies.  However
this is obviously pretty fragile, as well as being an undocumented assumption.
The new coding is a few lines longer but IMO much clearer.
2010-04-05 01:09:53 +00:00
Magnus Hagander
4c10623306 Update a number of broken links in comments.
Josh Kupershmidt
2010-04-02 15:21:20 +00:00
Robert Haas
54943734f8 Refer to max_wal_senders in a more consistent fashion.
The error message now makes explicit reference to the GUC that must be changed
to fix the problem, using wording suggested by Tom Lane.  Along the way,
rename the GUC from MaxWalSenders to max_wal_senders for consistency and
grep-ability.
2010-04-01 00:43:29 +00:00
Tom Lane
d174a4adbb Fix "constraint_exclusion = partition" logic so that it will also attempt
constraint exclusion on an inheritance set that is the target of an UPDATE
or DELETE query.  Per gripe from Marc Cousin.  Back-patch to 8.4 where
the feature was introduced.
2010-03-30 21:58:11 +00:00
Tom Lane
b78f6264eb Rework join-removal logic as per recent discussion. In particular this
fixes things so that it works for cases where nested removals are possible.
The overhead of the optimization should be significantly less, as well.
2010-03-28 22:59:34 +00:00
Simon Riggs
a760893dbd Derive latestRemovedXid for btree deletes by reading heap pages. The
WAL record for btree delete contains a list of tids, even when backup
blocks are present. We follow the tids to their heap tuples, taking
care to follow LP_REDIRECT tuples. We ignore LP_DEAD tuples on the
understanding that they will always have xmin/xmax earlier than any
LP_NORMAL tuples referred to by killed index tuples. Iff all tuples
are LP_DEAD we return InvalidTransactionId. The heap relfilenode is
added to the WAL record, requiring API changes to pass down the heap
Relation. XLOG_PAGE_MAGIC updated.
2010-03-28 09:27:02 +00:00
Alvaro Herrera
be8cebc717 Prevent ALTER USER f RESET ALL from removing the settings that were put there
by a superuser -- "ALTER USER f RESET setting" already disallows removing such a
setting.

Apply the same treatment to ALTER DATABASE d RESET ALL when run by a database
owner that's not superuser.
2010-03-25 14:44:34 +00:00
Simon Riggs
bf6285b3a7 Further corrections of mismatching struct and btree SizeOf macros.
In this case, correction is to remove now unused fields from struct.
Since these were unused and full of garbage anyway, no version change.
2010-03-20 07:49:48 +00:00
Tom Lane
865b29540e Fix oversight in btpo.xact patch; it was in fact installing garbage
in the xact field on replay, due to not writing out all the data in
the wal log struct.
2010-03-19 20:51:30 +00:00
Simon Riggs
aa36bd2039 Update XLOG_PAGE_MAGIC to recognise WAL format changes. 2010-03-19 17:42:10 +00:00
Simon Riggs
3cdafe40e7 Adjust comment in .history file to match recovery target specified. Comment
present since 8.0 was never fully meaningful, since two recovery targets
cannot be specified. Refactor recovery target type to make this change
and associated code easier to understand. No change in function.

Bug report arising from internal support question.
2010-03-19 11:05:15 +00:00
Simon Riggs
5c73ae17d1 Reset btpo.xact following recovery of btree delete page. Add btpo_xact
field into WAL record and reset it from there, rather than using
FrozenTransactionId which can lead to some corner case bugs.

Problem report and suggested route to a fix from Heikki, details by me.
2010-03-19 10:41:22 +00:00
Tom Lane
93324355eb Pass incompletely-transformed aggregate argument lists as separate parameters
to transformAggregateCall, instead of abusing fields in Aggref to carry them
temporarily.  No change in functionality but hopefully the code is a bit
clearer now.  Per gripe from Gokulakannan Somasundaram.
2010-03-17 16:52:38 +00:00
Bruce Momjian
a6c1cea2b7 Add libpq warning message if the .pgpass-retrieved password fails.
Add ERRCODE_INVALID_PASSWORD sqlstate error code.
2010-03-13 14:55:57 +00:00
Tom Lane
4df5c6c719 Update comment for pg_constraint.conindid to mention that it's used for
exclusion constraints.  Not sure how we managed to update the comment for
it in catalogs.sgml but miss this one.
2010-03-11 03:36:22 +00:00
Tom Lane
8bf14182cf Export xml.c's libxml-error-handling support so that contrib/xml2 can use it
too, instead of duplicating the functionality (badly).

I renamed xml_init to pg_xml_init, because the former seemed just a bit too
generic to be safe as a global symbol.  I considered likewise renaming
xml_ereport to pg_xml_ereport, but felt that the reference to ereport probably
made it sufficiently PG-centric already.
2010-03-03 17:29:45 +00:00
Bruce Momjian
65e806cba1 pgindent run for 9.0 2010-02-26 02:01:40 +00:00
Tom Lane
11b5847058 Add an OR REPLACE option to CREATE LANGUAGE.
This operates in the same way as other CREATE OR REPLACE commands, ie,
it replaces everything but the ownership and ACL lists of an existing
entry, and requires the caller to have owner privileges for that entry.

While modifying an existing language has some use in development scenarios,
in typical usage all the "replaced" values come from pg_pltemplate so there
will be no actual change in the language definition.  The reason for adding
this is mainly to allow programs to ensure that a language exists without
triggering an error if it already does exist.

This commit just adds and documents the new option.  A followon patch
will use it to clean up some unpleasant cases in pg_dump and pg_regress.
2010-02-23 22:51:43 +00:00
Tom Lane
05d8a561ff Clean up handling of XactReadOnly and RecoveryInProgress checks.
Add some checks that seem logically necessary, in particular let's make
real sure that HS slave sessions cannot create temp tables.  (If they did
they would think that temp tables belonging to the master's session with
the same BackendId were theirs.  We *must* not allow myTempNamespace to
become set in a slave session.)

Change setval() and nextval() so that they are only allowed on temp sequences
in a read-only transaction.  This seems consistent with what we allow for
table modifications in read-only transactions.  Since an HS slave can't have a
temp sequence, this also provides a nicer cure for the setval PANIC reported
by Erik Rijkers.

Make the error messages more uniform, and have them mention the specific
command being complained of.  This seems worth the trifling amount of extra
code, since people are likely to see such messages a lot more than before.
2010-02-20 21:24:02 +00:00
Peter Eisentraut
2f6cf9192c Revert version stamping in wrong branch 2010-02-19 18:42:30 +00:00
Peter Eisentraut
a779afb40c Version stamp 9.0alpha4 2010-02-19 16:03:22 +00:00
Heikki Linnakangas
ad458cfe81 Don't use O_DIRECT when writing WAL files if archiving or streaming is
enabled. Bypassing the kernel cache is counter-productive in that case,
because the archiver/walsender process will read from the WAL file
soon after it's written, and if it's not cached the read will cause
a physical read, eating I/O bandwidth available on the WAL drive.

Also, walreceiver process does unaligned writes, so disable O_DIRECT
in walreceiver process for that reason too.
2010-02-19 10:51:04 +00:00
Tom Lane
50a90fac40 Stamp HEAD as 9.0devel, and update various places that were referring to 8.5
(hope I got 'em all).  Per discussion, this release will be 9.0 not 8.5.
2010-02-17 04:19:41 +00:00
Tom Lane
d1e027221d Replace the pg_listener-based LISTEN/NOTIFY mechanism with an in-memory queue.
In addition, add support for a "payload" string to be passed along with
each notify event.

This implementation should be significantly more efficient than the old one,
and is also more compatible with Hot Standby usage.  There is not yet any
facility for HS slaves to receive notifications generated on the master,
although such a thing is possible in future.

Joachim Wieland, reviewed by Jeff Davis; also hacked on by me.
2010-02-16 22:34:57 +00:00
Andrew Dunstan
fc5173ad51 Add query text to auto_explain output.
Still to be done: fix docs and fix regression failures under auto_explain.
2010-02-16 22:19:59 +00:00
Magnus Hagander
215cbc90f8 Add emulation of non-blocking sockets to the win32 socket/signal layer,
and use this in pq_getbyte_if_available.

It's only a limited implementation which swithes the whole emulation layer
no non-blocking mode, but that's enough as long as non-blocking is only
used during a short period of time, and only one socket is accessed during
this time.
2010-02-16 19:26:02 +00:00
Greg Stark
f8c183a1ac Speed up CREATE DATABASE by deferring the fsyncs until after copying
all the data and using posix_fadvise to nudge the OS into flushing it
earlier. This also hopefully makes CREATE DATABASE avoid spamming the
cache.

Tests show a big speedup on Linux at least on some filesystems.

Idea and patch from Andres Freund.
2010-02-15 00:50:57 +00:00
Robert Haas
e26c539e9f Wrap calls to SearchSysCache and related functions using macros.
The purpose of this change is to eliminate the need for every caller
of SearchSysCache, SearchSysCacheCopy, SearchSysCacheExists,
GetSysCacheOid, and SearchSysCacheList to know the maximum number
of allowable keys for a syscache entry (currently 4).  This will
make it far easier to increase the maximum number of keys in a
future release should we choose to do so, and it makes the code
shorter, too.

Design and review by Tom Lane.
2010-02-14 18:42:19 +00:00
Tom Lane
7507b193bc Don't expose the inline definition of MemoryContextSwitchTo when FRONTEND is
defined.  Its reference to CurrentMemoryContext causes link failures on some
platforms, evidently because the inline function gets compiled despite lack of
use.  Per buildfarm member warthog.
2010-02-13 20:46:52 +00:00
Simon Riggs
dd428c79a4 Fix relcache init file invalidation during Hot Standby for the case
where a database has a non-default tablespaceid. Pass thru MyDatabaseId
and MyDatabaseTableSpace to allow file path to be re-created in
standby and correct invalidation to take place in all cases.
Update and rework xact_commit_desc() debug messages.
Bug report from Tom by code inspection. Fix by me.
2010-02-13 16:15:48 +00:00
Tom Lane
e08ab7c312 Support inlining various small performance-critical functions on non-GCC
compilers, by applying a configure check to see if the compiler will accept
an unreferenced "static inline foo ..." function without warnings.  It is
believed that such warnings are the only reason not to declare inlined
functions in headers, if the compiler understands "inline" at all.

Kurt Harriman
2010-02-13 02:34:16 +00:00
Simon Riggs
b95a720a48 Re-enable max_standby_delay = -1 using deadlock detection on startup
process. If startup waits on a buffer pin we send a request to all
backends to cancel themselves if they are holding the buffer pin
required and they are also waiting on a lock. If not, startup waits
until max_standby_delay before cancelling any backend waiting for
the requested buffer pin.
2010-02-13 01:32:20 +00:00
Simon Riggs
fafa374f2d Introduce WAL records to log reuse of btree pages, allowing conflict
resolution during Hot Standby. Page reuse interlock requested by Tom.
Analysis and patch by me.
2010-02-13 00:59:58 +00:00
Tom Lane
ec4be2ee68 Extend the set of frame options supported for window functions.
This patch allows the frame to start from CURRENT ROW (in either RANGE or
ROWS mode), and it also adds support for ROWS n PRECEDING and ROWS n FOLLOWING
start and end points.  (RANGE value PRECEDING/FOLLOWING isn't there yet ---
the grammar works, but that's all.)

Hitoshi Harada, reviewed by Pavel Stehule
2010-02-12 17:33:21 +00:00
Teodor Sigaev
5209c084a6 Generic implementation of red-black binary tree. It's planned to use in
several places, but for now only GIN uses it during index creation.
Using self-balanced tree greatly speeds up index creation in corner cases
with preordered data.
2010-02-11 14:29:50 +00:00
Tom Lane
cbe9d6beb4 Fix up rickety handling of relation-truncation interlocks.
Move rd_targblock, rd_fsm_nblocks, and rd_vm_nblocks from relcache to the smgr
relation entries, so that they will get reset to InvalidBlockNumber whenever
an smgr-level flush happens.  Because we now send smgr invalidation messages
immediately (not at end of transaction) when a relation truncation occurs,
this ensures that other backends will reset their values before they next
access the relation.  We no longer need the unreliable assumption that a
VACUUM that's doing a truncation will hold its AccessExclusive lock until
commit --- in fact, we can intentionally release that lock as soon as we've
completed the truncation.  This patch therefore reverts (most of) Alvaro's
patch of 2009-11-10, as well as my marginal hacking on it yesterday.  We can
also get rid of assorted no-longer-needed relcache flushes, which are far more
expensive than an smgr flush because they kill a lot more state.

In passing this patch fixes smgr_redo's failure to perform visibility-map
truncation, and cleans up some rather dubious assumptions in freespace.c and
visibilitymap.c about when rd_fsm_nblocks and rd_vm_nblocks can be out of
date.
2010-02-09 21:43:30 +00:00
Tom Lane
d5768dce10 Create an official API function for C functions to use to check if they are
being called as aggregates, and to get the aggregate transition state memory
context if needed.  Use it instead of poking directly into AggState and
WindowAggState in places that shouldn't know so much.

We should have done this in 8.4, probably, but better late than never.

Revised version of a patch by Hitoshi Harada.
2010-02-08 20:39:52 +00:00
Bruce Momjian
dfc902854a Add C comments that HEAP_MOVED_* define usage is only for pre-9.0 binary
upgrades.
2010-02-08 14:10:21 +00:00
Tom Lane
9a75803b1a Remove CatalogCacheFlushRelation, and the reloidattr infrastructure that was
needed by nothing else.

The restructuring I just finished doing on cache management exposed to me how
silly this routine was.  Its function was to go into the catcache and blow
away all entries related to a given relation when there was a relcache flush
on that relation.  However, there is no point in removing a catcache entry
if the catalog row it represents is still valid --- and if it isn't valid,
there must have been a catcache entry flush on it, because that's triggered
directly by heap_update or heap_delete on the catalog row.  So this routine
accomplished nothing except to blow away valid cache entries that we'd very
likely be wanting in the near future to help reconstruct the relcache entry.
Dumb.

On top of which, it required a subtle and easy-to-get-wrong attribute in
syscache definitions, ie, the column containing the OID of the related
relation if any.  Removing that is a very useful maintenance simplification.
2010-02-08 05:53:55 +00:00
Tom Lane
0a469c8769 Remove old-style VACUUM FULL (which was known for a little while as
VACUUM FULL INPLACE), along with a boatload of subsidiary code and complexity.
Per discussion, the use case for this method of vacuuming is no longer large
enough to justify maintaining it; not to mention that we don't wish to invest
the work that would be needed to make it play nicely with Hot Standby.

Aside from the code directly related to old-style VACUUM FULL, this commit
removes support for certain WAL record types that could only be generated
within VACUUM FULL, redirect-pointer removal in heap_page_prune, and
nontransactional generation of cache invalidation sinval messages (the last
being the sticking point for Hot Standby).

We still have to retain all code that copes with finding HEAP_MOVED_OFF and
HEAP_MOVED_IN flag bits on existing tuples.  This can't be removed as long
as we want to support in-place update from pre-9.0 databases.
2010-02-08 04:33:55 +00:00
Tom Lane
1ddc2703a9 Work around deadlock problems with VACUUM FULL/CLUSTER on system catalogs,
as per my recent proposal.

First, teach IndexBuildHeapScan to not wait for INSERT_IN_PROGRESS or
DELETE_IN_PROGRESS tuples to commit unless the index build is checking
uniqueness/exclusion constraints.  If it isn't, there's no harm in just
indexing the in-doubt tuple.

Second, modify VACUUM FULL/CLUSTER to suppress reverifying
uniqueness/exclusion constraint properties while rebuilding indexes of
the target relation.  This is reasonable because these commands aren't
meant to deal with corrupted-data situations.  Constraint properties
will still be rechecked when an index is rebuilt by a REINDEX command.

This gets us out of the problem that new-style VACUUM FULL would often
wait for other transactions while holding exclusive lock on a system
catalog, leading to probable deadlock because those other transactions
need to look at the catalogs too.  Although the real ultimate cause of
the problem is a debatable choice to release locks early after modifying
system catalogs, changing that choice would require pretty serious
analysis and is not something to be undertaken lightly or on a tight
schedule.  The present patch fixes the problem in a fairly reasonable
way and should also improve the speed of VACUUM FULL/CLUSTER a little bit.
2010-02-07 22:40:33 +00:00
Tom Lane
b9b8831ad6 Create a "relation mapping" infrastructure to support changing the relfilenodes
of shared or nailed system catalogs.  This has two key benefits:

* The new CLUSTER-based VACUUM FULL can be applied safely to all catalogs.

* We no longer have to use an unsafe reindex-in-place approach for reindexing
  shared catalogs.

CLUSTER on nailed catalogs now works too, although I left it disabled on
shared catalogs because the resulting pg_index.indisclustered update would
only be visible in one database.

Since reindexing shared system catalogs is now fully transactional and
crash-safe, the former special cases in REINDEX behavior have been removed;
shared catalogs are treated the same as non-shared.

This commit does not do anything about the recently-discussed problem of
deadlocks between VACUUM FULL/CLUSTER on a system catalog and other
concurrent queries; will address that in a separate patch.  As a stopgap,
parallel_schedule has been tweaked to run vacuum.sql by itself, to avoid
such failures during the regression tests.
2010-02-07 20:48:13 +00:00
Tom Lane
9727c583fe Restructure CLUSTER/newstyle VACUUM FULL/ALTER TABLE support so that swapping
of old and new toast tables can be done either at the logical level (by
swapping the heaps' reltoastrelid links) or at the physical level (by swapping
the relfilenodes of the toast tables and their indexes).  This is necessary
infrastructure for upcoming changes to support CLUSTER/VAC FULL on shared
system catalogs, where we cannot change reltoastrelid.  The physical swap
saves a few catalog updates too.

We unfortunately have to keep the logical-level swap logic because in some
cases we will be adding or deleting a toast table, so there's no possibility
of a physical swap.  However, that only happens as a consequence of schema
changes in the table, which we do not need to support for system catalogs,
so such cases aren't an obstacle for that.

In passing, refactor the cluster support functions a little bit to eliminate
unnecessarily-duplicated code; and fix the problem that while CLUSTER had
been taught to rename the final toast table at need, ALTER TABLE had not.
2010-02-04 00:09:14 +00:00
Heikki Linnakangas
808969d0e7 Add a message type header to the CopyData messages sent from primary
to standby in streaming replication. While we only have one message type
at the moment, adding a message type header makes this easier to extend.
2010-02-03 09:47:19 +00:00
Tom Lane
70a2b05a59 Assorted cleanups in preparation for using a map file to support altering
the relfilenode of currently-not-relocatable system catalogs.

1. Get rid of inval.c's dependency on relfilenode, by not having it emit
smgr invalidations as a result of relcache flushes.  Instead, smgr sinval
messages are sent directly from smgr.c when an actual relation delete or
truncate is done.  This makes considerably more structural sense and allows
elimination of a large number of useless smgr inval messages that were
formerly sent even in cases where nothing was changing at the
physical-relation level.  Note that this reintroduces the concept of
nontransactional inval messages, but that's okay --- because the messages
are sent by smgr.c, they will be sent in Hot Standby slaves, just from a
lower logical level than before.

2. Move setNewRelfilenode out of catalog/index.c, where it never logically
belonged, into relcache.c; which is a somewhat debatable choice as well but
better than before.  (I considered catalog/storage.c, but that seemed too
low level.)  Rename to RelationSetNewRelfilenode.

3. Cosmetic cleanups of some other relfilenode manipulations.
2010-02-03 01:14:17 +00:00
Magnus Hagander
0a27347141 Make RADIUS authentication use pg_getaddrinfo_all() to get address of
the server.

Gets rid of a fairly ugly hack for Solaris, and also provides hostname
and IPV6 support.
2010-02-02 19:09:37 +00:00
Robert Haas
d8db6a6096 Fold FindConversion() into FindConversionByName() and remove ACL check.
All callers of FindConversionByName() already do suitable permissions
checking already apart from this function, but this is not just dead
code removal: the unnecessary permissions check can actually lead to
spurious failures - there's no reason why inability to execute the
underlying function should prohibit renaming the conversion, for example.
(The error messages in these cases were also rather poor:
FindConversion would return InvalidOid, eventually leading to a complaint
that the conversion "did not exist", which was not correct.)

KaiGai Kohei
2010-02-02 18:52:33 +00:00
Robert Haas
63f9282f6e Tighten integrity checks on ALTER TABLE ... ALTER COLUMN ... RENAME.
When a column is renamed, we recursively rename the same column in
all descendent tables.  But if one of those tables also inherits that
column from a table outside the inheritance hierarchy rooted at the
named table, we must throw an error.  The previous coding correctly
prohibited the rename when the parent had inherited the column from
elsewhere, but overlooked the case where the parent was OK but a child
table also inherited the same column from a second, unrelated parent.

For now, not backpatched due to lack of complaints from the field.

KaiGai Kohei, with further changes by me.
Reviewed by Bernd Helme and Tom Lane.
2010-02-01 19:28:56 +00:00
Robert Haas
42a8ab0a14 Augment EXPLAIN output with more details on Hash nodes.
We show the number of buckets, the number of batches (and also the original
number if it has changed), and the peak space used by the hash table.  Minor
executor changes to track peak space used.
2010-02-01 15:43:36 +00:00
Simon Riggs
296578feb4 Revoke augmentation of WAL records for btree delete, per discussion. 2010-02-01 13:40:28 +00:00
Itagaki Takahiro
9ea9918e37 Add string_agg aggregate functions. The one argument version concatenates
the input values into a string. The two argument version also does the same
thing, but inserts delimiters between elements.

Original patch by Pavel Stehule, reviewed by David E. Wheeler and me.
2010-02-01 03:14:45 +00:00
Simon Riggs
c85c941470 Detect early deadlock in Hot Standby when Startup is already waiting. First
stage of required deadlock detection to allow re-enabling max_standby_delay
setting of -1, which is now essential in the absence of improved relation-
specific conflict resoluton. Requested by Greg Stark et al.
2010-01-31 19:01:11 +00:00
Tom Lane
0c0203d0a9 Parenthesize this macro, just in case. 2010-01-31 17:35:46 +00:00
Simon Riggs
6d2bc0a6cf Augment WAL records for btree delete with GetOldestXmin() to reduce
false positives during Hot Standby conflict processing. Simple
patch to enhance conflict processing, following previous discussions.
Controlled by parameter minimize_standby_conflicts = on | off, with
default off allows measurement of performance impact to see whether
it should be set on all the time.
2010-01-29 18:39:05 +00:00
Simon Riggs
76be0c81cc Filter recovery conflicts based upon dboid from relfilenode of WAL
records for heap and btree. Minor change, mostly API changes to
pass through the required values. This is a simple change though
also provides the refactoring required for further enhancements
to conflict processing using the relOid. Changes only have effect
during Hot Standby.
2010-01-29 17:10:05 +00:00
Peter Eisentraut
e7b3349a8a Type table feature
This adds the CREATE TABLE name OF type command, per SQL standard.
2010-01-28 23:21:13 +00:00
Magnus Hagander
083e1b0f27 Add functions to reset the statistics counter for a single table/index or
a single function.
2010-01-28 14:25:41 +00:00
Magnus Hagander
21d3ae09da Define INADDR_NONE on Solaris when it's missing. Per a couple of buildfarm
members complaining.
2010-01-28 11:36:14 +00:00
Heikki Linnakangas
e0e8b96345 Change a few remaining calls of XLogArchivingActive() to use
XLogIsNeeded() instead, to determine if an otherwise non-logged operation
needs to be logged in WAL for standby servers.

Fujii Masao
2010-01-28 07:31:42 +00:00
Heikki Linnakangas
1bb2558046 Make standby server continuously retry restoring the next WAL segment with
restore_command, if the connection to the primary server is lost. This
ensures that the standby can recover automatically, if the connection is
lost for a long time and standby falls behind so much that the required
WAL segments have been archived and deleted in the master.

This also makes standby_mode useful without streaming replication; the
server will keep retrying restore_command every few seconds until the
trigger file is found. That's the same basic functionality pg_standby
offers, but without the bells and whistles.

To implement that, refactor the ReadRecord/FetchRecord functions. The
FetchRecord() function introduced in the original streaming replication
patch is removed, and all the retry logic is now in a new function called
XLogReadPage(). XLogReadPage() is now responsible for executing
restore_command, launching walreceiver, and waiting for new WAL to arrive
from primary, as required.

This also changes the life cycle of walreceiver. When launched, it now only
tries to connect to the master once, and exits if the connection fails, or
is lost during streaming for any reason. The startup process detects the
death, and re-launches walreceiver if necessary.
2010-01-27 15:27:51 +00:00
Magnus Hagander
b3daac5a9c Add support for RADIUS authentication. 2010-01-27 12:12:00 +00:00
Tom Lane
d879697cd2 Remove the default_do_language parameter, instead making DO use a hardwired
default of "plpgsql".  This is more reasonable than it was when the DO patch
was written, because we have since decided that plpgsql should be installed
by default.  Per discussion, having a parameter for this doesn't seem useful
enough to justify the risk of application breakage if the value is changed
unexpectedly.
2010-01-26 16:33:40 +00:00
Tom Lane
9507c8a1db Add get_bit/set_bit functions for bit strings, paralleling those for bytea,
and implement OVERLAY() for bit strings and bytea.

In passing also convert text OVERLAY() to a true built-in, instead of
relying on a SQL function.

Leonardo F, reviewed by Kevin Grittner
2010-01-25 20:55:32 +00:00
Simon Riggs
959ac58c04 In HS, Startup process sets SIGALRM when waiting for buffer pin. If
woken by alarm we send SIGUSR1 to all backends requesting that they
check to see if they are blocking Startup process. If so, they throw
ERROR/FATAL as for other conflict resolutions. Deadlock stop gap
removed. max_standby_delay = -1 option removed to prevent deadlock.
2010-01-23 16:37:12 +00:00
Robert Haas
d779199175 Fix several oversights in previous commit - attribute options patch.
I failed to 'cvs add' the new files and also neglected to bump catversion.
2010-01-22 16:42:31 +00:00
Robert Haas
76a47c0e74 Replace ALTER TABLE ... SET STATISTICS DISTINCT with a more general mechanism.
Attributes can now have options, just as relations and tablespaces do, and
the reloptions code is used to parse, validate, and store them.  For
simplicity and because these options are not performance critical, we store
them in a separate cache rather than the main relcache.

Thanks to Alex Hunsaker for the review.
2010-01-22 16:40:19 +00:00
Peter Eisentraut
adb7764030 PL/Python DO handler
Also cleaned up some redundancies between the primary error messages and the
error context in PL/Python.

Hannu Valtonen
2010-01-22 15:45:15 +00:00
Heikki Linnakangas
09b115f706 Write a WAL record whenever we perform an operation without WAL-logging
that would've been WAL-logged if archiving was enabled. If we encounter
such records in archive recovery anyway, we know that some data is
missing from the log. A WARNING is emitted in that case.

Original patch by Fujii Masao, with changes by me.
2010-01-20 19:43:40 +00:00
Heikki Linnakangas
47ce95a7b9 Now that much of walreceiver has been pulled back into the postgres
binary, revert PGDLLIMPORT decoration of global variables. I'm not sure
if there's any real harm from unnecessary PGDLLIMPORTs, but these are all
internal variables that external modules really shouldn't be messing
with. ThisTimeLineID still needs PGDLLIMPORT.
2010-01-20 18:54:27 +00:00
Heikki Linnakangas
32bc08b1d4 Rethink the way walreceiver is linked into the backend. Instead than shoving
walreceiver as whole into a dynamically loaded module, split the
libpq-specific parts of it into dynamically loaded module and keep the rest
in the main backend binary.

Although Tom fixed the Windows compilation problems with the old walreceiver
module already, this is a cleaner division of labour and makes the code
more readable. There's also the prospect of adding new transport methods
as pluggable modules in the future, which this patch makes easier, though for
now the API between libpqwalreceiver and walreceiver process should be
considered private.

The libpq-specific module is now in src/backend/replication/libpqwalreceiver,
and the part linked with postgres binary is in
src/backend/replication/walreceiver.c.
2010-01-20 09:16:24 +00:00
Magnus Hagander
7e40cdc075 Add pg_stat_reset_shared('bgwriter') to reset the cluster-wide shared
statistics of the bgwriter.

Greg Smith
2010-01-19 14:11:32 +00:00
Tom Lane
4f15699d70 Add pg_table_size() and pg_indexes_size() to provide more user-friendly
wrappers around the pg_relation_size() function.

Bernd Helmle, reviewed by Greg Smith
2010-01-19 05:50:18 +00:00
Tom Lane
9a915e596f Improve the handling of SET CONSTRAINTS commands by having them search
pg_constraint before searching pg_trigger.  This allows saner handling of
corner cases; in particular we now say "constraint is not deferrable"
rather than "constraint does not exist" when the command is applied to
a constraint that's inherently non-deferrable.  Per a gripe several months
ago from hubert depesz lubaczewski.

To make this work without breaking user-defined constraint triggers,
we have to add entries for them to pg_constraint.  However, in return
we can remove the pgconstrname column from pg_constraint, which represents
a fairly sizable space savings.  I also replaced the tgisconstraint column
with tgisinternal; the old meaning of tgisconstraint can now be had by
testing for nonzero tgconstraint, while there is no other way to get
the old meaning of nonzero tgconstraint, namely that the trigger was
internally generated rather than being user-created.

In passing, fix an old misstatement in the docs and comments, namely that
pg_trigger.tgdeferrable is exactly redundant with pg_constraint.condeferrable.
Actually, we mark RI action triggers as nondeferrable even when they belong to
a nominally deferrable FK constraint.  The SET CONSTRAINTS code now relies on
that instead of hard-coding a list of exception OIDs.
2010-01-17 22:56:23 +00:00
Simon Riggs
a8ce974cdd Teach standby conflict resolution to use SIGUSR1
Conflict reason is passed through directly to the backend, so we can
take decisions about the effect of the conflict based upon the local
state. No specific changes, as yet, though this prepares for later work.
CancelVirtualTransaction() sends signals while holding ProcArrayLock.
Introduce errdetail_abort() to give message detail explaining that the
abort was caused by conflict processing. Remove CONFLICT_MODE states
in favour of using PROCSIG_RECOVERY_CONFLICT states directly, for clarity.
2010-01-16 10:05:59 +00:00
Tom Lane
c9dc53be77 Huh, apparently on cygwin we HAVE_SIGPROCMASK, so both variants of
the BlockSig/UnBlockSig declaration have to be PGDLLIMPORT'ified.
Per buildfarm results.
2010-01-16 05:52:29 +00:00
Tom Lane
47a09eda89 PGDLLIMPORT-ize the remaining variables needed by walreceiver. 2010-01-16 00:04:41 +00:00
Tom Lane
08f8d478eb Do parse analysis of an EXPLAIN's contained statement during the normal
parse analysis phase, rather than at execution time.  This makes parameter
handling work the same as it does in ordinary plannable queries, and in
particular fixes the incompatibility that Pavel pointed out with plpgsql's
new handling of variable references.  plancache.c gets a little bit
grottier, but the alternatives seem worse.
2010-01-15 22:36:35 +00:00
Heikki Linnakangas
40f908bdcd Introduce Streaming Replication.
This includes two new kinds of postmaster processes, walsenders and
walreceiver. Walreceiver is responsible for connecting to the primary server
and streaming WAL to disk, while walsender runs in the primary server and
streams WAL from disk to the client.

Documentation still needs work, but the basics are there. We will probably
pull the replication section to a new chapter later on, as well as the
sections describing file-based replication. But let's do that as a separate
patch, so that it's easier to see what has been added/changed. This patch
also adds a new section to the chapter about FE/BE protocol, documenting the
protocol used by walsender/walreceivxer.

Bump catalog version because of two new functions,
pg_last_xlog_receive_location() and pg_last_xlog_replay_location(), for
monitoring the progress of replication.

Fujii Masao, with additional hacking by me
2010-01-15 09:19:10 +00:00
Teodor Sigaev
4cbe473938 Add point_ops opclass for GiST. 2010-01-14 16:31:09 +00:00
Simon Riggs
e99767bc28 First part of refactoring of code for ResolveRecoveryConflict. Purposes
of this are to centralise the conflict code to allow further change,
as well as to allow passing through the full reason for the conflict
through to the conflicting backends. Backend state alters how we
can handle different types of conflict so this is now required.
As originally suggested by Heikki, no longer optional.
2010-01-14 11:08:02 +00:00
Bruce Momjian
228170410d Please tablespace directories in their own subdirectory so pg_migrator
can upgrade clusters without renaming the tablespace directories.  New
directory structure format is, e.g.:

	$PGDATA/pg_tblspc/20981/PG_8.5_201001061/719849/83292814
2010-01-12 02:42:52 +00:00
Tom Lane
6164b4bc19 Some trivial adjustments in comments for struct RelationData. 2010-01-10 22:19:17 +00:00
Simon Riggs
3bfcccc295 During Hot Standby, fix drop database when sessions idle.
Previously we only cancelled sessions that were in-transaction.

Simple fix is to just cancel all sessions without waiting. Doing
it this way avoids complicating common code paths, which would
not be worth the trouble to cover this rare case.

Problem report and fix by Andres Freund, edited somewhat by me
2010-01-10 15:44:28 +00:00
Magnus Hagander
87091cb1f1 Create typedef pgsocket for storing socket descriptors.
This silences some warnings on Win64. Not using the proper SOCKET datatype
was actually wrong on Win32 as well, but didn't cause any warnings there.

Also create define PGINVALID_SOCKET to indicate an invalid/non-existing
socket, instead of using a hardcoded -1 value.
2010-01-10 14:16:08 +00:00
Robert Haas
84b6d5f359 Remove partial, broken support for NULL pointers when fetching attributes.
Previously, fastgetattr() and heap_getattr() tested their fourth argument
against a null pointer, but any attempt to use them with a literal-NULL
fourth argument evaluated to *(void *)0, resulting in a compiler error.
Remove these NULL tests to avoid leading future readers of this code to
believe that this has a chance of working.  Also clean up related legacy
code in nocachegetattr(), heap_getsysattr(), and nocache_index_getattr().

The new coding standard is that any code which calls a getattr-type
function or macro which takes an isnull argument MUST pass a valid
boolean pointer.  Per discussion with Bruce Momjian, Tom Lane, Alvaro
Herrera.
2010-01-10 04:26:36 +00:00
Simon Riggs
42edbd16fb During Hot Standby, set DatabasePath correctly during relcache init file
deletion, so that we attempt to unlink the correct filepath. unlink()
errors are ignorable there, so lack of a DatabasePath initialization step
did not cause visible problems until a related bug showed up on Solaris.

Code refactored from xact_redo_commit() to
ProcessCommittedInvalidationMessages() in inval.c. Recovery may replay
shared invalidation messages for many databases, so we cannot
SetDatabasePath() once as we do in normal backends. Read the databaseid
from the shared invalidation messages, then set DatabasePath
temporarily before calling RelationCacheInitFileInvalidate().

Problem report by Robert Treat, analysis and fix by me.
2010-01-09 16:49:27 +00:00
Itagaki Takahiro
83a5a338bf pgBufferUsage needs PGDLLIMPORT for pg_stat_statements on Windows. 2010-01-08 00:48:56 +00:00
Tom Lane
50626efe0a Fix 3-parameter form of bit substring() to throw error for negative length,
as required by SQL standard.
2010-01-07 20:17:44 +00:00
Tom Lane
901be0fad4 Remove all the special-case code for INT64_IS_BUSTED, per decision that
we're not going to support that anymore.

I did keep the 64-bit-CRC-with-32-bit-arithmetic code, since it has a
performance excuse to live.  It's a bit moot since that's all ifdef'd
out, of course.
2010-01-07 04:53:35 +00:00
Robert Haas
814c8a03ba Further fixes for per-tablespace options patch.
Add missing varlena header to TableSpaceOpts structure.  And, per
Tom Lane, instead of calling tablespace_reloptions in CacheMemoryContext,
call it in the caller's memory context and copy the value over
afterwards, to reduce the chances of a session-lifetime memory leak.
2010-01-07 03:53:08 +00:00
Tom Lane
d15cb38dec Alter the configure script to fail immediately if the C compiler does not
provide a working 64-bit integer datatype.  As recently noted, we've been
broken on such platforms since early in the 8.4 development cycle.  Since
it took nearly two years for anyone to even notice, it seems that the
rationale for continuing to support such platforms has reached the point
of non-existence.  Rather than thrashing around to try to make it work
again, we'll just admit up front that this no longer works.

Back-patch to 8.4 since that branch is also broken.

We should go around to remove INT64_IS_BUSTED support, but just in HEAD,
so that seems like material for a separate commit.
2010-01-07 00:25:05 +00:00
Itagaki Takahiro
946cf229e8 Support rewritten-based full vacuum as VACUUM FULL. Traditional
VACUUM FULL was renamed to VACUUM FULL INPLACE. Also added a new
option -i, --inplace for vacuumdb to perform FULL INPLACE vacuuming.

Since the new VACUUM FULL uses CLUSTER infrastructure, we cannot
use it for system tables. VACUUM FULL for system tables always
fall back into VACUUM FULL INPLACE silently.

Itagaki Takahiro, reviewed by Jeff Davis and Simon Riggs.
2010-01-06 05:31:14 +00:00
Bruce Momjian
28f6cab61a binary upgrade:
Preserve relfilenodes for views and composite types --- even though we
don't store data in, them, they do consume relfilenodes.

Bump catalog version.
2010-01-06 05:18:18 +00:00
Bruce Momjian
2391396b3d Update catalog version for recent relfilenode patch, so pg_migrator can
identify the new API.
2010-01-06 03:07:24 +00:00
Bruce Momjian
f98fbc78c3 Preserve relfilenodes:
Add support to pg_dump --binary-upgrade to preserve all relfilenodes,
for use by pg_migrator.
2010-01-06 03:04:03 +00:00
Bruce Momjian
8cdb85b512 Remove tabs in SGML.
Move OIDCHARS to proper include file.
2010-01-06 02:41:37 +00:00
Tom Lane
90f4c2d960 Add support for doing FULL JOIN ON FALSE. While this is really a rather
peculiar variant of UNION ALL, and so wouldn't likely get written directly
as-is, it's possible for it to arise as a result of simplification of
less-obviously-silly queries.  In particular, now that we can do flattening
of subqueries that have constant outputs and are underneath an outer join,
it's possible for the case to result from simplification of queries of the
type exhibited in bug #5263.  Back-patch to 8.4 to avoid a functionality
regression for this type of query.
2010-01-05 23:25:36 +00:00
Robert Haas
d86d51a958 Support ALTER TABLESPACE name SET/RESET ( tablespace_options ).
This patch only supports seq_page_cost and random_page_cost as parameters,
but it provides the infrastructure to scalably support many more.
In particular, we may want to add support for effective_io_concurrency,
but I'm leaving that as future work for now.

Thanks to Tom Lane for design help and Alvaro Herrera for the review.
2010-01-05 21:54:00 +00:00
Magnus Hagander
ce92f8b463 Use _mm_pause() for win64 spin_delay(), per note from Tsutomu Yamada. 2010-01-05 11:06:28 +00:00
Tom Lane
64737e9313 Get rid of the need for manual maintenance of the initial contents of
pg_attribute, by having genbki.pl derive the information from the various
catalog header files.  This greatly simplifies modification of the
"bootstrapped" catalogs.

This patch finally kills genbki.sh and Gen_fmgrtab.sh; we now rely entirely on
Perl scripts for those build steps.  To avoid creating a Perl build dependency
where there was not one before, the output files generated by these scripts
are now treated as distprep targets, ie, they will be built and shipped in
tarballs.  But you will need a reasonably modern Perl (probably at least
5.6) if you want to build from a CVS pull.

The changes to the MSVC build process are untested, and may well break ---
we'll soon find out from the buildfarm.

John Naylor, based on ideas from Robert Haas and others
2010-01-05 01:06:57 +00:00
Magnus Hagander
305e85b098 Add a Win64-specific spin_delay() function.
We can't use the same as before, since MSVC on Win64 doesn't
support inline assembly.
2010-01-04 17:10:24 +00:00
Heikki Linnakangas
06f82b2961 Write an end-of-backup WAL record at pg_stop_backup(), and wait for it at
recovery instead of reading the backup history file. This is more robust,
as it stops you from prematurely starting up an inconsisten cluster if the
backup history file is lost for some reason, or if the base backup was
never finished with pg_stop_backup().

This also paves the way for a simpler streaming replication patch, which
doesn't need to care about backup history files anymore.

The backup history file is still created and archived as before, but it's
not used by the system anymore. It's just for informational purposes now.

Bump PG_CONTROL_VERSION as the location of the backup startpoint is now
written to a new field in pg_control, and catversion because initdb is
required

Original patch by Fujii Masao per Simon's idea, with further fixes by me.
2010-01-04 12:50:50 +00:00
Tom Lane
40608e7f94 When estimating the selectivity of an inequality "column > constant" or
"column < constant", and the comparison value is in the first or last
histogram bin or outside the histogram entirely, try to fetch the actual
column min or max value using an index scan (if there is an index on the
column).  If successful, replace the lower or upper histogram bound with
that value before carrying on with the estimate.  This limits the
estimation error caused by moving min/max values when the comparison
value is close to the min or max.  Per a complaint from Josh Berkus.

It is tempting to consider using this mechanism for mergejoinscansel as well,
but that would inject index fetches into main-line join estimation not just
endpoint cases.  I'm refraining from that until we can get a better handle
on the costs of doing this type of lookup.
2010-01-04 02:44:40 +00:00
Magnus Hagander
c3705d8cc5 Make ssize_t 64-bit on Win64, for compatibility with for example plpython. 2010-01-02 22:47:37 +00:00
Bruce Momjian
0239800893 Update copyright for the year 2010. 2010-01-02 16:58:17 +00:00
Magnus Hagander
8491998d3d Set proper sizes for size_t and void* on 64-bit Windows builds.
Tsutomu Yamada
2010-01-02 13:56:37 +00:00
Magnus Hagander
2de9a463ff Support 64-bit shared memory when building on 64-bit Windows.
Tsutomu Yamada
2010-01-02 12:18:45 +00:00
Tom Lane
7839d35991 Add an "argisrow" field to NullTest nodes, following a plan made way back in
8.2beta but never carried out.  This avoids repetitive tests of whether the
argument is of scalar or composite type.  Also, be a bit more paranoid about
composite arguments in some places where we previously weren't checking.
2010-01-01 23:03:10 +00:00
Tom Lane
29c4ad9829 Support "x IS NOT NULL" clauses as indexscan conditions. This turns out
to be just a minor extension of the previous patch that made "x IS NULL"
indexable, because we can treat the IS NOT NULL condition as if it were
"x < NULL" or "x > NULL" (depending on the index's NULLS FIRST/LAST option),
just like IS NULL is treated like "x = NULL".  Aside from any possible
usefulness in its own right, this is an important improvement for
index-optimized MAX/MIN aggregates: it is now reliably possible to get
a column's min or max value cheaply, even when there are a lot of nulls
cluttering the interesting end of the index.
2010-01-01 21:53:49 +00:00
Tom Lane
85d02a6586 Redefine Datum as uintptr_t, instead of unsigned long.
This is more in keeping with modern practice, and is a first step towards
porting to Win64 (which has sizeof(pointer) > sizeof(long)).

Tsutomu Yamada, Magnus Hagander, Tom Lane
2009-12-31 19:41:37 +00:00
Tom Lane
48c192c15e Revise pgstat's tracking of tuple changes to improve the reliability of
decisions about when to auto-analyze.

The previous code depended on n_live_tuples + n_dead_tuples - last_anl_tuples,
where all three of these numbers could be bad estimates from ANALYZE itself.
Even worse, in the presence of a steady flow of HOT updates and matching
HOT-tuple reclamations, auto-analyze might never trigger at all, even if all
three numbers are exactly right, because n_dead_tuples could hold steady.

To fix, replace last_anl_tuples with an accurately tracked count of the total
number of committed tuple inserts + updates + deletes since the last ANALYZE
on the table.  This can still be compared to the same threshold as before, but
it's much more trustworthy than the old computation.  Tracking this requires
one more intra-transaction counter per modified table within backends, but no
additional memory space in the stats collector.  There probably isn't any
measurable speed difference; if anything it might be a bit faster than before,
since I was able to eliminate some per-tuple arithmetic operations in favor of
adding sums once per (sub)transaction.

Also, simplify the logic around pgstat vacuum and analyze reporting messages
by not trying to fold VACUUM ANALYZE into a single pgstat message.

The original thought behind this patch was to allow scheduling of analyzes
on parent tables by artificially inflating their changes_since_analyze count.
I've left that for a separate patch since this change seems to stand on its
own merit.
2009-12-30 20:32:14 +00:00
Tom Lane
540e69a061 Add an index on pg_inherits.inhparent, and use it to avoid seqscans in
find_inheritance_children().  This is a complete no-op in databases without
any inheritance.  In databases where there are just a few entries in
pg_inherits, it could conceivably be a small loss.  However, in databases with
many inheritance parents, it can be a big win.
2009-12-29 22:00:14 +00:00
Tom Lane
649b5ec7c8 Add the ability to store inheritance-tree statistics in pg_statistic,
and teach ANALYZE to compute such stats for tables that have subclasses.
Per my proposal of yesterday.

autovacuum still needs to be taught about running ANALYZE on parent tables
when their subclasses change, but the feature is useful even without that.
2009-12-29 20:11:45 +00:00
Bruce Momjian
e5b457c2ac Add backend and pg_dump code to allow preservation of pg_enum oids, for
use in binary upgrades.

Bump catalog version for detection by pg_migrator of new backend API.
2009-12-27 14:50:46 +00:00
Bruce Momjian
c44327afa4 Binary upgrade:
Modify pg_dump --binary-upgrade and add backend support routines to
support the preservation of pg_type oids when doing a binary upgrade.
This allows user-defined composite types and arrays to be binary
upgraded.
2009-12-24 22:09:24 +00:00
Tom Lane
d68e08d1fe Allow the index name to be omitted in CREATE INDEX, causing the system to
choose an index name the same as it would do for an unnamed index constraint.
(My recent changes to the index naming logic have helped to ensure that this
will be a reasonable choice.)  Per a suggestion from Peter.

A necessary side-effect is to promote CONCURRENTLY to type_func_name_keyword
status, ie, it can't be a table/column/index name anymore unless quoted.
This is not all bad, since we have heard more than once of people typing
CREATE INDEX CONCURRENTLY ON foo (...) and getting a normal index build of
an index named "concurrently", which was not what they wanted.  Now this
syntax will result in a concurrent build of an index with system-chosen
name; which they can rename afterwards if they want something else.
2009-12-23 17:41:45 +00:00
Tom Lane
cfc5008a51 Adjust naming of indexes and their columns per recent discussion.
Index expression columns are now named after the FigureColname result for
their expressions, rather than always being "pg_expression_N".  Digits are
appended to this name if needed to make the column name unique within the
index.  (That happens for regular columns too, thus fixing the old problem
that CREATE INDEX fooi ON foo (f1, f1) fails.  Before exclusion indexes
there was no real reason to do such a thing, but now maybe there is.)

Default names for indexes and associated constraints now include the column
names of all their columns, not only the first one as in previous practice.
(Of course, this will be truncated as needed to fit in NAMEDATALEN.  Also,
pkey indexes retain the historical behavior of not naming specific columns
at all.)

An example of the results:

regression=# create table foo (f1 int, f2 text,
regression(# exclude (f1 with =, lower(f2) with =));
NOTICE:  CREATE TABLE / EXCLUDE will create implicit index "foo_f1_lower_exclusion" for table "foo"
CREATE TABLE
regression=# \d foo_f1_lower_exclusion
Index "public.foo_f1_lower_exclusion"
 Column |  Type   | Definition
--------+---------+------------
 f1     | integer | f1
 lower  | text    | lower(f2)
btree, for table "public.foo"
2009-12-23 02:35:25 +00:00
Tom Lane
4fca795de4 Bump catversion to reflect the fact that HS patch changed pg_proc
contents, and PG_CONTROL_VERSION to reflect the fact that it changed
pg_control contents.  (I see we did at least remember to change
XLOG_PAGE_MAGIC for the WAL contents changes.)
2009-12-19 04:08:32 +00:00
Simon Riggs
efc16ea520 Allow read only connections during recovery, known as Hot Standby.
Enabled by recovery_connections = on (default) and forcing archive recovery using a recovery.conf. Recovery processing now emulates the original transactions as they are replayed, providing full locking and MVCC behaviour for read only queries. Recovery must enter consistent state before connections are allowed, so there is a delay, typically short, before connections succeed. Replay of recovering transactions can conflict and in some cases deadlock with queries during recovery; these result in query cancellation after max_standby_delay seconds have expired. Infrastructure changes have minor effects on normal running, though introduce four new types of WAL record.

New test mode "make standbycheck" allows regression tests of static command behaviour on a standby server while in recovery. Typical and extreme dynamic behaviours have been checked via code inspection and manual testing. Few port specific behaviours have been utilised, though primary testing has been on Linux only so far.

This commit is the basic patch. Additional changes will follow in this release to enhance some aspects of behaviour, notably improved handling of conflicts, deadlock detection and query cancellation. Changes to VACUUM FULL are also required.

Simon Riggs, with significant and lengthy review by Heikki Linnakangas, including streamlined redesign of snapshot creation and two-phase commit.

Important contributions from Florian Pflug, Mark Kirkwood, Merlin Moncure, Greg Stark, Gianni Ciolli, Gabriele Bartolini, Hannu Krosing, Robert Haas, Tatsuo Ishii, Hiroyuki Yamada plus support and feedback from many other community members.
2009-12-19 01:32:45 +00:00
Peter Eisentraut
d6de43099a Don't unblock SIGQUIT in the SIGQUIT handler
This was possibly linked to a deadlock-like situation in glibc syslog code
invoked by the ereport call in quickdie().  In any case, a signal handler
should not unblock its own signal unless there is a specific reason to.
2009-12-16 23:05:00 +00:00
Peter Eisentraut
b63b967a7e If there is no sigdelset(), define it as a macro.
This removes some duplicate code that recreated the identical workaround
when the newer signal API is missing.
2009-12-16 22:55:34 +00:00
Peter Eisentraut
dd4cd55c15 Python 3 support in PL/Python
Behaves more or less unchanged compared to Python 2, but the new language
variant is called plpython3u.  Documentation describing the naming scheme
is included.
2009-12-15 22:59:55 +00:00
Tom Lane
a5495cd841 Add a hook to let loadable modules get control at ProcessUtility execution,
and use it to extend contrib/pg_stat_statements to track utility commands.

Itagaki Takahiro, reviewed by Euler Taveira de Oliveira.
2009-12-15 20:04:49 +00:00
Tom Lane
34d26872ed Support ORDER BY within aggregate function calls, at long last providing a
non-kluge method for controlling the order in which values are fed to an
aggregate function.  At the same time eliminate the old implementation
restriction that DISTINCT was only supported for single-argument aggregates.

Possibly release-notable behavioral change: formerly, agg(DISTINCT x)
dropped null values of x unconditionally.  Now, it does so only if the
agg transition function is strict; otherwise nulls are treated as DISTINCT
normally would, ie, you get one copy.

Andrew Gierth, reviewed by Hitoshi Harada
2009-12-15 17:57:48 +00:00
Robert Haas
cddca5ec13 Add an EXPLAIN (BUFFERS) option to show buffer-usage statistics.
This patch also removes buffer-usage statistics from the track_counts
output, since this (or the global server statistics) is deemed to be a better
interface to this information.

Itagaki Takahiro, reviewed by Euler Taveira de Oliveira.
2009-12-15 04:57:48 +00:00
Tom Lane
a620d5005d Fix a bug introduced when set-returning SQL functions were made inline-able:
we have to cope with the possibility that the declared result rowtype contains
dropped columns.  This fails in 8.4, as per bug #5240.

While at it, be more paranoid about inserting binary coercions when inlining.
The pre-8.4 code did not really need to worry about that because it could not
inline at all in any case where an added coercion could change the behavior
of the function's statement.  However, when inlining a SRF we allow sorting,
grouping, and set-ops such as UNION.  In these cases, modifying one of the
targetlist entries that the sort/group/setop depends on could conceivably
change the behavior of the function's statement --- so don't inline when
such a case applies.
2009-12-14 02:15:54 +00:00
Magnus Hagander
0182d6f646 Allow LDAP authentication to operate in search+bind mode, meaning it
does a search for the user in the directory first, and then binds with
the DN found for this user.

This allows for LDAP logins in scenarios where the DN of the user cannot
be determined simply by prefix and suffix, such as the case where different
users are located in different containers.

The old way of authentication can be significantly faster, so it's kept
as an option.

Robert Fleming and Magnus Hagander
2009-12-12 21:35:21 +00:00
Robert Haas
02490d4692 Export ExplainBeginOutput() and ExplainEndOutput() for auto_explain.
Without these functions, anyone outside of explain.c can't actually use
ExplainPrintPlan, because the ExplainState won't be initialized properly.
The user-visible result of this was a crash when using auto_explain with
the JSON output format.

Report by Euler Taveira de Oliveira.  Analysis by Tom Lane.  Patch by me.
2009-12-12 00:35:34 +00:00
Itagaki Takahiro
f1325ce213 Add large object access control.
A new system catalog pg_largeobject_metadata manages
ownership and access privileges of large objects.

KaiGai Kohei, reviewed by Jaime Casanova.
2009-12-11 03:34:57 +00:00
Andrew Dunstan
324385d67f Add YAML to list of EXPLAIN formats. Greg Sabino Mullane, reviewed by Takahiro Itagaki. 2009-12-11 01:33:35 +00:00
Tom Lane
62aba76568 Prevent indirect security attacks via changing session-local state within
an allegedly immutable index function.  It was previously recognized that
we had to prevent such a function from executing SET/RESET ROLE/SESSION
AUTHORIZATION, or it could trivially obtain the privileges of the session
user.  However, since there is in general no privilege checking for changes
of session-local state, it is also possible for such a function to change
settings in a way that might subvert later operations in the same session.
Examples include changing search_path to cause an unexpected function to
be called, or replacing an existing prepared statement with another one
that will execute a function of the attacker's choosing.

The present patch secures VACUUM, ANALYZE, and CREATE INDEX/REINDEX against
these threats, which are the same places previously deemed to need protection
against the SET ROLE issue.  GUC changes are still allowed, since there are
many useful cases for that, but we prevent security problems by forcing a
rollback of any GUC change after completing the operation.  Other cases are
handled by throwing an error if any change is attempted; these include temp
table creation, closing a cursor, and creating or deleting a prepared
statement.  (In 7.4, the infrastructure to roll back GUC changes doesn't
exist, so we settle for rejecting changes of "search_path" in these contexts.)

Original report and patch by Gurjeet Singh, additional analysis by
Tom Lane.

Security: CVE-2009-4136
2009-12-09 21:57:51 +00:00
Tom Lane
0cb65564e5 Add exclusion constraints, which generalize the concept of uniqueness to
support any indexable commutative operator, not just equality.  Two rows
violate the exclusion constraint if "row1.col OP row2.col" is TRUE for
each of the columns in the constraint.

Jeff Davis, reviewed by Robert Haas
2009-12-07 05:22:23 +00:00
Tom Lane
8de7472b45 Don't use a duplicate OID for aclexplode(). 2009-12-06 02:55:54 +00:00
Peter Eisentraut
36f887c41c Speed up information schema privilege views
Instead of expensive cross joins to resolve the ACL, add table-returning
function aclexplode() that expands the ACL into a useful form, and join
against that.

Also, implement the role_*_grants views as a thin layer over the respective
*_privileges views instead of essentially repeating the same code twice.

fixes bug #4596

by Joachim Wieland, with cleanup by me
2009-12-05 21:43:36 +00:00
Heikki Linnakangas
ab3148b712 Fix bug in temporary file management with subtransactions. A cursor opened
in a subtransaction stays open even if the subtransaction is aborted, so
any temporary files related to it must stay alive as well. With the patch,
we use ResourceOwners to track open temporary files and don't automatically
close them at subtransaction end (though in the normal case temporary files
are registered with the subtransaction resource owner and will therefore be
closed).

At end of top transaction, we still check that there's no temporary files
marked as close-at-end-of-transaction open, but that's now just a debugging
cross-check as the resource owner cleanup should've closed them already.
2009-12-03 11:03:29 +00:00
Tom Lane
0d32342501 Teach the regular expression functions to do case-insensitive matching and
locale-dependent character classification properly when the database encoding
is UTF8.

The previous coding worked okay in single-byte encodings, or in any case for
ASCII characters, but failed entirely on multibyte characters.  The fix
assumes that the <wctype.h> functions use Unicode code points as the wchar
representation for Unicode, ie, wchar matches pg_wchar.

This is only a partial solution, since we're still stupid about non-ASCII
characters in multibyte encodings other than UTF8.  The practical effect
of that is limited, however, since those cases are generally Far Eastern
glyphs for which concepts like case-folding don't apply anyway.  Certainly
all or nearly all of the field reports of problems have been about UTF8.
A more general solution would require switching to the platform's wchar
representation for all regex operations; which is possible but would have
substantial disadvantages.  Let's try this and see if it's sufficient in
practice.
2009-12-01 21:00:24 +00:00
Bruce Momjian
ef51395e24 Revert due to Tom's concerns:
Add ProcessUtility_hook() to handle all DDL to
contrib/pg_stat_statements.
2009-12-01 02:31:13 +00:00
Bruce Momjian
d85cb27293 ProcessUtility_hook:
Add ProcessUtility_hook() to handle all DDL to contrib/pg_stat_statements.

Itagaki Takahiro
2009-12-01 01:08:46 +00:00
Tom Lane
0c61cff57a Make pg_stat_activity.application_name visible to all users, rather than
being hidden when current_query is.  Relocate it to a column position
more consistent with that behavior.  Per discussion.
2009-11-29 18:14:32 +00:00
Tom Lane
42b2907d12 Add support for anonymous code blocks (DO blocks) to PL/Perl.
Joshua Tolley, reviewed by Brendan Jurd and Tim Bunce
2009-11-29 03:02:27 +00:00
Tom Lane
8217cfbd99 Add support for an application_name parameter, which is displayed in
pg_stat_activity and recorded in log entries.

Dave Page, reviewed by Andres Freund
2009-11-28 23:38:08 +00:00
Tom Lane
1a95f12702 Eliminate a lot of list-management overhead within join_search_one_level
by adding a requirement that build_join_rel add new join RelOptInfos to the
appropriate list immediately at creation.  Per report from Robert Haas,
the list_concat_unique_ptr() calls that this change eliminates were taking
the lion's share of the runtime in larger join problems.  This doesn't do
anything to fix the fundamental combinatorial explosion in large join
problems, but it should push out the threshold of pain a bit further.

Note: because this changes the order in which joinrel lists are built,
it might result in changes in selected plans in cases where different
alternatives have exactly the same costs.  There is one example in the
regression tests.
2009-11-28 00:46:19 +00:00
Heikki Linnakangas
cd87b6f8a5 Fix an old bug in multixact and two-phase commit. Prepared transactions can
be part of multixacts, so allocate a slot for each prepared transaction in
the "oldest member" array in multixact.c. On PREPARE TRANSACTION, transfer
the oldest member value from the current backends slot to the prepared xact
slot. Also save and recover the value from the 2pc state file.

The symptom of the bug was that after a transaction prepared, a shared lock
still held by the prepared transaction was sometimes ignored by other
transactions.

Fix back to 8.1, where both 2PC and multixact were introduced.
2009-11-23 09:58:36 +00:00
Tom Lane
7fc0f06221 Add a WHEN clause to CREATE TRIGGER, allowing a boolean expression to be
checked to determine whether the trigger should be fired.

For BEFORE triggers this is mostly a matter of spec compliance; but for AFTER
triggers it can provide a noticeable performance improvement, since queuing of
a deferred trigger event and re-fetching of the row(s) at end of statement can
be short-circuited if the trigger does not need to be fired.

Takahiro Itagaki, reviewed by KaiGai Kohei.
2009-11-20 20:38:12 +00:00
Tom Lane
c742b795dd Add a hook to CREATE/ALTER ROLE to allow an external module to check the
strength of database passwords, and create a sample implementation of
such a hook as a new contrib module "passwordcheck".

Laurenz Albe, reviewed by Takahiro Itagaki
2009-11-18 21:57:56 +00:00
Tom Lane
5e66a51c2e Provide a parenthesized-options syntax for VACUUM, analogous to that recently
adopted for EXPLAIN.  This will allow additional options to be implemented
in future without having to make them fully-reserved keywords.  The old syntax
remains available for existing options, however.

Itagaki Takahiro
2009-11-16 21:32:07 +00:00
Tom Lane
caf9c830d9 Improve planning of Materialize nodes inserted atop the inner input of a
mergejoin to shield it from doing mark/restore and refetches.  Put an explicit
flag in MergePath so we can centralize the logic that knows about this,
and add costing logic that considers using Materialize even when it's not
forced by the previously-existing considerations.  This is in response to
a discussion back in August that suggested that materializing an inner
indexscan can be helpful when the refetch percentage is high enough.
2009-11-15 02:45:35 +00:00
Magnus Hagander
da8d684d39 Add inheritable ACE when creating a restricted token for execution on
Win32.

Also refactor the code around it to be more clear.

Jesse Morris
2009-11-14 15:39:36 +00:00
Tom Lane
82121aff12 Avoid assuming that enum CreateStmtLikeOption is unsigned. Zdenek Kotala 2009-11-13 23:44:19 +00:00
Tom Lane
19d802767d Remove pg_parse_string_token() --- not needed anymore. 2009-11-12 01:13:12 +00:00
Alvaro Herrera
e7ec022266 Fix longstanding problems in VACUUM caused by untimely interruptions
In VACUUM FULL, an interrupt after the initial transaction has been recorded
as committed can cause postmaster to restart with the following error message:
PANIC: cannot abort transaction NNNN, it was already committed
This problem has been reported many times.

In lazy VACUUM, an interrupt after the table has been truncated by
lazy_truncate_heap causes other backends' relcache to still point to the
removed pages; this can cause future INSERT and UPDATE queries to error out
with the following error message:
could not read block XX of relation 1663/NNN/MMMM: read only 0 of 8192 bytes
The window to this race condition is extremely narrow, but it has been seen in
the wild involving a cancelled autovacuum process.

The solution for both problems is to inhibit interrupts in both operations
until after the respective transactions have been committed.  It's not a
complete solution, because the transaction could theoretically be aborted by
some other error, but at least fixes the most common causes of both problems.
2009-11-10 18:00:06 +00:00
Tom Lane
10bcfa189b Re-refactor the core scanner's API, in order to get out from under the problem
of different parsers having different YYSTYPE unions that they want to use
with it.  I defined a new union core_YYSTYPE that is just the (very short)
list of semantic values returned by the core scanner.  I had originally
worried that this would require an extra interface layer, but actually we can
have parser.c's base_yylex (formerly filtered_base_yylex) take care of that at
no extra cost.  Names associated with the core scanner are now "core_yy_foo",
with "base_yy_foo" being used in the core Bison parser and the parser.c
interface layer.

This solves the last serious stumbling block to eliminating plpgsql's separate
lexer.  One restriction that will still be present is that plpgsql and the
core will have to agree on the token numbers assigned to tokens that can be
returned by the core lexer.  Since Bison doesn't seem willing to accept
external assignments of those numbers, we'll have to live with decreeing that
core and plpgsql grammars declare these tokens first and in the same order.
2009-11-09 18:38:48 +00:00
Andrew Dunstan
b79f49c780 Keep track of language's trusted flag in InlineCodeBlock. Needed to support DO blocks for languages that have both trusted and untrusted variants. 2009-11-06 21:57:57 +00:00
Tom Lane
593f4b854a Don't treat NEW and OLD as reserved words anymore. For the purposes of rules
it works just as well to have them be ordinary identifiers, and this gets rid
of a number of ugly special cases.  Plus we aren't interfering with non-rule
usage of these names.

catversion bump because the names change internally in stored rules.
2009-11-05 23:24:27 +00:00
Tom Lane
6bef82b38a Rename some encoding conversion modules to keep pathnames in our source
tarballs under 100 characters.  This should avoid failures with certain
untarring tools (WinZip and Midnight Commander have been mentioned as
likely suspects).  Per my proposal of yesterday.
catversion bumped since the initial contents of pg_proc change.
2009-11-04 23:47:04 +00:00
Tom Lane
9bedd128d6 Add support for invoking parser callback hooks via SPI and in cached plans.
As proof of concept, modify plpgsql to use the hooks.  plpgsql is still
inserting $n symbols textually, but the "back end" of the parsing process now
goes through the ParamRef hook instead of using a fixed parameter-type array,
and then execution only fetches actually-referenced parameters, using a hook
added to ParamListInfo.

Although there's a lot left to be done in plpgsql, this already cures the
"if (TG_OP = 'INSERT' and NEW.foo ...)"  problem, as illustrated by the
changed regression test.
2009-11-04 22:26:08 +00:00
Tom Lane
7d535ebe5b Dept of second thoughts: after studying index_getnext() a bit more I realize
that it can scribble on scan->xs_ctup.t_self while following HOT chains,
so we can't rely on that to stay valid between hashgettuple() calls.
Introduce a private variable in HashScanOpaque, instead.
2009-11-01 22:30:54 +00:00
Tom Lane
c4afdca4c2 Fix two serious bugs introduced into hash indexes by the 8.4 patch that made
hash indexes keep entries sorted by hash value.  First, the original plans for
concurrency assumed that insertions would happen only at the end of a page,
which is no longer true; this could cause scans to transiently fail to find
index entries in the presence of concurrent insertions.  We can compensate
by teaching scans to re-find their position after re-acquiring read locks.
Second, neither the bucket split nor the bucket compaction logic had been
fixed to preserve hashvalue ordering, so application of either of those
processes could lead to permanent corruption of an index, in the sense
that searches might fail to find entries that are present.

This patch fixes the split and compaction logic to preserve hashvalue
ordering, but it cannot do anything about pre-existing corruption.  We will
need to recommend reindexing all hash indexes in the 8.4.2 release notes.

To buy back the performance loss hereby induced in split and compaction,
fix them to use PageIndexMultiDelete instead of retail PageIndexDelete
operations.  We might later want to do something with qsort'ing the
page contents rather than doing a binary search for each insertion,
but that seemed more invasive than I cared to risk in a back-patch.

Per bug #5157 from Jeff Janes and subsequent investigation.
2009-11-01 21:25:25 +00:00
Tom Lane
fb5d05805b Implement parser hooks for processing ColumnRef and ParamRef nodes, as per my
recent proposal.  As proof of concept, remove knowledge of Params from the
core parser, arranging for them to be handled entirely by parser hook
functions.  It turns out we need an additional hook for that --- I had
forgotten about the code that handles inferring a parameter's type from
context.

This is a preliminary step towards letting plpgsql handle its variables
through parser hooks.  Additional work remains to be done to expose the
facility through SPI, but I think this is all the changes needed in the core
parser.
2009-10-31 01:41:31 +00:00
Tom Lane
cbcd1701f1 Fix AcquireRewriteLocks to be sure that it acquires the right lock strength
when FOR UPDATE is propagated down into a sub-select expanded from a view.
Similar bug to parser's isLockedRel issue that I fixed yesterday; likewise
seems not quite worth the effort to back-patch.
2009-10-28 17:36:50 +00:00
Tom Lane
46e3a16b05 When FOR UPDATE/SHARE is used with LIMIT, put the LockRows plan node
underneath the Limit node, not atop it.  This fixes the old problem that such
a query might unexpectedly return fewer rows than the LIMIT says, due to
LockRows discarding updated rows.

There is a related problem that LockRows might destroy the sort ordering
produced by earlier steps; but fixing that by pushing LockRows below Sort
would create serious performance problems that are unjustified in many
real-world applications, as well as potential deadlock problems from locking
many more rows than expected.  Instead, keep the present semantics of applying
FOR UPDATE after ORDER BY within a single query level; but allow the user to
specify the other way by writing FOR UPDATE in a sub-select.  To make that
work, track whether FOR UPDATE appeared explicitly in sub-selects or got
pushed down from the parent, and don't flatten a sub-select that contained an
explicit FOR UPDATE.
2009-10-28 14:55:47 +00:00
Tom Lane
61e5328208 Make FOR UPDATE/SHARE in the primary query not propagate into WITH queries;
for example in
  WITH w AS (SELECT * FROM foo) SELECT * FROM w, bar ... FOR UPDATE
the FOR UPDATE will now affect bar but not foo.  This is more useful and
consistent than the original 8.4 behavior, which tried to propagate FOR UPDATE
into the WITH query but always failed due to assorted implementation
restrictions.  Even though we are in process of removing those restrictions,
it seems correct on philosophical grounds to not let the outer query's
FOR UPDATE affect the WITH query.

In passing, fix isLockedRel which frequently got things wrong in
nested-subquery cases: "FOR UPDATE OF foo" applies to an alias foo in the
current query level, not subqueries.  This has been broken for a long time,
but it doesn't seem worth back-patching further than 8.4 because the actual
consequences are minimal.  At worst the parser would sometimes get
RowShareLock on a relation when it should be AccessShareLock or vice versa.
That would only make a difference if someone were using ExclusiveLock
concurrently, which no standard operation does, and anyway FOR UPDATE
doesn't result in visible changes so it's not clear that the someone would
notice any problem.  Between that and the fact that FOR UPDATE barely works
with subqueries at all in existing releases, I'm not excited about worrying
about it.
2009-10-27 17:11:18 +00:00
Heikki Linnakangas
2078e384a3 Fix range check in date_recv that tried to limit accepted values to only
those accepted by date_in(). I confused julian day numbers and number of
days since the postgres epoch 2000-01-01 in the original patch.

I just noticed that it's still easy to get such out-of-range values into
the database using to_date or +- operators, but this patch doesn't do
anything about those functions.

Per report from James Pye.
2009-10-26 16:13:11 +00:00
Tom Lane
9f2ee8f287 Re-implement EvalPlanQual processing to improve its performance and eliminate
a lot of strange behaviors that occurred in join cases.  We now identify the
"current" row for every joined relation in UPDATE, DELETE, and SELECT FOR
UPDATE/SHARE queries.  If an EvalPlanQual recheck is necessary, we jam the
appropriate row into each scan node in the rechecking plan, forcing it to emit
only that one row.  The former behavior could rescan the whole of each joined
relation for each recheck, which was terrible for performance, and what's much
worse could result in duplicated output tuples.

Also, the original implementation of EvalPlanQual could not re-use the recheck
execution tree --- it had to go through a full executor init and shutdown for
every row to be tested.  To avoid this overhead, I've associated a special
runtime Param with each LockRows or ModifyTable plan node, and arranged to
make every scan node below such a node depend on that Param.  Thus, by
signaling a change in that Param, the EPQ machinery can just rescan the
already-built test plan.

This patch also adds a prohibition on set-returning functions in the
targetlist of SELECT FOR UPDATE/SHARE.  This is needed to avoid the
duplicate-output-tuple problem.  It seems fairly reasonable since the
other restrictions on SELECT FOR UPDATE are meant to ensure that there
is a unique correspondence between source tuples and result tuples,
which an output SRF destroys as much as anything else does.
2009-10-26 02:26:45 +00:00
Tom Lane
ab61df9e52 Remove regex_flavor GUC, so that regular expressions are always "advanced"
style by default.  Per discussion, there seems to be hardly anything that
really relies on being able to change the regex flavor, so the ability to
select it via embedded options ought to be enough for any stragglers.
Also, if we didn't remove the GUC, we'd really be morally obligated to
mark the regex functions non-immutable, which'd possibly create performance
issues.
2009-10-21 20:38:58 +00:00
Tom Lane
289e2905c8 Remove add_missing_from GUC and associated parser support for "implicit RTEs".
Per recent discussion, add_missing_from has been deprecated for long enough to
consider removing, and it's getting in the way of planned parser refactoring.
The system now always behaves as though add_missing_from were OFF.
2009-10-21 20:22:38 +00:00
Magnus Hagander
748771379b Write to the Windows eventlog in UTF16, converting the message encoding
as necessary.

Itagaki Takahiro with some changes from me
2009-10-17 00:24:51 +00:00
Tom Lane
b2734a0d79 Support SQL-compliant triggers on columns, ie fire only if certain columns
are named in the UPDATE's SET list.

Note: the schema of pg_trigger has not actually changed; we've just started
to use a column that was there all along.  catversion bumped anyway so that
this commit is included in the history of potentially interesting changes
to system catalog contents.

Itagaki Takahiro
2009-10-14 22:14:25 +00:00
Alvaro Herrera
201e5b282b Add new PGC_S_DATABASE_USER enum value to several places missed by my patch
last week.

Per note and patch from Jeff Davis.
2009-10-13 14:18:40 +00:00
Tom Lane
8d54c2482b Code review for LIKE INCLUDING patch --- clean up some cosmetic and not
so cosmetic stuff.
2009-10-13 00:53:08 +00:00
Tom Lane
11ca04b4b7 Support GRANT/REVOKE ON ALL TABLES/SEQUENCES/FUNCTIONS IN SCHEMA.
Petr Jelinek
2009-10-12 20:39:42 +00:00
Andrew Dunstan
faa1afc6c1 CREATE LIKE INCLUDING COMMENTS and STORAGE, and INCLUDING ALL shortcut. Itagaki Takahiro. 2009-10-12 19:49:24 +00:00