Commit Graph

11518 Commits

Author SHA1 Message Date
Peter Eisentraut 39b8843296 Implement remaining fields of information_schema.sequences view
Add new function pg_sequence_parameters that returns a sequence's start,
minimum, maximum, increment, and cycle values, and use that in the view.
(bug #5662; design suggestion by Tom Lane)

Also slightly adjust the view's column order and permissions after review of
SQL standard.
2011-01-02 15:15:21 +02:00
Robert Haas e657b55e66 Fix typo.
Noted by Magnus Hagander.
2011-01-02 07:26:10 -05:00
Robert Haas 0d692a0dc9 Basic foreign table support.
Foreign tables are a core component of SQL/MED.  This commit does
not provide a working SQL/MED infrastructure, because foreign tables
cannot yet be queried.  Support for foreign table scans will need to
be added in a future patch.  However, this patch creates the necessary
system catalog structure, syntax support, and support for ancillary
operations such as COMMENT and SECURITY LABEL.

Shigeru Hanada, heavily revised by Robert Haas
2011-01-01 23:48:11 -05:00
Peter Eisentraut 6a208aa404 Allow casting a table's row type to the table's supertype if it's a typed table
This is analogous to the existing facility that allows casting a row type to a
supertable's row type.
2011-01-01 23:04:14 +02:00
Bruce Momjian 5d950e3b0c Stamp copyrights for year 2011. 2011-01-01 13:18:15 -05:00
Bruce Momjian 30aeda4394 Include the first valid listen address in pg_ctl to improve server start
"wait" detection and add postmaster start time to help determine if the
postmaster is actually using the specified data directory.
2010-12-31 17:25:02 -05:00
Tom Lane 39c8dd6620 Invert and rename flag variable to improve code readability.
No change in functionality.  Per discussion with Robert.
2010-12-31 11:59:38 -05:00
Tom Lane 7b46401557 Move symbols for ExecMergeJoin's state machine into nodeMergejoin.c.
There's no reason for these values to be known anywhere else.  After
doing this, executor/execdefs.h is vestigial and can be removed.
2010-12-30 22:12:40 -05:00
Tom Lane f4e4b32743 Support RIGHT and FULL OUTER JOIN in hash joins.
This is advantageous first because it allows us to hash the smaller table
regardless of the outer-join type, and second because hash join can be more
flexible than merge join in dealing with arbitrary join quals in a FULL
join.  For merge join all the join quals have to be mergejoinable, but hash
join will work so long as there's at least one hashjoinable qual --- the
others can be any condition.  (This is true essentially because we don't
keep per-inner-tuple match flags in merge join, while hash join can do so.)

To do this, we need a has-it-been-matched flag for each tuple in the
hashtable, not just one for the current outer tuple.  The key idea that
makes this practical is that we can store the match flag in the tuple's
infomask, since there are lots of bits there that are of no interest for a
MinimalTuple.  So we aren't increasing the size of the hashtable at all for
the feature.

To write this without turning the hash code into even more of a pile of
spaghetti than it already was, I rewrote ExecHashJoin in a state-machine
style, similar to ExecMergeJoin.  Other than that decision, it was pretty
straightforward.
2010-12-30 20:26:08 -05:00
Alvaro Herrera 55573990ca Avoid unnecessary public struct declaration in slru.h
Instead, declare a public wrapper of the sole function using it for
external callers, so that they don't have to always pass a NULL
argument.

Author: Kevin Grittner
2010-12-30 12:09:17 -03:00
Robert Haas 53dbc27c62 Support unlogged tables.
The contents of an unlogged table are WAL-logged; thus, they are not
available on standby servers and are truncated whenever the database
system enters recovery.  Indexes on unlogged tables are also unlogged.
Unlogged GiST indexes are not currently supported.
2010-12-29 06:48:53 -05:00
Magnus Hagander 9b8aff8c19 Add REPLICATION privilege for ROLEs
This privilege is required to do Streaming Replication, instead of
superuser, making it possible to set up a SR slave that doesn't
have write permissions on the master.

Superuser privileges do NOT override this check, so in order to
use the default superuser account for replication it must be
explicitly granted the REPLICATION permissions. This is backwards
incompatible change, in the interest of higher default security.
2010-12-29 11:05:03 +01:00
Tom Lane f2ba1e994c Avoid unexpected conversion overflow in planner for distant date values.
The "date" type supports a wider range of dates than int64 timestamps do.
However, there is pre-int64-timestamp code in the planner that assumes that
all date values can be converted to timestamp with impunity.  Fortunately,
what we really need out of the conversion is always a double (float8)
value; so even when the date is out of timestamp's range it's possible to
produce a sane answer.  All we need is a code path that doesn't try to
force the result into int64.  Per trouble report from David Rericha.

Back-patch to all supported versions.  Although this is surely a corner
case, there's not much point in advertising a date range wider than
timestamp's if we will choke on such values in unexpected places.
2010-12-28 22:49:57 -05:00
Bruce Momjian b4d3792daa Another fix for larger postmaster.pid files. 2010-12-28 09:34:46 -05:00
Bruce Momjian bada44a2a2 Fix code to properly pull out shared memory key now that the
postmaster.pid file is larger than in previous major versions.
This is a bug introduced when I added lines to the file recently.
2010-12-27 23:11:33 -05:00
Tom Lane 84fc571395 Rename the C functions bitand(), bitor() to bit_and(), bit_or().
This is to avoid use of the C++ keywords "bitand" and "bitor" in
the header file utils/varbit.h.  Note the functions' SQL-level
names are not changed, only their C-level names.

In passing, make some comments in varbit.c conform to project-standard
layout.
2010-12-27 14:57:41 -05:00
Tom Lane 275411912d Fix ill-chosen use of "private" as an argument and struct field name.
"private" is a keyword in C++, so this breaks the poorly-enforced policy
that header files should be include-able in C++ code.  Per report from
Craig Ringer and some investigation with cpluspluscheck.
2010-12-27 11:26:19 -05:00
Andrew Dunstan a534728afb Only build in crashdump support on Windows if there's a working dbghelp.h. 2010-12-26 10:34:47 -05:00
Bruce Momjian 5000472112 Remove quotes from boolean recovery.conf.sample parameters, now that the
quotes are not required.  This now matches postgresql.conf's
specification of booleans.
2010-12-24 11:51:51 -05:00
Bruce Momjian 075354ad1b Improve "pg_ctl -w start" server detection by writing the postmaster
port and socket directory into postmaster.pid, and have pg_ctl read from
that file, for use by PQping().
2010-12-24 09:45:52 -05:00
Heikki Linnakangas 9de3aa65f0 Rewrite the GiST insertion logic so that we don't need the post-recovery
cleanup stage to finish incomplete inserts or splits anymore. There was two
reasons for the cleanup step:

1. When a new tuple was inserted to a leaf page, the downlink in the parent
needed to be updated to contain (ie. to be consistent with) the new key.
Updating the parent in turn might require recursively updating the parent of
the parent. We now handle that by updating the parent while traversing down
the tree, so that when we insert the leaf tuple, all the parents are already
consistent with the new key, and the tree is consistent at every step.

2. When a page is split, we need to insert the downlink for the new right
page(s), and update the downlink for the original page to not include keys
that moved to the right page(s). We now handle that by setting a new flag,
F_FOLLOW_RIGHT, on the non-rightmost pages in the split. When that flag is
set, scans always follow the rightlink, regardless of the NSN mechanism used
to detect concurrent page splits. That way the tree is consistent right after
split, even though the downlink is still missing. This is very similar to the
way B-tree splits are handled. When the downlink is inserted in the parent,
the flag is cleared. To keep the insertion algorithm simple, when an
insertion sees an incomplete split, indicated by the F_FOLLOW_RIGHT flag, it
finishes the split before doing anything else.

These changes allow removing the whole "invalid tuple" mechanism, but I
retained the scan code to still follow invalid tuples correctly. While we
don't create any such tuples anymore, we want to handle them gracefully in
case you pg_upgrade a GiST index that has them. If we encounter any on an
insert, though, we just throw an error saying that you need to REINDEX.

The issue that got me into doing this is that if you did a checkpoint while
an insert or split was in progress, and the checkpoint finishes quickly so
that there is no WAL record related to the insert between RedoRecPtr and the
checkpoint record, recovery from that checkpoint would not know to finish
the incomplete insert. IOW, we have the same issue we solved with the
rm_safe_restartpoint mechanism during normal operation too. It's highly
unlikely to happen in practice, and this fix is far too large to backpatch,
so we're just going to live with in previous versions, but this refactoring
fixes it going forward.

With this patch, you don't get the annoying
'index "FOO" needs VACUUM or REINDEX to finish crash recovery' notices
anymore if you crash at an unfortunate moment.
2010-12-23 16:21:47 +02:00
Robert Haas 32ba2b5160 Use memcmp() rather than strncmp() when shorter string length is known.
It appears that this will be faster for all but the shortest strings;
at least one some platforms, memcmp() can use word-at-a-time comparisons.

Noah Misch, somewhat pared down.
2010-12-21 22:11:40 -05:00
Robert Haas c5160b7eec Fix typos.
Andreas Karlsson
2010-12-21 17:58:53 -05:00
Robert Haas 24ecde7742 Work around unfortunate getppid() behavior on BSD-ish systems.
On MacOS X, and apparently also on other BSD-derived systems, attaching
a debugger causes getppid() to return the pid of the debugging process
rather than the actual parent PID.  As a result, debugging the
autovacuum launcher, startup process, or WAL sender on such systems
causes it to exit, because the previous coding of PostmasterIsAlive()
detects postmaster death by testing whether getppid() == PostmasterPid.

Work around that behavior by checking the return value of getppid()
more carefully.  If it's PostmasterPid, the postmaster must be alive;
if it's 1, assume the postmaster is dead.  If it's any other value,
assume we've been debugged and fall through to the less-reliable
kill() test.

Review by Tom Lane.
2010-12-21 06:30:32 -05:00
Robert Haas f6a0863e3c Allow transactions that don't write WAL to commit asynchronously.
This case can arise if a transaction has written data, but only to
temporary tables.  Loss of the commit record in case of a crash won't
matter, because the temporary tables will be lost anyway.

Reviewed by Heikki Linnakangas and Simon Riggs.
2010-12-20 12:59:33 -05:00
Magnus Hagander d382828f6e Remove thread dumping constant that requires newer Platform SDK
Since we're not multithreaded it only provides marginally useful
information, and it does require a newer version of the Platform SDK
than we target. We may want to reconsider this in the future along
with a fix for MinGW.
2010-12-19 21:32:58 +01:00
Tom Lane 1b19e2c0ba Fix up handling of simple-form CASE with constant test expression.
eval_const_expressions() can replace CaseTestExprs with constants when
the surrounding CASE's test expression is a constant.  This confuses
ruleutils.c's heuristic for deparsing simple-form CASEs, leading to
Assert failures or "unexpected CASE WHEN clause" errors.  I had put in
a hack solution for that years ago (see commit
514ce7a331 of 2006-10-01), but bug #5794
from Peter Speck shows that that solution failed to cover all cases.

Fortunately, there's a much better way, which came to me upon reflecting
that Peter's "CASE TRUE WHEN" seemed pretty redundant: we can "simplify"
the simple-form CASE to the general form of CASE, by simply omitting the
constant test expression from the rebuilt CASE construct.  This is
intuitively valid because there is no need for the executor to evaluate
the test expression at runtime; it will never be referenced, because any
CaseTestExprs that would have referenced it are now replaced by constants.
This won't save a whole lot of cycles, since evaluating a Const is pretty
cheap, but a cycle saved is a cycle earned.  In any case it beats kluging
ruleutils.c still further.  So this patch improves const-simplification
and reverts the previous change in ruleutils.c.

Back-patch to all supported branches.  The bug exists in 8.1 too, but it's
out of warranty.
2010-12-19 15:30:44 -05:00
Tom Lane abc1026269 Fix erroneous parsing of tsquery input "... & !(subexpression) | ..."
After parsing a parenthesized subexpression, we must pop all pending
ANDs and NOTs off the stack, just like the case for a simple operand.
Per bug #5793.

Also fix clones of this routine in contrib/intarray and contrib/ltree,
where input of types query_int and ltxtquery had the same problem.

Back-patch to all supported versions.
2010-12-19 12:48:34 -05:00
Magnus Hagander dcb09b595f Support for collecting crash dumps on Windows
Add support for collecting "minidump" style crash dumps on
Windows, by setting up an exception handling filter. Crash
dumps will be generated in PGDATA/crashdumps if the directory
is created (the existance of the directory is used as on/off
switch for the generation of the dumps).

Craig Ringer and Magnus Hagander
2010-12-19 16:45:28 +01:00
Magnus Hagander 4754dbf4c3 Make GUC variables for syslog and SSL always visible
Make the variables visible (but not used) even when
support is not compiled in.
2010-12-18 16:53:59 +01:00
Alvaro Herrera 3026027ec3 set_ps_display when calling functions via fastpath
This improves tag output by log_line_prefix
2010-12-17 18:51:22 -03:00
Alvaro Herrera b68193c0c7 Remove unnecessary definition for autovacuum in SignalSomeChildren. 2010-12-17 15:59:19 -03:00
Robert Haas 8bd4b89e24 Try to save a kernel call in ResolveRecoveryConflictWithVirtualXIDs.
If there's no work to be done, just exit quickly, before initialization.
2010-12-17 11:32:02 -05:00
Robert Haas 611fed3712 Reset 'ps' display just once when resolving VXID conflicts.
This prevents the word "waiting" from briefly disappearing from the ps
status line when ResolveRecoveryConflictWithVirtualXIDs begins a new
iteration of the outer loop.

Along the way, remove some useless pgstat_report_waiting() calls;
the startup process doesn't appear in pg_stat_activity.

Fujii Masao
2010-12-17 08:30:57 -05:00
Tom Lane 14ed7735f5 Improve comments around startup_hacks() code.
These comments were not updated when we added the EXEC_BACKEND
mechanism for Windows, even though it rendered them inaccurate.

Also unify two unnecessarily-separate #ifdef __alpha code blocks.
2010-12-16 17:57:57 -05:00
Tom Lane 61b53695fb Remove optreset from src/port/ implementations of getopt and getopt_long.
We don't actually need optreset, because we can easily fix the code to
ensure that it's cleanly restartable after having completed a scan over the
argv array; which is the only case we need to restart in.  Getting rid of
it avoids a class of interactions with the system libraries and allows
reversion of my change of yesterday in postmaster.c and postgres.c.

Back-patch to 8.4.  Before that the getopt code was a bit different anyway.
2010-12-16 16:23:05 -05:00
Alvaro Herrera 16ca75baeb Simplify SignalSomeChildren(BACKEND_TYPE_ALL) to SignalChildren() 2010-12-16 12:23:07 -03:00
Tom Lane 5cdd65f324 Fix up getopt() reset management so it works on recent mingw.
The mingw people don't appear to care about compatibility with non-GNU
versions of getopt, so force use of our own copy of getopt on Windows.
Also, ensure that we make use of optreset when using our own copy.

Per report from Andrew Dunstan.  Back-patch to all versions supported
on Windows.
2010-12-15 23:50:41 -05:00
Robert Haas 290f1603b4 Some copy editing of pg_read_binary_file() patch. 2010-12-15 21:02:31 -05:00
Itagaki Takahiro 03db44eae3 Add pg_read_binary_file() and whole-file-at-once versions of pg_read_file().
One of the usages of the binary version is to read files in a different
encoding from the server encoding.

Dimitri Fontaine and Itagaki Takahiro.
2010-12-16 06:56:28 +09:00
Robert Haas 34c70c7ac4 Instrument checkpoint sync calls.
Greg Smith, reviewed by Jeff Janes
2010-12-14 09:26:19 -05:00
Robert Haas d368e1a2a7 Allow plugins to suppress inlining and hook function entry/exit/abort.
This is intended as infrastructure to allow an eventual SE-Linux plugin to
support trusted procedures.

KaiGai Kohei
2010-12-13 19:15:53 -05:00
Robert Haas 5f7b58fad8 Generalize concept of temporary relations to "relation persistence".
This commit replaces pg_class.relistemp with pg_class.relpersistence;
and also modifies the RangeVar node type to carry relpersistence rather
than istemp.  It also removes removes rd_istemp from RelationData and
instead performs the correct computation based on relpersistence.

For clarity, we add three new macros: RelationNeedsWAL(),
RelationUsesLocalBuffers(), and RelationUsesTempNamespace(), so that we
can clarify the purpose of each check that previous depended on
rd_istemp.

This is intended as infrastructure for the upcoming unlogged tables
patch, as well as for future possible work on global temporary tables.
2010-12-13 12:34:26 -05:00
Tom Lane 0c90442355 Reset all database-level stats in pgstat_recv_resetcounter().
We were failing to zero out some pg_stat_database counters that have
been added since the initial pgstats coding.  This is a bug, but not
back-patching the fix since changing this behavior in a minor release
seems a cure worse than the disease.

Report and patch by Tomas Vondra.
2010-12-12 15:09:53 -05:00
Robert Haas d3d414696f Allow bidirectional copy messages in streaming replication mode.
Fujii Masao.  Review by Alvaro Herrera, Tom Lane, and myself.
2010-12-11 09:27:37 -05:00
Tom Lane 04f4e10cfc Use symbolic names not octal constants for file permission flags.
Purely cosmetic patch to make our coding standards more consistent ---
we were doing symbolic some places and octal other places.  This patch
fixes all C-coded uses of mkdir, chmod, and umask.  There might be some
other calls I missed.  Inconsistency noted while researching tablespace
directory permissions issue.
2010-12-10 17:35:33 -05:00
Tom Lane 244407a710 Fix efficiency problems in tuplestore_trim().
The original coding in tuplestore_trim() was only meant to work efficiently
in cases where each trim call deleted most of the tuples in the store.
Which, in fact, was the pattern of the original usage with a Material node
supporting mark/restore operations underneath a MergeJoin.  However,
WindowAgg now uses tuplestores and it has considerably less friendly
trimming behavior.  In particular it can attempt to trim one tuple at a
time off a large tuplestore.  tuplestore_trim() had O(N^2) runtime in this
situation because of repeatedly shifting its tuple pointer array.  Fix by
avoiding shifting the array until a reasonably large number of tuples have
been deleted.  This can waste some pointer space, but we do still reclaim
the tuples themselves, so the percentage wastage should be pretty small.

Per Jie Li's report of slow percent_rank() evaluation.  cume_dist() and
ntile() would certainly be affected as well, along with any other window
function that has a moving frame start and requires reading substantially
ahead of the current row.

Back-patch to 8.4, where window functions were introduced.  There's no
need to tweak it before that.
2010-12-10 11:33:38 -05:00
Simon Riggs 9975c683b1 Self review of previous patch. Fix assumption that xmax >= xmin. 2010-12-09 10:20:49 +00:00
Simon Riggs b9075a6d2f Reduce spurious Hot Standby conflicts from never-visible records.
Hot Standby conflicts only with tuples that were visible at
some point. So ignore tuples from aborted transactions or for
tuples updated/deleted during the inserting transaction when
generating the conflict transaction ids.

Following detailed analysis and test case by Noah Misch.
Original report covered btree delete records, correctly observed
by Heikki Linnakangas that this applies to other cases also.
Fix covers all sources of cleanup records via common code.
2010-12-09 09:41:47 +00:00
Tom Lane 576477e73c Force default wal_sync_method to be fdatasync on Linux.
Recent versions of the Linux system header files cause xlogdefs.h to
believe that open_datasync should be the default sync method, whereas
formerly fdatasync was the default on Linux.  open_datasync is a bad
choice, first because it doesn't actually outperform fdatasync (in fact
the reverse), and second because we try to use O_DIRECT with it, causing
failures on certain filesystems (e.g., ext4 with data=journal option).
This part of the patch is largely per a proposal from Marti Raudsepp.
More extensive changes are likely to follow in HEAD, but this is as much
change as we want to back-patch.

Also clean up confusing code and incorrect documentation surrounding the
fsync_writethrough option.  Those changes shouldn't result in any actual
behavioral change, but I chose to back-patch them anyway to keep the
branches looking similar in this area.

In 9.0 and HEAD, also do some copy-editing on the WAL Reliability
documentation section.

Back-patch to all supported branches, since any of them might get used
on modern Linux versions.
2010-12-08 20:01:09 -05:00