Commit Graph

2748 Commits

Author SHA1 Message Date
Andres Freund 3bdcf6a5a7 Don't allow to disable backend assertions via the debug_assertions GUC.
The existance of the assert_enabled variable (backing the
debug_assertions GUC) reduced the amount of knowledge some static code
checkers (like coverity and various compilers) could infer from the
existance of the assertion. That could have been solved by optionally
removing the assertion_enabled variable from the Assert() et al macros
at compile time when some special macro is defined, but the resulting
complication doesn't seem to be worth the gain from having
debug_assertions. Recompiling is fast enough.

The debug_assertions GUC is still available, but readonly, as it's
useful when diagnosing problems. The commandline/client startup option
-A, which previously also allowed to enable/disable assertions, has
been removed as it doesn't serve a purpose anymore.

While at it, reduce code duplication in bufmgr.c and localbuf.c
assertions checking for spurious buffer pins. That code had to be
reindented anyway to cope with the assert_enabled removal.
2014-06-20 11:09:17 +02:00
Alvaro Herrera 7937910781 Fix typos 2014-06-12 14:01:01 -04:00
Tom Lane e416830a29 Prevent auto_explain from changing the output of a user's EXPLAIN.
Commit af7914c662, which introduced the
EXPLAIN (TIMING) option, for some reason coded explain.c to look at
planstate->instrument->need_timer rather than es->timing to decide
whether to print timing info.  However, the former flag might get set
as a result of contrib/auto_explain wanting timing information.  We
certainly don't want activation of auto_explain to change user-visible
statement behavior, so fix that.

Also fix an independent bug introduced in the same patch: in the code
path for a never-executed node with a machine-friendly output format,
if timing was selected, it would fail to print the Actual Rows and Actual
Loops items.

Per bug #10404 from Tomonari Katsumata.  Back-patch to 9.2 where the
faulty code was introduced.
2014-05-20 12:20:47 -04:00
Bruce Momjian 0a78320057 pgindent run for 9.4
This includes removing tabs after periods in C comments, which was
applied to back branches, so this change should not effect backpatching.
2014-05-06 12:12:18 -04:00
Tom Lane 2d00190495 Rationalize common/relpath.[hc].
Commit a730183926 created rather a mess by
putting dependencies on backend-only include files into include/common.
We really shouldn't do that.  To clean it up:

* Move TABLESPACE_VERSION_DIRECTORY back to its longtime home in
catalog/catalog.h.  We won't consider this symbol part of the FE/BE API.

* Push enum ForkNumber from relfilenode.h into relpath.h.  We'll consider
relpath.h as the source of truth for fork numbers, since relpath.c was
already partially serving that function, and anyway relfilenode.h was
kind of a random place for that enum.

* So, relfilenode.h now includes relpath.h rather than vice-versa.  This
direction of dependency is fine.  (That allows most, but not quite all,
of the existing explicit #includes of relpath.h to go away again.)

* Push forkname_to_number from catalog.c to relpath.c, just to centralize
fork number stuff a bit better.

* Push GetDatabasePath from catalog.c to relpath.c; it was rather odd
that the previous commit didn't keep this together with relpath().

* To avoid needing relfilenode.h in common/, redefine the underlying
function (now called GetRelationPath) as taking separate OID arguments,
and make the APIs using RelFileNode or RelFileNodeBackend into macro
wrappers.  (The macros have a potential multiple-eval risk, but none of
the existing call sites have an issue with that; one of them had such a
risk already anyway.)

* Fix failure to follow the directions when "init" fork type was added;
specifically, the errhint in forkname_to_number wasn't updated, and neither
was the SGML documentation for pg_relation_size().

* Fix tablespace-path-too-long check in CreateTableSpace() to account for
fork-name component of maximum-length pathnames.  This requires putting
FORKNAMECHARS into a header file, but it was rather useless (and
actually unreferenced) where it was.

The last couple of items are potentially back-patchable bug fixes,
if anyone is sufficiently excited about them; but personally I'm not.

Per a gripe from Christoph Berg about how include/common wasn't
self-contained.
2014-04-30 17:30:50 -04:00
Tom Lane f0fedfe82c Allow polymorphic aggregates to have non-polymorphic state data types.
Before 9.4, such an aggregate couldn't be declared, because its final
function would have to have polymorphic result type but no polymorphic
argument, which CREATE FUNCTION would quite properly reject.  The
ordered-set-aggregate patch found a workaround: allow the final function
to be declared as accepting additional dummy arguments that have types
matching the aggregate's regular input arguments.  However, we failed
to notice that this problem applies just as much to regular aggregates,
despite the fact that we had a built-in regular aggregate array_agg()
that was known to be undeclarable in SQL because its final function
had an illegal signature.  So what we should have done, and what this
patch does, is to decouple the extra-dummy-arguments behavior from
ordered-set aggregates and make it generally available for all aggregate
declarations.  We have to put this into 9.4 rather than waiting till
later because it slightly alters the rules for declaring ordered-set
aggregates.

The patch turned out a bit bigger than I'd hoped because it proved
necessary to record the extra-arguments option in a new pg_aggregate
column.  I'd thought we could just look at the final function's pronargs
at runtime, but that didn't work well for variadic final functions.
It's probably just as well though, because it simplifies life for pg_dump
to record the option explicitly.

While at it, fix array_agg() to have a valid final-function signature,
and add an opr_sanity test to notice future deviations from polymorphic
consistency.  I also marked the percentile_cont() aggregates as not
needing extra arguments, since they don't.
2014-04-23 19:17:41 -04:00
Heikki Linnakangas 8d34f68628 Avoid transient bogus page contents when creating a sequence.
Don't use simple_heap_insert to insert the tuple to a sequence relation.
simple_heap_insert creates a heap insertion WAL record, and replaying that
will create a regular heap page without the special area containing the
sequence magic constant, which is wrong for a sequence. That was not a bug
because we always created a sequence WAL record after that, and replaying
that overwrote the bogus heap page, and the transient state could never be
seen by another backend because it was only done when creating a new
sequence relation. But it's simpler and cleaner to avoid that in the first
place.
2014-04-22 10:40:23 +03:00
Heikki Linnakangas 2a8e1ac598 Set the all-visible flag on heap page before writing WAL record, not after.
If we set the all-visible flag after writing WAL record, and XLogInsert
takes a full-page image of the page, the image would not include the flag.
We will then proceed to set the VM bit, which would then be set without the
corresponding all-visible flag on the heap page.

Found by comparing page images on master and standby, after writing/replaying
each WAL record. (There is still a discrepancy: the all-visible flag won't
be set after replaying the HEAP_CLEAN record, even though it is set in the
master. However, it will be set when replaying the HEAP2_VISIBLE record and
setting the VM bit, so the all-visible flag and VM bit are always consistent
on the standby, even though they are momentarily out-of-sync with master)

Backpatch to 9.3 where this code was introduced.
2014-04-17 17:47:50 +03:00
Tom Lane 5f86cbd714 Rename EXPLAIN ANALYZE's "total runtime" output to "execution time".
Now that EXPLAIN also outputs a "planning time" measurement, the use of
"total" here seems rather confusing: it sounds like it might include the
planning time which of course it doesn't.  Majority opinion was that
"execution time" is a better label, so we'll call it that.

This should be noted as a backwards incompatibility for tools that examine
EXPLAIN ANALYZE output.

In passing, I failed to resist the temptation to do a little editing on the
materialized-view example affected by this change.
2014-04-16 20:48:59 -04:00
Tom Lane e0c91a7ff0 Improve some O(N^2) behavior in window function evaluation.
Repositioning the tuplestore seek pointer in window_gettupleslot() turns
out to be a very significant expense when the window frame is sizable and
the frame end can move.  To fix, introduce a tuplestore function for
skipping an arbitrary number of tuples in one call, parallel to the one we
introduced for tuplesort objects in commit 8d65da1f.  This reduces the cost
of window_gettupleslot() to O(1) if the tuplestore has not spilled to disk.
As in the previous commit, I didn't try to do any real optimization of
tuplestore_skiptuples for the case where the tuplestore has spilled to
disk.  There is probably no practical way to get the cost to less than O(N)
anyway, but perhaps someone can think of something later.

Also fix PersistHoldablePortal() to make use of this API now that we have
it.

Based on a suggestion by Dean Rasheed, though this turns out not to look
much like his patch.
2014-04-13 13:59:17 -04:00
Stephen Frost 842faa714c Make security barrier views automatically updatable
Views which are marked as security_barrier must have their quals
applied before any user-defined quals are called, to prevent
user-defined functions from being able to see rows which the
security barrier view is intended to prevent them from seeing.

Remove the restriction on security barrier views being automatically
updatable by adding a new securityQuals list to the RTE structure
which keeps track of the quals from security barrier views at each
level, independently of the user-supplied quals.  When RTEs are
later discovered which have securityQuals populated, they are turned
into subquery RTEs which are marked as security_barrier to prevent
any user-supplied quals being pushed down (modulo LEAKPROOF quals).

Dean Rasheed, reviewed by Craig Ringer, Simon Riggs, KaiGai Kohei
2014-04-12 21:04:58 -04:00
Tom Lane a9d9acbf21 Create infrastructure for moving-aggregate optimization.
Until now, when executing an aggregate function as a window function
within a window with moving frame start (that is, any frame start mode
except UNBOUNDED PRECEDING), we had to recalculate the aggregate from
scratch each time the frame head moved.  This patch allows an aggregate
definition to include an alternate "moving aggregate" implementation
that includes an inverse transition function for removing rows from
the aggregate's running state.  As long as this can be done successfully,
runtime is proportional to the total number of input rows, rather than
to the number of input rows times the average frame length.

This commit includes the core infrastructure, documentation, and regression
tests using user-defined aggregates.  Follow-on commits will update some
of the built-in aggregates to use this feature.

David Rowley and Florian Pflug, reviewed by Dean Rasheed; additional
hacking by me
2014-04-12 12:03:30 -04:00
Simon Riggs e5550d5fec Reduce lock levels of some ALTER TABLE cmds
VALIDATE CONSTRAINT

CLUSTER ON
SET WITHOUT CLUSTER

ALTER COLUMN SET STATISTICS
ALTER COLUMN SET ()
ALTER COLUMN RESET ()

All other sub-commands use AccessExclusiveLock

Simon Riggs and Noah Misch

Reviews by Robert Haas and Andres Freund
2014-04-06 11:13:43 -04:00
Tom Lane abe075dfff Fix tablespace creation WAL replay to work on Windows.
The code segment that removes the old symlink (if present) wasn't clued
into the fact that on Windows, symlinks are junction points which have
to be removed with rmdir().

Backpatch to 9.0, where the failing code was introduced.

MauMau, reviewed by Muhammad Asif Naeem and Amit Kapila
2014-04-04 23:09:35 -04:00
Noah Misch 7cbe57c34d Offer triggers on foreign tables.
This covers all the SQL-standard trigger types supported for regular
tables; it does not cover constraint triggers.  The approach for
acquiring the old row mirrors that for view INSTEAD OF triggers.  For
AFTER ROW triggers, we spool the foreign tuples to a tuplestore.

This changes the FDW API contract; when deciding which columns to
populate in the slot returned from data modification callbacks, writable
FDWs will need to check for AFTER ROW triggers in addition to checking
for a RETURNING clause.

In support of the feature addition, refactor the TriggerFlags bits and
the assembly of old tuples in ModifyTable.

Ronan Dunklau, reviewed by KaiGai Kohei; some additional hacking by me.
2014-03-23 02:16:34 -04:00
Noah Misch 6115480c54 Improve comments about AfterTriggerBeginQuery() query level usage. 2014-03-23 02:15:52 -04:00
Tom Lane f7271c4427 Fix relcache reference leak in refresh_by_match_merge().
One path through the loop over indexes forgot to do index_close().  Rather
than adding a fourth call, restructure slightly so that there's only one.

In passing, get rid of an unnecessary syscache lookup: the pg_index struct
for the index is already available from its relcache entry.

Per report from YAMAMOTO Takashi, though this is a bit different from his
suggested patch.  This is new code in HEAD, so no need for back-patch.
2014-03-18 11:36:53 -04:00
Tom Lane 7bae0284ee Avoid transaction-commit race condition while receiving a NOTIFY message.
Use TransactionIdIsInProgress, then TransactionIdDidCommit, to distinguish
whether a NOTIFY message's originating transaction is in progress,
committed, or aborted.  The previous coding could accept a message from a
transaction that was still in-progress according to the PGPROC array;
if the client were fast enough at starting a new transaction, it might fail
to see table rows added/updated by the message-sending transaction.  Which
of course would usually be the point of receiving the message.  We noted
this type of race condition long ago in tqual.c, but async.c overlooked it.

The race condition probably cannot occur unless there are multiple NOTIFY
senders in action, since an individual backend doesn't send NOTIFY signals
until well after it's done committing.  But if two senders commit in close
succession, it's certainly possible that we could see the second sender's
message within the race condition window while responding to the signal
from the first one.

Per bug #9557 from Marko Tiikkaja.  This patch is slightly more invasive
than what he proposed, since it removes the now-redundant
TransactionIdDidAbort call.

Back-patch to 9.0, where the current NOTIFY implementation was introduced.
2014-03-13 12:02:54 -04:00
Bruce Momjian 5024044a20 C comments: improve description of relfilenode uniqueness
Report by Antonin Houska
2014-03-08 12:20:30 -05:00
Tom Lane 7c31874945 Avoid getting more than AccessShareLock when deparsing a query.
In make_ruledef and get_query_def, we have long used AcquireRewriteLocks
to ensure that the querytree we are about to deparse is up-to-date and
the schemas of the underlying relations aren't changing.  Howwever, that
function thinks the query is about to be executed, so it acquires locks
that are stronger than necessary for the purpose of deparsing.  Thus for
example, if pg_dump asks to deparse a rule that includes "INSERT INTO t",
we'd acquire RowExclusiveLock on t.  That results in interference with
concurrent transactions that might for example ask for ShareLock on t.
Since pg_dump is documented as being purely read-only, this is unexpected.
(Worse, it used to actually be read-only; this behavior dates back only
to 8.1, cf commit ba4200246.)

Fix this by adding a parameter to AcquireRewriteLocks to tell it whether
we want the "real" execution locks or only AccessShareLock.

Report, diagnosis, and patch by Dean Rasheed.  Back-patch to all supported
branches.
2014-03-06 19:31:05 -05:00
Andrew Dunstan 3b5e03dca2 Provide a FORCE NULL option to COPY in CSV mode.
This forces an input field containing the quoted null string to be
returned as a NULL. Without this option, only unquoted null strings
behave this way. This helps where some CSV producers insist on quoting
every field, whether or not it is needed. The option takes a list of
fields, and only applies to those columns. There is an equivalent
column-level option added to file_fdw.

Ian Barwick, with some tweaking by Andrew Dunstan, reviewed by Payal
Singh.
2014-03-04 17:31:59 -05:00
Robert Haas af2543e884 Allow VACUUM FULL/CLUSTER to bump freeze horizons even for pg_class.
pg_class is a special case for CLUSTER and VACUUM FULL, so although
commit 3cff1879f8 caused these
operations to advance relfrozenxid and relminmxid for all other
tables, it did not provide the same benefit for pg_class.  This
plugs that gap.

Andres Freund
2014-03-04 11:08:18 -05:00
Robert Haas b89e151054 Introduce logical decoding.
This feature, building on previous commits, allows the write-ahead log
stream to be decoded into a series of logical changes; that is,
inserts, updates, and deletes and the transactions which contain them.
It is capable of handling decoding even across changes to the schema
of the effected tables.  The output format is controlled by a
so-called "output plugin"; an example is included.  To make use of
this in a real replication system, the output plugin will need to be
modified to produce output in the format appropriate to that system,
and to perform filtering.

Currently, information can be extracted from the logical decoding
system only via SQL; future commits will add the ability to stream
changes via walsender.

Andres Freund, with review and other contributions from many other
people, including Álvaro Herrera, Abhijit Menon-Sen, Peter Gheogegan,
Kevin Grittner, Robert Haas, Heikki Linnakangas, Fujii Masao, Abhijit
Menon-Sen, Michael Paquier, Simon Riggs, Craig Ringer, and Steve
Singer.
2014-03-03 16:32:18 -05:00
Robert Haas cf6aa68bbd Update a few comments to mention materialized views.
Etsuro Fujita
2014-02-25 13:40:12 -05:00
Tom Lane 769065c1b2 Prefer pg_any_to_server/pg_server_to_any over pg_do_encoding_conversion.
A large majority of the callers of pg_do_encoding_conversion were
specifying the database encoding as either source or target of the
conversion, meaning that we can use the less general functions
pg_any_to_server/pg_server_to_any instead.

The main advantage of using the latter functions is that they can make use
of a cached conversion-function lookup in the common case that the other
encoding is the current client_encoding.  It's notationally cleaner too in
most cases, not least because of the historical artifact that the latter
functions use "char *" rather than "unsigned char *" in their APIs.

Note that pg_any_to_server will apply an encoding verification step in
some cases where pg_do_encoding_conversion would have just done nothing.
This seems to me to be a good idea at most of these call sites, though
it partially negates the performance benefit.

Per discussion of bug #9210.
2014-02-23 16:59:05 -05:00
Robert Haas 5f173040e3 Avoid repeated name lookups during table and index DDL.
If the name lookups come to different conclusions due to concurrent
activity, we might perform some parts of the DDL on a different table
than other parts.  At least in the case of CREATE INDEX, this can be
used to cause the permissions checks to be performed against a
different table than the index creation, allowing for a privilege
escalation attack.

This changes the calling convention for DefineIndex, CreateTrigger,
transformIndexStmt, transformAlterTableStmt, CheckIndexCompatible
(in 9.2 and newer), and AlterTable (in 9.1 and older).  In addition,
CheckRelationOwnership is removed in 9.2 and newer and the calling
convention is changed in older branches.  A field has also been added
to the Constraint node (FkConstraint in 8.4).  Third-party code calling
these functions or using the Constraint node will require updating.

Report by Andres Freund.  Patch by Robert Haas and Andres Freund,
reviewed by Tom Lane.

Security: CVE-2014-0062
2014-02-17 09:33:31 -05:00
Noah Misch 537cbd35c8 Prevent privilege escalation in explicit calls to PL validators.
The primary role of PL validators is to be called implicitly during
CREATE FUNCTION, but they are also normal functions that a user can call
explicitly.  Add a permissions check to each validator to ensure that a
user cannot use explicit validator calls to achieve things he could not
otherwise achieve.  Back-patch to 8.4 (all supported versions).
Non-core procedural language extensions ought to make the same two-line
change to their own validators.

Andres Freund, reviewed by Tom Lane and Noah Misch.

Security: CVE-2014-0061
2014-02-17 09:33:31 -05:00
Noah Misch fea164a72a Shore up ADMIN OPTION restrictions.
Granting a role without ADMIN OPTION is supposed to prevent the grantee
from adding or removing members from the granted role.  Issuing SET ROLE
before the GRANT bypassed that, because the role itself had an implicit
right to add or remove members.  Plug that hole by recognizing that
implicit right only when the session user matches the current role.
Additionally, do not recognize it during a security-restricted operation
or during execution of a SECURITY DEFINER function.  The restriction on
SECURITY DEFINER is not security-critical.  However, it seems best for a
user testing his own SECURITY DEFINER function to see the same behavior
others will see.  Back-patch to 8.4 (all supported versions).

The SQL standards do not conflate roles and users as PostgreSQL does;
only SQL roles have members, and only SQL users initiate sessions.  An
application using PostgreSQL users and roles as SQL users and roles will
never attempt to grant membership in the role that is the session user,
so the implicit right to add or remove members will never arise.

The security impact was mostly that a role member could revoke access
from others, contrary to the wishes of his own grantor.  Unapproved role
member additions are less notable, because the member can still largely
achieve that by creating a view or a SECURITY DEFINER function.

Reviewed by Andres Freund and Tom Lane.  Reported, independently, by
Jonas Sundman and Noah Misch.

Security: CVE-2014-0060
2014-02-17 09:33:31 -05:00
Alvaro Herrera 801c2dc72c Separate multixact freezing parameters from xid's
Previously we were piggybacking on transaction ID parameters to freeze
multixacts; but since there isn't necessarily any relationship between
rates of Xid and multixact consumption, this turns out not to be a good
idea.

Therefore, we now have multixact-specific freezing parameters:

vacuum_multixact_freeze_min_age: when to remove multis as we come across
them in vacuum (default to 5 million, i.e. early in comparison to Xid's
default of 50 million)

vacuum_multixact_freeze_table_age: when to force whole-table scans
instead of scanning only the pages marked as not all visible in
visibility map (default to 150 million, same as for Xids).  Whichever of
both which reaches the 150 million mark earlier will cause a whole-table
scan.

autovacuum_multixact_freeze_max_age: when for cause emergency,
uninterruptible whole-table scans (default to 400 million, double as
that for Xids).  This means there shouldn't be more frequent emergency
vacuuming than previously, unless multixacts are being used very
rapidly.

Backpatch to 9.3 where multixacts were made to persist enough to require
freezing.  To avoid an ABI break in 9.3, VacuumStmt has a couple of
fields in an unnatural place, and StdRdOptions is split in two so that
the newly added fields can go at the end.

Patch by me, reviewed by Robert Haas, with additional input from Andres
Freund and Tom Lane.
2014-02-13 19:36:31 -03:00
Peter Eisentraut 66c04c981d Mark some more variables as static or include the appropriate header
Detected by clang's -Wmissing-variable-declarations.

From: Andres Freund <andres@anarazel.de>
2014-02-08 21:21:46 -05:00
Tom Lane 571addd729 Fix unsafe references to errno within error messaging logic.
Various places were supposing that errno could be expected to hold still
within an ereport() nest or similar contexts.  This isn't true necessarily,
though in some cases it accidentally failed to fail depending on how the
compiler chanced to order the subexpressions.  This class of thinko
explains recent reports of odd failures on clang-built versions, typically
missing or inappropriate HINT fields in messages.

Problem identified by Christian Kruse, who also submitted the patch this
commit is based on.  (I fixed a few issues in his patch and found a couple
of additional places with the same disease.)

Back-patch as appropriate to all supported branches.
2014-01-29 20:04:43 -05:00
Robert Haas 9347baa5bb Include planning time in EXPLAIN ANALYZE output.
This doesn't work for prepared queries, but it's not too easy to get
the information in that case and there's some debate as to exactly
what the right thing to measure is, so just do this for now.

Andreas Karlsson, with slight doc changes by me.
2014-01-29 16:09:15 -05:00
Stephen Frost fbe19ee3b8 ALTER TABLESPACE ... MOVE ... OWNED BY
Add the ability to specify the objects to move by who those objects are
owned by (as relowner) and change ALL to mean ALL objects.  This
makes the command always operate against a well-defined set of objects
and not have the objects-to-be-moved based on the role of the user
running the command.

Per discussion with Simon and Tom.
2014-01-23 23:52:40 -05:00
Alvaro Herrera b152c6cd0d Make DROP IF EXISTS more consistently not fail
Some cases were still reporting errors and aborting, instead of a NOTICE
that the object was being skipped.  This makes it more difficult to
cleanly handle pg_dump --clean, so change that to instead skip missing
objects properly.

Per bug #7873 reported by Dave Rolsky; apparently this affects a large
number of users.

Authors: Pavel Stehule and Dean Rasheed.  Some tweaks by Álvaro Herrera
2014-01-23 14:40:29 -03:00
Alvaro Herrera d2458e3b20 Expose a routine to print triggers during EXPLAIN ANALYZE
This is so that auto_explain can use it.

Kyotaro HORIGUCHI
2014-01-20 17:13:47 -03:00
Fujii Masao 5363c7f2bc Fix typo in comment.
Sawada Masahiko
2014-01-21 02:24:17 +09:00
Simon Riggs 4d1e2aeb1a Speed up COPY into tables with DEFAULT nextval()
Previously the presence of a nextval() prevented the
use of batch-mode COPY.  This patch introduces a
special case just for nextval() functions. In future
we will introduce a general case solution for
labelling volatile functions as safe for use.
2014-01-20 17:22:38 +00:00
Stephen Frost 5254958e92 Add CREATE TABLESPACE ... WITH ... Options
Tablespaces have a few options which can be set on them to give PG hints
as to how the tablespace behaves (perhaps it's faster for sequential
scans, or better able to handle random access, etc).  These options were
only available through the ALTER TABLESPACE command.

This adds the ability to set these options at CREATE TABLESPACE time,
removing the need to do both a CREATE TABLESPACE and ALTER TABLESPACE to
get the correct options set on the tablespace.

Vik Fearing, reviewed by Michael Paquier.
2014-01-18 20:59:31 -05:00
Tom Lane 115f414124 Fix VACUUM's reporting of dead-tuple counts to the stats collector.
Historically, VACUUM has just reported its new_rel_tuples estimate
(the same thing it puts into pg_class.reltuples) to the stats collector.
That number counts both live and dead-but-not-yet-reclaimable tuples.
This behavior may once have been right, but modern versions of the
pgstats code track live and dead tuple counts separately, so putting
the total into n_live_tuples and zero into n_dead_tuples is surely
pretty bogus.  Fix it to report live and dead tuple counts separately.

This doesn't really do much for situations where updating transactions
commit concurrently with a VACUUM scan (possibly causing double-counting or
omission of the tuples they add or delete); but it's clearly an improvement
over what we were doing before.

Hari Babu, reviewed by Amit Kapila
2014-01-18 19:24:33 -05:00
Stephen Frost 76e91b38ba Add ALTER TABLESPACE ... MOVE command
This adds a 'MOVE' sub-command to ALTER TABLESPACE which allows moving sets of
objects from one tablespace to another.  This can be extremely handy and avoids
a lot of error-prone scripting.  ALTER TABLESPACE ... MOVE will only move
objects the user owns, will notify the user if no objects were found, and can
be used to move ALL objects or specific types of objects (TABLES, INDEXES, or
MATERIALIZED VIEWS).
2014-01-18 18:56:40 -05:00
Stephen Frost 6f25c62d78 Allow SET TABLESPACE to database default
We've always allowed CREATE TABLE to create tables in the database's default
tablespace without checking for CREATE permissions on that tablespace.
Unfortunately, the original implementation of ALTER TABLE ... SET TABLESPACE
didn't pick up on that exception.

This changes ALTER TABLE ... SET TABLESPACE to allow the database's default
tablespace without checking for CREATE rights on that tablespace, just as
CREATE TABLE works today.  Users could always do this through a series of
commands (CREATE TABLE ... AS SELECT * FROM ...; DROP TABLE ...; etc), so
let's fix the oversight in SET TABLESPACE's original implementation.
2014-01-18 18:41:52 -05:00
Tom Lane 0d79c0a8cc Make various variables const (read-only).
These changes should generally improve correctness/maintainability.
A nice side benefit is that several kilobytes move from initialized
data to text segment, allowing them to be shared across processes and
probably reducing copy-on-write overhead while forking a new backend.
Unfortunately this doesn't seem to help libpq in the same way (at least
not when it's compiled with -fpic on x86_64), but we can hope the linker
at least collects all nominally-const data together even if it's not
actually part of the text segment.

Also, make pg_encname_tbl[] static in encnames.c, since there seems
no very good reason for any other code to use it; per a suggestion
from Wim Lewis, who independently submitted a patch that was mostly
a subset of this one.

Oskari Saarenmaa, with some editorialization by me
2014-01-18 16:04:32 -05:00
Robert Haas 2bb1f14b89 Make bitmap heap scans show exact/lossy block info in EXPLAIN ANALYZE.
Etsuro Fujita
2014-01-13 14:42:16 -05:00
Tom Lane 6286526207 Fix compute_scalar_stats() for case that all values exceed WIDTH_THRESHOLD.
The standard typanalyze functions skip over values whose detoasted size
exceeds WIDTH_THRESHOLD (1024 bytes), so as to limit memory bloat during
ANALYZE.  However, we (I think I, actually :-() failed to consider the
possibility that *every* non-null value in a column is too wide.  While
compute_minimal_stats() seems to behave reasonably anyway in such a case,
compute_scalar_stats() just fell through and generated no pg_statistic
entry at all.  That's unnecessarily pessimistic: we can still produce
valid stanullfrac and stawidth values in such cases, since we do include
too-wide values in the average-width calculation.  Furthermore, since the
general assumption in this code is that too-wide values are probably all
distinct from each other, it seems reasonable to set stadistinct to -1
("all distinct").

Per complaint from Kadri Raudsepp.  This has been like this since roughly
neolithic times, so back-patch to all supported branches.
2014-01-11 13:42:42 -05:00
Bruce Momjian 7e04792a1c Update copyright for 2014
Update all files in head, and files COPYRIGHT and legal.sgml in all back
branches.
2014-01-07 16:05:30 -05:00
Peter Eisentraut edc43458d7 Add more use of psprintf() 2014-01-06 21:30:26 -05:00
Alvaro Herrera 1a3e82a7f9 Restore some comments lost during 15732b34e8
Michael Paquier
2014-01-03 13:22:03 -03:00
Robert Haas 3cff1879f8 Aggressively freeze tables when CLUSTER or VACUUM FULL rewrites them.
We haven't wanted to do this in the past on the grounds that in rare
cases the original xmin value will be needed for forensic purposes, but
commit 37484ad2aa removes that objection,
so now we can.

Per extensive discussion, among many people, on pgsql-hackers.
2014-01-02 15:15:51 -05:00
Tom Lane c01bc51f8d Fix broken support for event triggers as extension members.
CREATE EVENT TRIGGER forgot to mark the event trigger as a member of its
extension, and pg_dump didn't pay any attention anyway when deciding
whether to dump the event trigger.  Per report from Moshe Jacobson.

Given the obvious lack of testing here, it's rather astonishing that
ALTER EXTENSION ADD/DROP EVENT TRIGGER work, but they seem to.
2013-12-30 14:00:02 -05:00
Tom Lane 8d65da1f01 Support ordered-set (WITHIN GROUP) aggregates.
This patch introduces generic support for ordered-set and hypothetical-set
aggregate functions, as well as implementations of the instances defined in
SQL:2008 (percentile_cont(), percentile_disc(), rank(), dense_rank(),
percent_rank(), cume_dist()).  We also added mode() though it is not in the
spec, as well as versions of percentile_cont() and percentile_disc() that
can compute multiple percentile values in one pass over the data.

Unlike the original submission, this patch puts full control of the sorting
process in the hands of the aggregate's support functions.  To allow the
support functions to find out how they're supposed to sort, a new API
function AggGetAggref() is added to nodeAgg.c.  This allows retrieval of
the aggregate call's Aggref node, which may have other uses beyond the
immediate need.  There is also support for ordered-set aggregates to
install cleanup callback functions, so that they can be sure that
infrastructure such as tuplesort objects gets cleaned up.

In passing, make some fixes in the recently-added support for variadic
aggregates, and make some editorial adjustments in the recent FILTER
additions for aggregates.  Also, simplify use of IsBinaryCoercible() by
allowing it to succeed whenever the target type is ANY or ANYELEMENT.
It was inconsistent that it dealt with other polymorphic target types
but not these.

Atri Sharma and Andrew Gierth; reviewed by Pavel Stehule and Vik Fearing,
and rather heavily editorialized upon by Tom Lane
2013-12-23 16:11:35 -05:00
Robert Haas 37484ad2aa Change the way we mark tuples as frozen.
Instead of changing the tuple xmin to FrozenTransactionId, the combination
of HEAP_XMIN_COMMITTED and HEAP_XMIN_INVALID, which were previously never
set together, is now defined as HEAP_XMIN_FROZEN.  A variety of previous
proposals to freeze tuples opportunistically before vacuum_freeze_min_age
is reached have foundered on the objection that replacing xmin by
FrozenTransactionId might hinder debugging efforts when things in this
area go awry; this patch is intended to solve that problem by keeping
the XID around (but largely ignoring the value to which it is set).

Third-party code that checks for HEAP_XMIN_INVALID on tuples where
HEAP_XMIN_COMMITTED might be set will be broken by this change.  To fix,
use the new accessor macros in htup_details.h rather than consulting the
bits directly.  HeapTupleHeaderGetXmin has been modified to return
FrozenTransactionId when the infomask bits indicate that the tuple is
frozen; use HeapTupleHeaderGetRawXmin when you already know that the
tuple isn't marked commited or frozen, or want the raw value anyway.
We currently do this in routines that display the xmin for user consumption,
in tqual.c where it's known to be safe and important for the avoidance of
extra cycles, and in the function-caching code for various procedural
languages, which shouldn't invalidate the cache just because the tuple
gets frozen.

Robert Haas and Andres Freund
2013-12-22 15:49:09 -05:00
Bruce Momjian 527fdd9df1 Move pg_upgrade_support global variables to their own include file
Previously their declarations were spread around to avoid accidental
access.
2013-12-19 16:10:07 -05:00
Robert Haas 001a573a20 Allow on-detach callbacks for dynamic shared memory segments.
Just as backends must clean up their shared memory state (releasing
lwlocks, buffer pins, etc.) before exiting, they must also perform
any similar cleanups related to dynamic shared memory segments they
have mapped before unmapping those segments.  So add a mechanism to
ensure that.

Existing on_shmem_exit hooks include both "user level" cleanup such
as transaction abort and removal of leftover temporary relations and
also "low level" cleanup that forcibly released leftover shared
memory resources.  On-detach callbacks should run after the first
group but before the second group, so create a new before_shmem_exit
function for registering the early callbacks and keep on_shmem_exit
for the regular callbacks.  (An earlier draft of this patch added an
additional argument to on_shmem_exit, but that had a much larger
footprint and probably a substantially higher risk of breaking third
party code for no real gain.)

Patch by me, reviewed by KaiGai Kohei and Andres Freund.
2013-12-18 13:09:09 -05:00
Bruce Momjian dba5a9dda9 Comment: COPY comment improvement
Etsuro Fujita
2013-12-17 12:51:16 -05:00
Alvaro Herrera 3b97e6823b Rework tuple freezing protocol
Tuple freezing was broken in connection to MultiXactIds; commit
8e53ae025d tried to fix it, but didn't go far enough.  As noted by
Noah Misch, freezing a tuple whose Xmax is a multi containing an aborted
update might cause locks in the multi to go ignored by later
transactions.  This is because the code depended on a multixact above
their cutoff point not having any lock-only member older than the cutoff
point for Xids, which is easily defeated in READ COMMITTED transactions.

The fix for this involves creating a new MultiXactId when necessary.
But this cannot be done during WAL replay, and moreover multixact
examination requires using CLOG access routines which are not supposed
to be used during WAL replay either; so tuple freezing cannot be done
with the old freeze WAL record.  Therefore, separate the freezing
computation from its execution, and change the WAL record to carry all
necessary information.  At WAL replay time, it's easy to re-execute
freezing because we don't need to re-compute the new infomask/Xmax
values but just take them from the WAL record.

While at it, restructure the coding to ensure all page changes occur in
a single critical section without much room for failures.  The previous
coding wasn't using a critical section, without any explanation as to
why this was acceptable.

In replication scenarios using the 9.3 branch, standby servers must be
upgraded before their master, so that they are prepared to deal with the
new WAL record once the master is upgraded; failure to do so will cause
WAL replay to die with a PANIC message.  Later upgrade of the standby
will allow the process to continue where it left off, so there's no
disruption of the data in the standby in any case.  Standbys know how to
deal with the old WAL record, so it's okay to keep the master running
the old code for a while.

In master, the old freeze WAL record is gone, for cleanliness' sake;
there's no compatibility concern there.

Backpatch to 9.3, where the original bug was introduced and where the
previous fix was backpatched.

Álvaro Herrera and Andres Freund
2013-12-16 11:29:50 -03:00
Tom Lane 2efc6dc256 Add HOLD/RESUME_INTERRUPTS in HandleCatchupInterrupt/HandleNotifyInterrupt.
This prevents a possible longjmp out of the signal handler if a timeout
or SIGINT occurs while something within the handler has transiently set
ImmediateInterruptOK.  For safety we must hold off the timeout or cancel
error until we're back in mainline, or at least till we reach the end of
the signal handler when ImmediateInterruptOK was true at entry.  This
syncs these functions with the logic now present in handle_sig_alarm.

AFAICT there is no live bug here in 9.0 and up, because I don't think we
currently can wait for any heavyweight lock inside these functions, and
there is no other code (except read-from-client) that will turn on
ImmediateInterruptOK.  However, that was not true pre-9.0: in older
branches ProcessIncomingNotify might block trying to lock pg_listener, and
then a SIGINT could lead to undesirable control flow.  It might be all
right anyway given the relatively narrow code ranges in which NOTIFY
interrupts are enabled, but for safety's sake I'm back-patching this.
2013-12-13 14:05:51 -05:00
Heikki Linnakangas dde6282500 Fix more instances of "the the" in comments.
Plus one instance of "to to" in the docs.
2013-12-13 20:02:01 +02:00
Heikki Linnakangas a49633d8dc Fix WAL-logging of setting the visibility map bit.
The operation that removes the remaining dead tuples from the page must
be WAL-logged before the setting of the VM bit. Otherwise, if you replay
the WAL to between those two records, you end up with the VM bit set, but
the dead tuples are still there.

Backpatch to 9.3, where this bug was introduced.
2013-12-13 14:15:04 +02:00
Tom Lane f26099057a Improve EXPLAIN to print the grouping columns in Agg and Group nodes.
Per request from Kevin Grittner.
2013-12-12 11:24:38 -05:00
Simon Riggs 8693559cac New autovacuum_work_mem parameter
If autovacuum_work_mem is set, autovacuum workers now use
this parameter in preference to maintenance_work_mem.

Peter Geoghegan
2013-12-12 11:42:39 +00:00
Robert Haas 66abc2608c Add a new reloption, user_catalog_table.
When this reloption is set and wal_level=logical is configured,
we'll record the CIDs stamped by inserts, updates, and deletes to
the table just as we would for an actual catalog table.  This will
allow logical decoding to use historical MVCC snapshots to access
such tables just as they access ordinary catalog tables.

Replication solutions built around the logical decoding machinery
will likely need to set this operation for their configuration
tables; it might also be needed by extensions which perform table
access in their output functions.

Andres Freund, reviewed by myself and others.
2013-12-10 19:17:34 -05:00
Robert Haas e55704d8b2 Add new wal_level, logical, sufficient for logical decoding.
When wal_level=logical, we'll log columns from the old tuple as
configured by the REPLICA IDENTITY facility added in commit
07cacba983.  This makes it possible
a properly-configured logical replication solution to correctly
follow table updates even if they change the chosen key columns,
or, with REPLICA IDENTITY FULL, even if the table has no key at
all.  Note that updates which do not modify the replica identity
column won't log anything extra, making the choice of a good key
(i.e. one that will rarely be changed) important to performance
when wal_level=logical is configured.

Each insert, update, or delete to a catalog table will also log
the CMIN and/or CMAX values of stamped by the current transaction.
This is necessary because logical decoding will require access to
historical snapshots of the catalog in order to decode some data
types, and the CMIN/CMAX values that we may need in order to judge
row visibility may have been overwritten by the time we need them.

Andres Freund, reviewed in various versions by myself, Heikki
Linnakangas, KONDO Mitsumasa, and many others.
2013-12-10 19:01:40 -05:00
Heikki Linnakangas 9e857436ef Don't include unused space in LOG_NEWPAGE records.
This is the same trick we use when taking a full page image of a buffer
passed to XLogInsert.
2013-12-04 00:10:47 +02:00
Alvaro Herrera f54106f77e Fix full-table-vacuum request mechanism for MultiXactIds
While autovacuum dutifully launched anti-multixact-wraparound vacuums
when the multixact "age" was reached, the vacuum code was not aware that
it needed to make them be full table vacuums.  As the resulting
partial-table vacuums aren't capable of actually increasing relminmxid,
autovacuum continued to launch anti-wraparound vacuums that didn't have
the intended effect, until age of relfrozenxid caused the vacuum to
finally be a full table one via vacuum_freeze_table_age.

To fix, introduce logic for multixacts similar to that for plain
TransactionIds, using the same GUCs.

Backpatch to 9.3, where permanent MultiXactIds were introduced.

Andres Freund, some cleanup by Álvaro
2013-11-29 21:47:13 -03:00
Robert Haas 8e18d04d4d Refine our definition of what constitutes a system relation.
Although user-defined relations can't be directly created in
pg_catalog, it's possible for them to end up there, because you can
create them in some other schema and then use ALTER TABLE .. SET SCHEMA
to move them there.  Previously, such relations couldn't afterwards
be manipulated, because IsSystemRelation()/IsSystemClass() rejected
all attempts to modify objects in the pg_catalog schema, regardless
of their origin.  With this patch, they now reject only those
objects in pg_catalog which were created at initdb-time, allowing
most operations on user-created tables in pg_catalog to proceed
normally.

This patch also adds new functions IsCatalogRelation() and
IsCatalogClass(), which is similar to IsSystemRelation() and
IsSystemClass() but with a slightly narrower definition: only TOAST
tables of system catalogs are included, rather than *all* TOAST tables.
This is currently used only for making decisions about when
invalidation messages need to be sent, but upcoming logical decoding
patches will find other uses for this information.

Andres Freund, with some modifications by me.
2013-11-28 20:57:20 -05:00
Heikki Linnakangas 82b43f7df2 Don't update relfrozenxid if any pages were skipped.
Vacuum recognizes that it can update relfrozenxid by checking whether it has
processed all pages of a relation. Unfortunately it performed that check
after truncating the dead pages at the end of the relation, and used the new
number of pages to decide whether all pages have been scanned. If the new
number of pages happened to be smaller or equal to the number of pages
scanned, it incorrectly decided that all pages were scanned.

This can lead to relfrozenxid being updated, even though some pages were
skipped that still contain old XIDs. That can lead to data loss due to xid
wraparounds with some rows suddenly missing. This likely has escaped notice
so far because it takes a large number (~2^31) of xids being used to see the
effect, while a full-table vacuum before that would fix the issue.

The incorrect logic was introduced by commit
b4b6923e03. Backpatch this fix down to 8.4,
like that commit.

Andres Freund, with some modifications by me.
2013-11-27 13:43:27 +02:00
Tom Lane 784e762e88 Support multi-argument UNNEST(), and TABLE() syntax for multiple functions.
This patch adds the ability to write TABLE( function1(), function2(), ...)
as a single FROM-clause entry.  The result is the concatenation of the
first row from each function, followed by the second row from each
function, etc; with NULLs inserted if any function produces fewer rows than
others.  This is believed to be a much more useful behavior than what
Postgres currently does with multiple SRFs in a SELECT list.

This syntax also provides a reasonable way to combine use of column
definition lists with WITH ORDINALITY: put the column definition list
inside TABLE(), where it's clear that it doesn't control the ordinality
column as well.

Also implement SQL-compliant multiple-argument UNNEST(), by turning
UNNEST(a,b,c) into TABLE(unnest(a), unnest(b), unnest(c)).

The SQL standard specifies TABLE() with only a single function, not
multiple functions, and it seems to require an implicit UNNEST() which is
not what this patch does.  There may be something wrong with that reading
of the spec, though, because if it's right then the spec's TABLE() is just
a pointless alternative spelling of UNNEST().  After further review of
that, we might choose to adopt a different syntax for what this patch does,
but in any case this functionality seems clearly worthwhile.

Andrew Gierth, reviewed by Zoltán Böszörményi and Heikki Linnakangas, and
significantly revised by me
2013-11-21 19:37:20 -05:00
Heikki Linnakangas 4c697d8f48 Count locked pages that don't need vacuuming as scanned.
Previously, if VACUUM skipped vacuuming a page because it's pinned, it
didn't count that page as scanned. However, that meant that relfrozenxid
was not bumped up either, which prevented anti-wraparound vacuum from
doing its job.

Report by Миша Тюрин, analysis and patch by Sergey Burladyn and Jeff Janes.
Backpatch to 9.2, where the skip-locked-pages behavior was introduced.
2013-11-18 09:51:09 +02:00
Tom Lane 6cb86143e8 Allow aggregates to provide estimates of their transition state data size.
Formerly the planner had a hard-wired rule of thumb for guessing the amount
of space consumed by an aggregate function's transition state data.  This
estimate is critical to deciding whether it's OK to use hash aggregation,
and in many situations the built-in estimate isn't very good.  This patch
adds a column to pg_aggregate wherein a per-aggregate estimate can be
provided, overriding the planner's default, and infrastructure for setting
the column via CREATE AGGREGATE.

It may be that additional smarts will be required in future, perhaps even
a per-aggregate estimation function.  But this is already a step forward.

This is extracted from a larger patch to improve the performance of numeric
and int8 aggregates.  I (tgl) thought it was worth reviewing and committing
this infrastructure separately.  In this commit, all built-in aggregates
are given aggtransspace = 0, so no behavior should change.

Hadi Moshayedi, reviewed by Pavel Stehule and Tomas Vondra
2013-11-16 16:03:40 -05:00
Tom Lane 80e3a470ba Minor comment corrections for sequence hashtable patch.
There were enough typos in the comments to annoy me ...
2013-11-15 12:17:12 -05:00
Heikki Linnakangas 5cb719beee Fix bogus hash table creation.
Andres Freund
2013-11-15 14:23:40 +02:00
Heikki Linnakangas 21025d4a53 Use a hash table to store current sequence values.
This speeds up nextval() and currval(), when you touch a lot of different
sequences in the same backend.

David Rowley
2013-11-15 12:29:38 +02:00
Peter Eisentraut 001e114b8d Fix whitespace issues found by git diff --check, add gitattributes
Set per file type attributes in .gitattributes to fine-tune whitespace
checks.  With the associated cleanups, the tree is now clean for git
2013-11-10 14:48:29 -05:00
Robert Haas 07cacba983 Add the notion of REPLICA IDENTITY for a table.
Pending patches for logical replication will use this to determine
which columns of a tuple ought to be considered as its candidate key.

Andres Freund, with minor, mostly cosmetic adjustments by me
2013-11-08 12:30:43 -05:00
Tom Lane 060b22a99a Fix subtly-wrong volatility checking in BeginCopyFrom().
contain_volatile_functions() is best applied to the output of
expression_planner(), not its input, so that insertion of function
default arguments and constant-folding have been done.  (See comments
at CheckMutability, for instance.)  It's perhaps unlikely that anyone
will notice a difference in practice, but still we should do it properly.

In passing, change variable type from Node* to Expr* to reduce the net
number of casts needed.

Noted while perusing uses of contain_volatile_functions().
2013-11-08 08:59:39 -05:00
Kevin Grittner 5829082a57 Keep heap open until new heap generated in RMV.
Early close became apparent when invalidation messages were
processed in a new location under CLOBBER_CACHE_ALWAYS builds, due
to additional locking.

Back-patch to 9.3
2013-11-06 12:27:52 -06:00
Kevin Grittner 2636ecf78b Lock relation used to generate fresh data for RMV.
The relation should not be accessible to any other process, but it
should be locked for consistency.  Since this is not known to
cause any bug, it will not be back-patch, at least for now.

Per report from Andres Freund
2013-11-05 15:36:33 -06:00
Kevin Grittner 2a781d57dc Acquire appropriate locks when rewriting during RMV.
Since the query has not been freshly parsed when executing REFRESH
MATERIALIZED VIEW, locks must be explicitly taken before rewrite.

Backpatch to 9.3.

Andres Freund
2013-11-02 19:18:08 -05:00
Tom Lane 45f64f1bbf Remove CTimeZone/HasCTZSet, root and branch.
These variables no longer have any useful purpose, since there's no reason
to special-case brute force timezones now that we have a valid
session_timezone setting for them.  Remove the variables, and remove the
SET/SHOW TIME ZONE code that deals with them.

The user-visible impact of this is that SHOW TIME ZONE will now show a
POSIX-style zone specification, in the form "<+-offset>-+offset", rather
than an interval value when a brute-force zone has been set.  While perhaps
less intuitive, this is a better definition than before because it's
actually possible to give that string back to SET TIME ZONE and get the
same behavior, unlike what used to happen.

We did not previously mention the angle-bracket syntax when describing
POSIX timezone specifications; add some documentation so that people
can figure out what these strings do.  (There's still quite a lot of
undocumented functionality there, but anybody who really cares can
go read the POSIX spec to find out about it.  In practice most people
seem to prefer Olsen-style city names anyway.)
2013-11-01 13:57:31 -04:00
Tom Lane 631dc390f4 Fix some odd behaviors when using a SQL-style simple GMT offset timezone.
Formerly, when using a SQL-spec timezone setting with a fixed GMT offset
(called a "brute force" timezone in the code), the session_timezone
variable was not updated to match the nominal timezone; rather, all code
was expected to ignore session_timezone if HasCTZSet was true.  This is
of course obviously fragile, though a search of the code finds only
timeofday() failing to honor the rule.  A bigger problem was that
DetermineTimeZoneOffset() supposed that if its pg_tz parameter was
pointer-equal to session_timezone, then HasCTZSet should override the
parameter.  This would cause datetime input containing an explicit zone
name to be treated as referencing the brute-force zone instead, if the
zone name happened to match the session timezone that had prevailed
before installing the brute-force zone setting (as reported in bug #8572).
The same malady could affect AT TIME ZONE operators.

To fix, set up session_timezone so that it matches the brute-force zone
specification, which we can do using the POSIX timezone definition syntax
"<abbrev>offset", and get rid of the bogus lookaside check in
DetermineTimeZoneOffset().  Aside from fixing the erroneous behavior in
datetime parsing and AT TIME ZONE, this will cause the timeofday() function
to print its result in the user-requested time zone rather than some
previously-set zone.  It might also affect results in third-party
extensions, if there are any that make use of session_timezone without
considering HasCTZSet, but in all cases the new behavior should be saner
than before.

Back-patch to all supported branches.
2013-11-01 12:13:18 -04:00
Robert Haas cacbdd7810 Use appendStringInfoString instead of appendStringInfo where possible.
This shaves a few cycles, and generally seems like good programming
practice.

David Rowley
2013-10-31 10:55:59 -04:00
Tom Lane c2b51cf190 Improve documentation about usage of FDW validator functions.
SGML documentation, as well as code comments, failed to note that an FDW's
validator will be applied to foreign-table options for foreign tables using
the FDW.

Etsuro Fujita
2013-10-28 10:28:35 -04:00
Heikki Linnakangas 83eb54001c Fix two bugs in setting the vm bit of empty pages.
Use a critical section when setting the all-visible flag on an empty page,
and WAL-logging it. log_newpage_buffer() contains an assertion that it
must be called inside a critical section, and it's the right thing to do
when modifying a buffer anyway.

Also, the page should be marked dirty before calling log_newpage_buffer(),
per the comment in log_newpage_buffer() and src/backend/access/transam/README.

Patch by Andres Freund, in response to my report. Backpatch to 9.2, like
the patch that introduced these bugs (a6370fd9).
2013-10-23 14:24:37 +03:00
Robert Haas cab5dc5daf Allow only some columns of a view to be auto-updateable.
Previously, unless all columns were auto-updateable, we wouldn't
inserts, updates, or deletes, or at least not without a rule or trigger;
now, we'll allow inserts and updates that target only the auto-updateable
columns, and deletes even if there are no auto-updateable columns at
all provided the view definition is otherwise suitable.

Dean Rasheed, reviewed by Marko Tiikkaja
2013-10-18 10:35:36 -04:00
Peter Eisentraut 5b6d08cd29 Add use of asprintf()
Add asprintf(), pg_asprintf(), and psprintf() to simplify string
allocation and composition.  Replacement implementations taken from
NetBSD.

Reviewed-by: Álvaro Herrera <alvherre@2ndquadrant.com>
Reviewed-by: Asif Naeem <anaeem.it@gmail.com>
2013-10-13 00:09:18 -04:00
Kevin Grittner f566515192 Add record_image_ops opclass for matview concurrent refresh.
REFRESH MATERIALIZED VIEW CONCURRENTLY was broken for any matview
containing a column of a type without a default btree operator
class.  It also did not produce results consistent with a non-
concurrent REFRESH or a normal view if any column was of a type
which allowed user-visible differences between values which
compared as equal according to the type's default btree opclass.
Concurrent matview refresh was modified to use the new operators
to solve these problems.

Documentation was added for record comparison, both for the
default btree operator class for record, and the newly added
operators.  Regression tests now check for proper behavior both
for a matview with a box column and a matview containing a citext
column.

Reviewed by Steve Singer, who suggested some of the doc language.
2013-10-09 14:26:09 -05:00
Robert Haas 16a906f535 Make DISCARD SEQUENCES also discard the last used sequence.
Otherwise, we access already-freed memory.  Oops.

Report by Michael Paquier.  Fix by me.
2013-10-07 15:55:56 -04:00
Robert Haas 0f1ef79095 Fix silly thinko in ResetSequenceCaches.
Report from Kevin Hale Boyes.
2013-10-03 20:17:51 -04:00
Robert Haas d90ced8bb2 Add DISCARD SEQUENCES command.
DISCARD ALL will now discard cached sequence information, as well.

Fabrízio de Royes Mello, reviewed by Zoltán Böszörményi, with some
further tweaks by me.
2013-10-03 16:23:31 -04:00
Alvaro Herrera 15732b34e8 Add WaitForLockers in lmgr, refactoring index.c code
This is in support of a future REINDEX CONCURRENTLY feature.

Michael Paquier
2013-10-01 17:57:01 -03:00
Heikki Linnakangas adaba2751f Fix spurious warning after vacuuming a page on a table with no indexes.
There is a rare race condition, when a transaction that inserted a tuple
aborts while vacuum is processing the page containing the inserted tuple.
Vacuum prunes the page first, which normally removes any dead tuples, but
if the inserting transaction aborts right after that, the loop after
pruning will see a dead tuple and remove it instead. That's OK, but if the
page is on a table with no indexes, and the page becomes completely empty
after removing the dead tuple (or tuples) on it, it will be immediately
marked as all-visible. That's OK, but the sanity check in vacuum would
throw a warning because it thinks that the page contains dead tuples and
was nevertheless marked as all-visible, even though it just vacuumed away
the dead tuples and so it doesn't actually contain any.

Spotted this while reading the code. It's difficult to hit the race
condition otherwise, but can be done by putting a breakpoint after the
heap_page_prune() call.

Backpatch all the way to 8.4, where this code first appeared.
2013-09-26 11:31:53 +03:00
Robert Haas ba3d39c969 Don't allow system columns in CHECK constraints, except tableoid.
Previously, arbitray system columns could be mentioned in table
constraints, but they were not correctly checked at runtime, because
the values weren't actually set correctly in the tuple.  Since it
seems easy enough to initialize the table OID properly, do that,
and continue allowing that column, but disallow the rest unless and
until someone figures out a way to make them work properly.

No back-patch, because this doesn't seem important enough to take the
risk of destabilizing the back branches.  In fact, this will pose a
dump-and-reload hazard for those upgrading from previous versions:
constraints that were accepted before but were not correctly enforced
will now either be enforced correctly or not accepted at all.  Either
could result in restore failures, but in practice I think very few
users will notice the difference, since the use case is pretty
marginal anyway and few users will be relying on features that have
not historically worked.

Amit Kapila, reviewed by Rushabh Lathia, with doc changes by me.
2013-09-23 13:31:22 -04:00
Alvaro Herrera dd778e9d88 Rename various "freeze multixact" variables
It seems to make more sense to use "cutoff multixact" terminology
throughout the backend code; "freeze" is associated with replacing of an
Xid with FrozenTransactionId, which is not what we do for MultiXactIds.

Andres Freund
Some adjustments by Álvaro Herrera
2013-09-16 15:47:31 -03:00
Tom Lane 0c66a22377 Update comments concerning PGC_S_TEST.
This GUC context value was once only used by ALTER DATABASE SET and
ALTER USER SET.  That's not true anymore, though, so rewrite the
comments to be a bit more general.

Patch in HEAD only, since this is just an internal documentation issue.
2013-09-03 18:56:22 -04:00
Tom Lane 0d3f4406df Allow aggregate functions to be VARIADIC.
There's no inherent reason why an aggregate function can't be variadic
(even VARIADIC ANY) if its transition function can handle the case.
Indeed, this patch to add the feature touches none of the planner or
executor, and little of the parser; the main missing stuff was DDL and
pg_dump support.

It is true that variadic aggregates can create the same sort of ambiguity
about parameters versus ORDER BY keys that was complained of when we
(briefly) had both one- and two-argument forms of string_agg().  However,
the policy formed in response to that discussion only said that we'd not
create any built-in aggregates with varying numbers of arguments, not that
we shouldn't allow users to do it.  So the logical extension of that is
we can allow users to make variadic aggregates as long as we're wary about
shipping any such in core.

In passing, this patch allows aggregate function arguments to be named, to
the extent of remembering the names in pg_proc and dumping them in pg_dump.
You can't yet call an aggregate using named-parameter notation.  That seems
like a likely future extension, but it'll take some work, and it's not what
this patch is really about.  Likewise, there's still some work needed to
make window functions handle VARIADIC fully, but I left that for another
day.

initdb forced because of new aggvariadic field in Aggref parse nodes.
2013-09-03 17:08:46 -04:00
Robert Haas 090d0f2050 Allow discovery of whether a dynamic background worker is running.
Using the infrastructure provided by this patch, it's possible either
to wait for the startup of a dynamically-registered background worker,
or to poll the status of such a worker without waiting.  In either
case, the current PID of the worker process can also be obtained.
As usual, worker_spi is updated to demonstrate the new functionality.

Patch by me.  Review by Andres Freund.
2013-08-28 14:08:13 -04:00
Kevin Grittner 28154bb23b Remove relcache entry invalidation in REFRESH MATERIALIZED VIEW.
This was added as part of the attempt to support unlogged matviews
along with a populated status.  It got missed when unlogged
support was removed pre-commit.

Noticed by Noah Misch.  Back-patched to 9.3 branch.
2013-08-18 16:19:22 -05:00
Kevin Grittner 3f78b1715c Don't allow ALTER MATERIALIZED VIEW ADD UNIQUE.
Was accidentally allowed, but not documented and lacked support
for rename or drop once created.

Per report from Noah Misch.
2013-08-15 13:14:48 -05:00
Kevin Grittner e2cd368678 Remove Assert that matview is not in system schema from REFRESH.
We don't want to prevent an extension which creates a matview from
being installed in pg_catalog.

Issue was raised by Hitoshi Harada.
Backpatched to 9.3.
2013-08-14 12:36:55 -05:00
Kevin Grittner 841c29c8b3 Various cleanups for REFRESH MATERIALIZED VIEW CONCURRENTLY.
Open and lock each index before checking definition in RMVC.  The
ExclusiveLock on the related table is not viewed as sufficient to
ensure that no changes are made to the index definition, and
invalidation messages from other backends might have been missed.
Additionally, use RelationGetIndexExpressions() and check for NIL
rather than doing our own loop.

Protect against redefinition of tid and rowvar operators in RMVC.
While working on this, noticed that the fixes for bugs found during
the CF made the UPDATE statement useless, since no rows could
qualify for that treatment any more.  Ripping out code to support
the UPDATE statement simplified the operator cleanups.

Change slightly confusing local field name.

Use meaningful alias names on queries in refresh_by_match_merge().

Per concerns of raised by Andres Freund and comments and
suggestions from Noah Misch.  Some additional issues remain, which
will be addressed separately.
2013-08-05 09:57:56 -05:00