Commit Graph

26808 Commits

Author SHA1 Message Date
Tom Lane 1dc5ebc907 Support "expanded" objects, particularly arrays, for better performance.
This patch introduces the ability for complex datatypes to have an
in-memory representation that is different from their on-disk format.
On-disk formats are typically optimized for minimal size, and in any case
they can't contain pointers, so they are often not well-suited for
computation.  Now a datatype can invent an "expanded" in-memory format
that is better suited for its operations, and then pass that around among
the C functions that operate on the datatype.  There are also provisions
(rudimentary as yet) to allow an expanded object to be modified in-place
under suitable conditions, so that operations like assignment to an element
of an array need not involve copying the entire array.

The initial application for this feature is arrays, but it is not hard
to foresee using it for other container types like JSON, XML and hstore.
I have hopes that it will be useful to PostGIS as well.

In this initial implementation, a few heuristics have been hard-wired
into plpgsql to improve performance for arrays that are stored in
plpgsql variables.  We would like to generalize those hacks so that
other datatypes can obtain similar improvements, but figuring out some
appropriate APIs is left as a task for future work.  (The heuristics
themselves are probably not optimal yet, either, as they sometimes
force expansion of arrays that would be better left alone.)

Preliminary performance testing shows impressive speed gains for plpgsql
functions that do element-by-element access or update of large arrays.
There are other cases that get a little slower, as a result of added array
format conversions; but we can hope to improve anything that's annoyingly
bad.  In any case most applications should see a net win.

Tom Lane, reviewed by Andres Freund
2015-05-14 12:08:49 -04:00
Robert Haas 61f68e0bed Fix comment.
Commit 78efd5c1ed overlooked this.

Report by Peter Geoghegan.
2015-05-13 15:27:41 -04:00
Robert Haas 78efd5c1ed Extend abbreviated key infrastructure to datum tuplesorts.
Andrew Gierth, reviewed by Peter Geoghegan and by me.
2015-05-13 14:36:26 -04:00
Tom Lane 0bb8528b5c Fix postgres_fdw to return the right ctid value in EvalPlanQual cases.
If a postgres_fdw foreign table is a non-locked source relation in an
UPDATE, DELETE, or SELECT FOR UPDATE/SHARE, and the query selects its
ctid column, the wrong value would be returned if an EvalPlanQual
recheck occurred.  This happened because the foreign table's result row
was copied via the ROW_MARK_COPY code path, and EvalPlanQualFetchRowMarks
just unconditionally set the reconstructed tuple's t_self to "invalid".

To fix that, we can have EvalPlanQualFetchRowMarks copy the composite
datum's t_ctid field, and be sure to initialize that along with t_self
when postgres_fdw constructs a tuple to return.

If we just did that much then EvalPlanQualFetchRowMarks would start
returning "(0,0)" as ctid for all other ROW_MARK_COPY cases, which perhaps
does not matter much, but then again maybe it might.  The cause of that is
that heap_form_tuple, which is the ultimate source of all composite datums,
simply leaves t_ctid as zeroes in newly constructed tuples.  That seems
like a bad idea on general principles: a field that's really not been
initialized shouldn't appear to have a valid value.  So let's eat the
trivial additional overhead of doing "ItemPointerSetInvalid(&(td->t_ctid))"
in heap_form_tuple.

This closes out our handling of Etsuro Fujita's report that tableoid and
ctid weren't correctly set in postgres_fdw EvalPlanQual cases.  Along the
way we did a great deal of work to improve FDWs' ability to control row
locking behavior; which was not wasted effort by any means, but it didn't
end up being a fix for this problem because that feature would be too
expensive for postgres_fdw to use all the time.

Although the fix for the tableoid misbehavior was back-patched, I'm
hesitant to do so here; it seems far less likely that people would care
about remote ctid than tableoid, and even such a minor behavioral change
as this in heap_form_tuple is perhaps best not back-patched.  So commit
to HEAD only, at least for the moment.

Etsuro Fujita, with some adjustments by me
2015-05-13 14:05:29 -04:00
Andrew Dunstan 3f2cec797e Fix jsonb replace and delete on scalars and empty structures
These operations now error out if attempted on scalars, and simply
return the input if attempted on empty arrays or objects. Along the way
we remove the unnecessary cloning of the input when it's known to be
unchanged. Regression tests covering these cases are added.
2015-05-13 13:52:08 -04:00
Robert Haas ae6157164f Remove useless assertion.
Here, snapshot->xcnt is an unsigned type, so it will always be
non-negative.
2015-05-13 11:01:10 -04:00
Peter Eisentraut dcf5e31908 PL/Python: Remove procedure cache invalidation
This was added to react to changes in the pg_transform catalog, but
building with CLOBBER_CACHE_ALWAYS showed that PL/Python was not
prepared for having its procedure cache cleared.  Since this is a
marginal use case, and we don't do this for other catalogs anyway, we
can postpone this to another day.
2015-05-12 22:52:18 -04:00
Andres Freund 4af6e61a36 Fix ON CONFLICT bugs that manifest when used in rules.
Specifically the tlist and rti of the pseudo "excluded" relation weren't
properly treated by expression_tree_walker, which lead to errors when
excluded was referenced inside a rule because the varnos where not
properly adjusted.  Similar omissions in OffsetVarNodes and
expression_tree_mutator had less impact, but should obviously be fixed
nonetheless.

A couple tests of for ON CONFLICT UPDATE into INSERT rule bearing
relations have been added.

In passing I updated a couple comments.
2015-05-13 00:13:22 +02:00
Andrew Dunstan 5c7df74204 Fix some errors from jsonb functions patch.
The catalog version should have been bumped, and the alternative
regression result file was not up to date with the name of jsonb_pretty.
2015-05-12 16:54:38 -04:00
Andrew Dunstan c6947010ce Additional functions and operators for jsonb
jsonb_pretty(jsonb) produces nicely indented json output.
jsonb || jsonb concatenates two jsonb values.
jsonb - text removes a key and its associated value from the json
jsonb - int removes the designated array element
jsonb - text[] removes a key and associated value or array element at
the designated path
jsonb_replace(jsonb,text[],jsonb) replaces the array element designated
by the path or the value associated with the key designated by the path
with the given value.

Original work by Dmitry Dolgov, adapted and reworked for PostgreSQL core
by Andrew Dunstan, reviewed and tidied up by Petr Jelinek.
2015-05-12 15:52:45 -04:00
Tom Lane afb9249d06 Add support for doing late row locking in FDWs.
Previously, FDWs could only do "early row locking", that is lock a row as
soon as it's fetched, even though local restriction/join conditions might
discard the row later.  This patch adds callbacks that allow FDWs to do
late locking in the same way that it's done for regular tables.

To make use of this feature, an FDW must support the "ctid" column as a
unique row identifier.  Currently, since ctid has to be of type TID,
the feature is of limited use, though in principle it could be used by
postgres_fdw.  We may eventually allow FDWs to specify another data type
for ctid, which would make it possible for more FDWs to use this feature.

This commit does not modify postgres_fdw to use late locking.  We've
tested some prototype code for that, but it's not in committable shape,
and besides it's quite unclear whether it actually makes sense to do late
locking against a remote server.  The extra round trips required are likely
to outweigh any benefit from improved concurrency.

Etsuro Fujita, reviewed by Ashutosh Bapat, and hacked up a lot by me
2015-05-12 14:10:17 -04:00
Stephen Frost aa4a0b9571 pgbench: Don't fail during startup
In pgbench, report, but ignore, any errors returned when attempting to
vacuum/truncate the default tables during startup.  If the tables are
needed, we'll error out soon enough anyway.

Per discussion with Tatsuo, David Rowley, Jim Nasby, Robert, Andres,
Fujii, Fabrízio de Royes Mello, Tomas Vondra, Michael Paquier, Peter,
based on a suggestion from Jeff Janes, patch from Robert, additional
message wording from Tom.
2015-05-12 13:13:12 -04:00
Andrew Dunstan 97e0aa6979 pg_basebackup -F t now succeeds with a long symlink target 2015-05-12 13:09:34 -04:00
Bruce Momjian ea12b3ca8c doc build: use unique Makefile variable to control temp install 2015-05-12 12:30:50 -04:00
Alvaro Herrera 007c932e5a "Fix" test_ddl_deparse regress test schedule
MSVC is not smart enough to figure it out, so dumb down the Makefile and
remove the schedule file.

Also add a .gitignore file.

Author: Michael Paquier
2015-05-12 12:12:39 -03:00
Bruce Momjian e8c19263e4 doc: prevent SGML 'make check' from building temp install
Report by Alvaro Herrera
2015-05-12 11:01:25 -04:00
Andrew Dunstan 72d422a522 Map basebackup tablespaces using a tablespace_map file
Windows can't reliably restore symbolic links from a tar format, so
instead during backup start we create a tablespace_map file, which is
used by the restoring postgres to create the correct links in pg_tblspc.
The backup protocol also now has an option to request this file to be
included in the backup stream, and this is used by pg_basebackup when
operating in tar mode.

This is done on all platforms, not just Windows.

This means that pg_basebackup will not not work in tar mode against 9.4
and older servers, as this protocol option isn't implemented there.

Amit Kapila, reviewed by Dilip Kumar, with a little editing from me.
2015-05-12 09:29:10 -04:00
Peter Eisentraut d02f16470f Replace some appendStringInfo* calls with more appropriate variants
Author: David Rowley <dgrowleyml@gmail.com>
2015-05-11 20:38:55 -04:00
Alvaro Herrera b488c580ae Allow on-the-fly capture of DDL event details
This feature lets user code inspect and take action on DDL events.
Whenever a ddl_command_end event trigger is installed, DDL actions
executed are saved to a list which can be inspected during execution of
a function attached to ddl_command_end.

The set-returning function pg_event_trigger_ddl_commands can be used to
list actions so captured; it returns data about the type of command
executed, as well as the affected object.  This is sufficient for many
uses of this feature.  For the cases where it is not, we also provide a
"command" column of a new pseudo-type pg_ddl_command, which is a
pointer to a C structure that can be accessed by C code.  The struct
contains all the info necessary to completely inspect and even
reconstruct the executed command.

There is no actual deparse code here; that's expected to come later.
What we have is enough infrastructure that the deparsing can be done in
an external extension.  The intention is that we will add some deparsing
code in a later release, as an in-core extension.

A new test module is included.  It's probably insufficient as is, but it
should be sufficient as a starting point for a more complete and
future-proof approach.

Authors: Álvaro Herrera, with some help from Andres Freund, Ian Barwick,
Abhijit Menon-Sen.

Reviews by Andres Freund, Robert Haas, Amit Kapila, Michael Paquier,
Craig Ringer, David Steele.
Additional input from Chris Browne, Dimitri Fontaine, Stephen Frost,
Petr Jelínek, Tom Lane, Jim Nasby, Steven Singer, Pavel Stěhule.

Based on original work by Dimitri Fontaine, though I didn't use his
code.

Discussion:
  https://www.postgresql.org/message-id/m2txrsdzxa.fsf@2ndQuadrant.fr
  https://www.postgresql.org/message-id/20131108153322.GU5809@eldon.alvh.no-ip.org
  https://www.postgresql.org/message-id/20150215044814.GL3391@alvh.no-ip.org
2015-05-11 19:14:31 -03:00
Stephen Frost fa2642438f Allow LOCK TABLE .. ROW EXCLUSIVE MODE with INSERT
INSERT acquires RowExclusiveLock during normal operation and therefore
it makes sense to allow LOCK TABLE .. ROW EXCLUSIVE MODE to be executed
by users who have INSERT rights on a table (even if they don't have
UPDATE or DELETE).

Not back-patching this as it's a behavior change which, strictly
speaking, loosens security restrictions.

Per discussion with Tom and Robert (circa 2013).
2015-05-11 15:44:12 -04:00
Bruce Momjian 9d15292cfc pg_upgrade: use single or double-quotes in command-line strings
This is platform-dependent.
2015-05-11 12:57:48 -04:00
Tom Lane 20781765f7 Fix incorrect checking of deferred exclusion constraint after a HOT update.
If a row that potentially violates a deferred exclusion constraint is
HOT-updated later in the same transaction, the exclusion constraint would
be reported as violated when the check finally occurs, even if the row(s)
the new row originally conflicted with have since been removed.  This
happened because the wrong TID was passed to check_exclusion_constraint(),
causing the live HOT-updated row to be seen as a conflicting row rather
than recognized as the row-under-test.

Per bug #13148 from Evan Martin.  It's been broken since exclusion
constraints were invented, so back-patch to all supported branches.
2015-05-11 12:25:43 -04:00
Robert Haas b4d4ce1d50 Increase threshold for multixact member emergency autovac to 50%.
Analysis by Noah Misch shows that the 25% threshold set by commit
53bb309d2d is lower than any other,
similar autovac threshold.  While we don't know exactly what value
will be optimal for all users, it is better to err a little on the
high side than on the low side.  A higher value increases the risk
that users might exhaust the available space and start seeing errors
before autovacuum can clean things up sufficiently, but a user who
hits that problem can compensate for it by reducing
autovacuum_multixact_freeze_max_age to a value dependent on their
average multixact size.  On the flip side, if the emergency cap
imposed by that patch kicks in too early, the user will experience
excessive wraparound scanning and will be unable to mitigate that
problem by configuration.  The new value will hopefully reduce the
risk of such bad experiences while still providing enough headroom
to avoid multixact member exhaustion for most users.

Along the way, adjust the documentation to reflect the effects of
commit 04e6d3b877, which taught
autovacuum to run for multixact wraparound even when autovacuum
is configured off.
2015-05-11 12:15:50 -04:00
Bruce Momjian 2200713aa8 initdb: only recommend pg_ctl to start the server
Previously we mentioned the 'postgres' binary method as well.
2015-05-11 12:14:57 -04:00
Bruce Momjian c71e273402 pg_dump: suppress "Tablespace:" comment for default tablespaces
Report by Hans Ginzel
2015-05-11 11:45:43 -04:00
Robert Haas 04e6d3b877 Even when autovacuum=off, force it for members as we do in other cases.
Thomas Munro, with some adjustments by me.
2015-05-11 10:51:14 -04:00
Robert Haas f6a6c46d7f Advance the stop point for multixact offset creation only at checkpoint.
Commit b69bf30b9b advanced the stop point
at vacuum time, but this has subsequently been shown to be unsafe as a
result of analysis by myself and Thomas Munro and testing by Thomas
Munro.  The crux of the problem is that the SLRU deletion logic may
get confused about what to remove if, at exactly the right time during
the checkpoint process, the head of the SLRU crosses what used to be
the tail.

This patch, by me, fixes the problem by advancing the stop point only
following a checkpoint.  This has the additional advantage of making
the removal logic work during recovery more like the way it works during
normal running, which is probably good.

At least one of the calls to DetermineSafeOldestOffset which this patch
removes was already dead, because MultiXactAdvanceOldest is called only
during recovery and DetermineSafeOldestOffset was set up to do nothing
during recovery.  That, however, is inconsistent with the principle that
recovery and normal running should work similarly, and was confusing to
boot.

Along the way, fix some comments that previous patches in this area
neglected to update.  It's not clear to me whether there's any
concrete basis for the decision to use only half of the multixact ID
space, but it's neither necessary nor sufficient to prevent multixact
member wraparound, so the comments should not say otherwise.
2015-05-10 22:21:20 -04:00
Robert Haas 312747c224 Fix DetermineSafeOldestOffset for the case where there are no mxacts.
Commit b69bf30b9b failed to take into
account the possibility that there might be no multixacts in existence
at all.

Report by Thomas Munro; patch by me.
2015-05-10 21:34:26 -04:00
Tom Lane 1a8a4e5cde Code review for foreign/custom join pushdown patch.
Commit e7cb7ee145 included some design
decisions that seem pretty questionable to me, and there was quite a lot
of stuff not to like about the documentation and comments.  Clean up
as follows:

* Consider foreign joins only between foreign tables on the same server,
rather than between any two foreign tables with the same underlying FDW
handler function.  In most if not all cases, the FDW would simply have had
to apply the same-server restriction itself (far more expensively, both for
lack of caching and because it would be repeated for each combination of
input sub-joins), or else risk nasty bugs.  Anyone who's really intent on
doing something outside this restriction can always use the
set_join_pathlist_hook.

* Rename fdw_ps_tlist/custom_ps_tlist to fdw_scan_tlist/custom_scan_tlist
to better reflect what they're for, and allow these custom scan tlists
to be used even for base relations.

* Change make_foreignscan() API to include passing the fdw_scan_tlist
value, since the FDW is required to set that.  Backwards compatibility
doesn't seem like an adequate reason to expect FDWs to set it in some
ad-hoc extra step, and anyway existing FDWs can just pass NIL.

* Change the API of path-generating subroutines of add_paths_to_joinrel,
and in particular that of GetForeignJoinPaths and set_join_pathlist_hook,
so that various less-used parameters are passed in a struct rather than
as separate parameter-list entries.  The objective here is to reduce the
probability that future additions to those parameter lists will result in
source-level API breaks for users of these hooks.  It's possible that this
is even a small win for the core code, since most CPU architectures can't
pass more than half a dozen parameters efficiently anyway.  I kept root,
joinrel, outerrel, innerrel, and jointype as separate parameters to reduce
code churn in joinpath.c --- in particular, putting jointype into the
struct would have been problematic because of the subroutines' habit of
changing their local copies of that variable.

* Avoid ad-hocery in ExecAssignScanProjectionInfo.  It was probably all
right for it to know about IndexOnlyScan, but if the list is to grow
we should refactor the knowledge out to the callers.

* Restore nodeForeignscan.c's previous use of the relcache to avoid
extra GetFdwRoutine lookups for base-relation scans.

* Lots of cleanup of documentation and missed comments.  Re-order some
code additions into more logical places.
2015-05-10 14:36:36 -04:00
Tom Lane c594c75078 Add missing "static" marker.
Per buildfarm member pademelon.
2015-05-09 23:39:36 -04:00
Andrew Dunstan cb9fa802b3 Add new OID alias type regnamespace
Catalog version bumped

Kyotaro HORIGUCHI
2015-05-09 13:36:52 -04:00
Andrew Dunstan 0c90f6769d Add new OID alias type regrole
The new type has the scope of whole the database cluster so it doesn't
behave the same as the existing OID alias types which have database
scope,
concerning object dependency. To avoid confusion constants of the new
type are prohibited from appearing where dependencies are made involving
it.

Also, add a note to the docs about possible MVCC violation and
optimization issues, which are general over the all reg* types.

Kyotaro Horiguchi
2015-05-09 13:06:49 -04:00
Stephen Frost 0cf56f14dd Improve ParseConfigFp comment wrt head/tail
The head_p and tail_p pointers passed to ParseConfigFp() are actually
input/output parameters, not strictly output paramaters.  This updates
the function comment to reflect that.

Per discussion with Tom.
2015-05-09 11:13:37 -04:00
Stephen Frost 9a0884176f Change default for include_realm to 1
The default behavior for GSS and SSPI authentication methods has long
been to strip the realm off of the principal, however, this is not a
secure approach in multi-realm environments and the use-case for the
parameter at all has been superseded by the regex-based mapping support
available in pg_ident.conf.

Change the default for include_realm to be '1', meaning that we do
NOT remove the realm from the principal by default.  Any installations
which depend on the existing behavior will need to update their
configurations (ideally by leaving include_realm set to 1 and adding a
mapping in pg_ident.conf, but alternatively by explicitly setting
include_realm=0 prior to upgrading).  Note that the mapping capability
exists in all currently supported versions of PostgreSQL and so this
change can be done today.  Barring that, existing users can update their
configurations today to explicitly set include_realm=0 to ensure that
the prior behavior is maintained when they upgrade.

This needs to be noted in the release notes.

Per discussion with Magnus and Peter.
2015-05-08 19:39:42 -04:00
Stephen Frost f91feba877 Modify pg_stat_get_activity to build a tuplestore
This updates pg_stat_get_activity() to build a tuplestore for its
results instead of using the old-style multiple-call method.  This
simplifies the function, though that wasn't the primary motivation for
the change, which is that we may turn it into a helper function which
can filter the results (or not) much more easily.
2015-05-08 19:25:30 -04:00
Stephen Frost 4b342fb591 Bump catversion for pg_file_settings
Pointed out by Andres (thanks!)

Apologies for not including it in the initial patch.
2015-05-08 19:14:32 -04:00
Stephen Frost a97e0c3354 Add pg_file_settings view and function
The function and view added here provide a way to look at all settings
in postgresql.conf, any #include'd files, and postgresql.auto.conf
(which is what backs the ALTER SYSTEM command).

The information returned includes the configuration file name, line
number in that file, sequence number indicating when the parameter is
loaded (useful to see if it is later masked by another definition of the
same parameter), parameter name, and what it is set to at that point.
This information is updated on reload of the server.

This is unfiltered, privileged, information and therefore access is
restricted to superusers through the GRANT system.

Author: Sawada Masahiko, various improvements by me.
Reviewers: David Steele
2015-05-08 19:09:26 -04:00
Andres Freund bab64ef9e8 Fix two problems in infer_arbiter_indexes().
The first is a pretty simple bug where a relcache entry is used after
the relation is closed. In this particular situation it does not appear
to have bad consequences unless compiled with RELCACHE_FORCE_RELEASE.

The second is that infer_arbiter_indexes() skipped indexes that aren't
yet valid according to indcheckxmin. That's not required here, because
uniqueness checks don't care about visibility according to an older
snapshot.  While thats not really a bug, it makes things undesirably
non-deterministic.  There is some hope that this explains a test failure
on buildfarm member jaguarundi.

Discussion: 9096.1431102730@sss.pgh.pa.us
2015-05-08 22:28:23 +02:00
Heikki Linnakangas de7688442f At promotion, archive last segment from old timeline with .partial suffix.
Previously, we would archive the possible-incomplete WAL segment with its
normal filename, but that causes trouble if the server owning that timeline
is still running, and tries to archive the same segment later. It's not nice
for the standby to trip up the master's archival like that. And it's pretty
confusing, anyway, to have an incomplete segment in the archive that's
indistinguishable from a normal, complete segment.

To avoid such confusion, add a .partial suffix to the file. Or to be more
precise, make a copy of the old segment under the .partial suffix, and
archive that instead of the original file. pg_receivexlog also uses the
.partial suffix for the same purpose, to tell apart incompletely streamed
files from complete ones.

There is no automatic mechanism to use the .partial files at recovery, so
they will go unused, unless the administrator manually copies to them to
the pg_xlog directory (and removes the .partial suffix). Recovery won't
normally need the WAL - when recovering to the new timeline, it will find
the same WAL on the first segment on the new timeline instead - but it
nevertheless feels better to archive the file with the .partial suffix, for
debugging purposes if nothing else.
2015-05-08 21:59:01 +03:00
Heikki Linnakangas 179cdd0981 Add macros to check if a filename is a WAL segment or other such file.
We had many instances of the strlen + strspn combination to check for that.
This makes the code a bit easier to read.
2015-05-08 21:58:57 +03:00
Peter Eisentraut 16c73e773b Fix whitespace 2015-05-08 14:45:53 -04:00
Andres Freund e8898e9169 Minor ON CONFLICT related comments and doc fixes.
Geoff Winkless, Stephen Frost, Peter Geoghegan and me.
2015-05-08 19:24:14 +02:00
Robert Haas 53bb309d2d Teach autovacuum about multixact member wraparound.
The logic introduced in commit b69bf30b9b
and repaired in commits 669c7d20e6 and
7be47c56af helps to ensure that we don't
overwrite old multixact member information while it is still needed,
but a user who creates many large multixacts can still exhaust the
member space (and thus start getting errors) while autovacuum stands
idly by.

To fix this, progressively ramp down the effective value (but not the
actual contents) of autovacuum_multixact_freeze_max_age as member space
utilization increases.  This makes autovacuum more aggressive and also
reduces the threshold for a manual VACUUM to perform a full-table scan.

This patch leaves unsolved the problem of ensuring that emergency
autovacuums are triggered even when autovacuum=off.  We'll need to fix
that via a separate patch.

Thomas Munro and Robert Haas
2015-05-08 12:53:00 -04:00
Stephen Frost 195fbd4012 Remove reference to src/tools/backend/index.html
src/tools/backend was removed back in 63f1ccd, but
backend/storage/lmgr/README didn't get the memo.

Author: Amit Langote
2015-05-08 07:14:18 -04:00
Andres Freund 168d5805e4 Add support for INSERT ... ON CONFLICT DO NOTHING/UPDATE.
The newly added ON CONFLICT clause allows to specify an alternative to
raising a unique or exclusion constraint violation error when inserting.
ON CONFLICT refers to constraints that can either be specified using a
inference clause (by specifying the columns of a unique constraint) or
by naming a unique or exclusion constraint.  DO NOTHING avoids the
constraint violation, without touching the pre-existing row.  DO UPDATE
SET ... [WHERE ...] updates the pre-existing tuple, and has access to
both the tuple proposed for insertion and the existing tuple; the
optional WHERE clause can be used to prevent an update from being
executed.  The UPDATE SET and WHERE clauses have access to the tuple
proposed for insertion using the "magic" EXCLUDED alias, and to the
pre-existing tuple using the table name or its alias.

This feature is often referred to as upsert.

This is implemented using a new infrastructure called "speculative
insertion". It is an optimistic variant of regular insertion that first
does a pre-check for existing tuples and then attempts an insert.  If a
violating tuple was inserted concurrently, the speculatively inserted
tuple is deleted and a new attempt is made.  If the pre-check finds a
matching tuple the alternative DO NOTHING or DO UPDATE action is taken.
If the insertion succeeds without detecting a conflict, the tuple is
deemed inserted.

To handle the possible ambiguity between the excluded alias and a table
named excluded, and for convenience with long relation names, INSERT
INTO now can alias its target table.

Bumps catversion as stored rules change.

Author: Peter Geoghegan, with significant contributions from Heikki
    Linnakangas and Andres Freund. Testing infrastructure by Jeff Janes.
Reviewed-By: Heikki Linnakangas, Andres Freund, Robert Haas, Simon Riggs,
    Dean Rasheed, Stephen Frost and many others.
2015-05-08 05:43:10 +02:00
Andres Freund 2c8f4836db Represent columns requiring insert and update privileges indentently.
Previously, relation range table entries used a single Bitmapset field
representing which columns required either UPDATE or INSERT privileges,
despite the fact that INSERT and UPDATE privileges are separately
cataloged, and may be independently held.  As statements so far required
either insert or update privileges but never both, that was
sufficient. The required permission could be inferred from the top level
statement run.

The upcoming INSERT ... ON CONFLICT UPDATE feature needs to
independently check for both privileges in one statement though, so that
is not sufficient anymore.

Bumps catversion as stored rules change.

Author: Peter Geoghegan
Reviewed-By: Andres Freund
2015-05-08 00:20:46 +02:00
Alvaro Herrera db5f98ab4f Improve BRIN infra, minmax opclass and regression test
The minmax opclass was using the wrong support functions when
cross-datatypes queries were run.  Instead of trying to fix the
pg_amproc definitions (which apparently is not possible), use the
already correct pg_amop entries instead.  This requires jumping through
more hoops (read: extra syscache lookups) to obtain the underlying
functions to execute, but it is necessary for correctness.

Author: Emre Hasegeli, tweaked by Álvaro
Review: Andreas Karlsson

Also change BrinOpcInfo to record each stored type's typecache entry
instead of just the OID.  Turns out that the full type cache is
necessary in brin_deform_tuple: the original code used the indexed
type's byval and typlen properties to extract the stored tuple, which is
correct in Minmax; but in other implementations that want to store
something different, that's wrong.  The realization that this is a bug
comes from Emre also, but I did not use his patch.

I also adopted Emre's regression test code (with smallish changes),
which is more complete.
2015-05-07 13:02:22 -03:00
Robert Haas 7be47c56af Fix incorrect math in DetermineSafeOldestOffset.
The old formula didn't have enough parentheses, so it would do the wrong
thing, and it used / rather than % to find a remainder.  The effect of
these oversights is that the stop point chosen by the logic introduced in
commit b69bf30b9b might be rather
meaningless.

Thomas Munro, reviewed by Kevin Grittner, with a whitespace tweak by me.
2015-05-07 11:19:31 -04:00
Magnus Hagander 1a241d22ae Properly send SCM status updates when shutting down service on Windows
The Service Control Manager should be notified regularly during a shutdown
that takes a long time. Previously we would increaes the counter, but forgot
to actually send the notification to the system. The loop counter was also
incorrectly initalized in the event that the startup of the system took long
enough for it to increase, which could cause the shutdown process not to wait
as long as expected.

Krystian Bigaj, reviewed by Michael Paquier
2015-05-07 15:04:13 +02:00
Magnus Hagander d678bde655 Fix indentation that could mask a future bug
Michael Paquier, spotted using Coverity
2015-05-07 11:41:26 +02:00
Magnus Hagander aa7cf3eef4 Fix minor resource leak in pg_dump
Michael Paquier, spotted using Coverity
2015-05-07 11:41:13 +02:00
Robert Haas 1998261034 Avoid using a C++ keyword as a structure member name.
Per request from Peter Eisentraut.
2015-05-05 22:41:03 -04:00
Alvaro Herrera 3b6db1f445 Add geometry/range functions to support BRIN inclusion
This commit adds the following functions:
    box(point) -> box
    bound_box(box, box) -> box
    inet_same_family(inet, inet) -> bool
    inet_merge(inet, inet) -> cidr
    range_merge(anyrange, anyrange) -> anyrange

The first of these is also used to implement a new assignment cast from
point to box.

These functions are the first part of a base to implement an "inclusion"
operator class for BRIN, for multidimensional data types.

Author: Emre Hasegeli
Reviewed by: Andreas Karlsson
2015-05-05 15:22:24 -03:00
Robert Haas 456ff08638 Fix some problems with patch to fsync the data directory.
pg_win32_is_junction() was a typo for pgwin32_is_junction().  open()
was used not only in a two-argument form, which breaks on Windows,
but also where BasicOpenFile() should have been used.

Per reports from Andrew Dunstan and David Rowley.
2015-05-05 09:29:49 -04:00
Peter Eisentraut ad8d6d064c Fix typos
Author: Erik Rijkers <er@xs4all.nl>
2015-05-04 20:40:19 -04:00
Robert Haas 40f42d2a34 Use outerPlanState macro instead of referring to leffttree.
This makes the executor code more consistent.  It also removes
an apparently superfluous NULL test in nodeGroup.c.

Qingqing Zhou, reviewed by Tom Lane, and further revised by me.
2015-05-04 16:17:36 -04:00
Tom Lane 2503982be4 Improve procost estimates for some text search functions.
The text search functions that involve parsing raw text into lexemes are
remarkably CPU-intensive, so estimating them at the same cost as most other
built-in functions seems like a mistake; moreover, doing so turns out to
discourage the optimizer from using functional indexes on these functions.
After some debate, we've agreed to raise procost from 1 to 100 for
to_tsvector(), plainto_tsvector(), to_tsquery(), ts_headline(),
ts_match_tt(), and ts_match_tq(), which are all the text search functions
that parse raw text.

Also increase procost for the 2-argument form of ts_rewrite()
(tsquery_rewrite_query); while this function doesn't do text parsing,
it does execute a user-supplied SQL query, so its previous procost of 1 is
clearly a drastic underestimate.  It seems reasonable to assign it the same
cost we assign to PL functions by default, so 100 is the number here too.

I did not bother bumping catversion for this change, since it does not
break catalog compatibility with the server executable nor result in
any regression test changes.

Per complaint from Andrew Gierth and subsequent discussion.
2015-05-04 15:38:57 -04:00
Robert Haas 2ce439f337 Recursively fsync() the data directory after a crash.
Otherwise, if there's another crash, some writes from after the first
crash might make it to disk while writes from before the crash fail
to make it to disk.  This could lead to data corruption.

Back-patch to all supported versions.

Abhijit Menon-Sen, reviewed by Andres Freund and slightly revised
by me.
2015-05-04 14:13:53 -04:00
Heikki Linnakangas ec3d976bce Fix the same-rel optimization when creating WAL records.
prev_regbuf was never set, and therefore the same-rel flag was never set on
WAL records.

Report and fix by Zhanq Zq
2015-05-04 21:03:36 +03:00
Andrew Dunstan 3c000fd9a6 Fix two small bugs in json's populate_record_worker
The first bug is not releasing a tupdesc when doing an early return out
of the function. The second bug is a logic error in choosing when to do
an early return if given an empty jsonb object.

Bug reports from Pavel Stehule and Tom Lane respectively.

Backpatch to 9.4 where these were introduced.
2015-05-04 12:38:58 -04:00
Tom Lane c90b85e4d9 Second try at fixing warnings caused by commit 9b43d73b3f.
Commit ef3f9e642d suppressed one cause of warnings here, but
recent clang on OS X is still unhappy because we're passing a "long"
to abs().  The fact that tm_gmtoff is declared as long is no doubt a
hangover from days when int might be only 16 bits; but Postgres has
never been able to run on such machines, so we can just cast it to int
with no worries.  For consistency, also cast to int in the other
uses of tm_gmtoff in this stanza.

Note: this code is still broken on machines that don't follow C99
integer-division-truncates-towards-zero rules.  Given the lack of
complaints about it, I don't feel a large desire to complicate things
enough to cope with the pre-C99 rules.
2015-05-03 23:44:52 -04:00
Tom Lane a4820434c1 Fix overlooked relcache invalidation in ALTER TABLE ... ALTER CONSTRAINT.
When altering the deferredness state of a foreign key constraint, we
correctly updated the catalogs and then invalidated the relcache state for
the target relation ... but that's not the only relation with relevant
triggers.  Must invalidate the other table as well, or the state change
fails to take effect promptly for operations triggered on the other table.
Per bug #13224 from Christian Ullrich.

In passing, reorganize regression test case for this feature so that it
isn't randomly injected into the middle of an unrelated test sequence.

Oversight in commit f177cbfe67.  Back-patch
to 9.4 where the faulty code was added.
2015-05-03 11:30:24 -04:00
Andrew Dunstan b6b2149e48 Fix python_includespec on Windows at configure time
By converting to using forward slashes at configure time we avoid
having to repeat the logic anywhere that this is needed, such as
in transforms modules for plpython.
2015-05-03 08:17:04 -04:00
Noah Misch 1a629c1b16 Combine initdb tests that successfully create a data directory.
This eliminates many seconds of test duration and the cause to invoke
"rm -rf", which is typically unavailable on Windows.

Michael Paquier and Noah Misch
2015-05-02 16:47:28 -04:00
Noah Misch 84c08a7649 Fix one more TAP test to use standard command-line argument ordering.
Commit c67a86f7da caught most of these,
but this negative test escaped notice.  The test did pass, for the wrong
reason, under affected configurations.

Michael Paquier
2015-05-02 16:46:52 -04:00
Noah Misch b339a5cf90 Rename coerce_type() local variable.
coerce_type() has local variables named targetTypeId, baseTypeId, and
targetType.  targetType has been the Type structure for baseTypeId, so
rename it to baseType.
2015-05-02 16:46:23 -04:00
Peter Eisentraut 67df9782e9 Windows also needs an override of the shared libpython detection 2015-05-02 13:23:16 -04:00
Peter Eisentraut 0fd764647a Make hstore_plperl's build even more like plperl's
Combine the two places that set CPPFLAGS into one.  Also, some settings
should be restricted to Windows only.  More precisely, -Wno-comment is
a GCC-only option, but Windows in a makefile implies GCC at the moment.

Also, since -Wno-comment is more properly a preprocessor option, move it
to CPPFLAGS to simplify things a bit.
2015-05-01 22:16:58 -04:00
Peter Eisentraut d664a10f96 Move interpreter shared library detection to configure
For building PL/Perl, PL/Python, and PL/Tcl, we need a shared library of
libperl, libpython, and libtcl, respectively.  Previously, this was
checked in the makefiles, skipping the PL build with a warning if no
shared library was available.  Now this is checked in configure, with an
error if no shared library is available.

The previous situation arose because in the olden days, the configure
options --with-perl, --with-python, and --with-tcl controlled whether
frontend interfaces for those languages would be built.  The procedural
languages were added later, and shared libraries were often not
available in the beginning.  So it was decided skip the builds of the
procedural languages in those cases.  The frontend interfaces have since
been removed from the tree, and shared libraries are now available most
of the time, so that setup makes much less sense now.

Also, the new setup allows contrib modules and pgxs users to rely on the
respective PLs being available based on configure flags.
2015-05-01 21:38:21 -04:00
Bruce Momjian b2f95c34f4 Mark views created from tables as replication identity 'nothing'
pg_dump turns tables into views using a method that was not setting
pg_class.relreplident properly.

Patch by Marko Tiikkaja

Backpatch through 9.4
2015-05-01 13:03:23 -04:00
Robert Haas e044a44949 Deparse named arguments to use the new => operator instead of :=
Tom Lane pointed out that this wasn't done, and asked whether that was
intentional.  Subsequent discussion was in favor of making the change,
so here we go.
2015-05-01 09:37:10 -04:00
Robert Haas e7cb7ee145 Allow FDWs and custom scan providers to replace joins with scans.
Foreign data wrappers can use this capability for so-called "join
pushdown"; that is, instead of executing two separate foreign scans
and then joining the results locally, they can generate a path which
performs the join on the remote server and then is scanned locally.
This commit does not extend postgres_fdw to take advantage of this
capability; it just provides the infrastructure.

Custom scan providers can use this in a similar way.  Previously,
it was only possible for a custom scan provider to scan a single
relation.  Now, it can scan an entire join tree, provided of course
that it knows how to produce the same results that the join would
have produced if executed normally.

KaiGai Kohei, reviewed by Shigeru Hanada, Ashutosh Bapat, and me.
2015-05-01 08:50:35 -04:00
Andres Freund 2b22795b32 Copy editing of the replication origins patch.
Michael Paquier and myself.
2015-05-01 12:22:13 +02:00
Andres Freund 1db12da85b Fix unaligned memory access in xlog parsing due to replication origin patch.
ParseCommitRecord() accessed xl_xact_origin directly. But the chunks in
the commit record's data only have 4 byte alignment, whereas
xl_xact_origin's members require 8 byte alignment on some
platforms. Update comments to make not of that and copy the record to
stack local storage before reading.

With help from Stefan Kaltenbrunner in pinning down the buildfarm and
verifying the fix.
2015-05-01 11:36:14 +02:00
Heikki Linnakangas 484a848a73 Fix pg_rewind regression failure after "fast promotion"
pg_rewind looks at the control file to determine the server's timeline. If
the standby performs a "fast promotion", the timeline ID in the control
file is not updated until the next checkpoint. The startup process requests
a checkpoint immediately after promotion, so this is unlikely to be an
issue in the real world, but the regression suite ran pg_rewind so quickly
after promotion that the checkpoint had not yet completed.

Reported by Stephen Frost
2015-04-30 21:59:58 -07:00
Alvaro Herrera 9d396af463 Fix up some loose ends for CURRENT_USER as RoleSpec
In commit 31eae6028e, some documents were not updated to show the new
capability; fix that.  Also, the error message you get when CURRENT_USER
and SESSION_USER are used in a context that doesn't accept them could be
clearer about it being a problem only in those contexts; so add the
word "here".

Author: Kyotaro HORIGUCHI

His patch submission also included changes to GRANT/REVOKE, but those
seemed more controversial, so I left them out.  We can reconsider these
changes later.
2015-04-30 16:57:05 -03:00
Robert Haas 924bcf4f16 Create an infrastructure for parallel computation in PostgreSQL.
This does four basic things.  First, it provides convenience routines
to coordinate the startup and shutdown of parallel workers.  Second,
it synchronizes various pieces of state (e.g. GUCs, combo CID
mappings, transaction snapshot) from the parallel group leader to the
worker processes.  Third, it prohibits various operations that would
result in unsafe changes to that state while parallelism is active.
Finally, it propagates events that would result in an ErrorResponse,
NoticeResponse, or NotifyResponse message being sent to the client
from the parallel workers back to the master, from which they can then
be sent on to the client.

Robert Haas, Amit Kapila, Noah Misch, Rushabh Lathia, Jeevan Chalke.
Suggestions and review from Andres Freund, Heikki Linnakangas, Noah
Misch, Simon Riggs, Euler Taveira, and Jim Nasby.
2015-04-30 15:02:14 -04:00
Alvaro Herrera 669c7d20e6 Fix pg_upgrade's multixact handling (again)
We need to create the pg_multixact/offsets file deleted by pg_upgrade
much earlier than we originally were: it was in TrimMultiXact(), which
runs after we exit recovery, but it actually needs to run earlier than
the first call to SetMultiXactIdLimit (before recovery), because that
routine already wants to read the first offset segment.

Per pg_upgrade trouble report from Jeff Janes.

While at it, silence a compiler warning about a pointless assert that an
unsigned variable was being tested non-negative.  This was a signed
constant in Thomas Munro's patch which I changed to unsigned before
commit.  Pointed out by Andres Freund.
2015-04-30 13:55:06 -03:00
Peter Eisentraut dbf2ec1a1c Fix parallel make risk with new check temp-install setup
The "check" target no longer needs to depend on "all", because it now
runs "install" directly, which in turn depends on "all".  Doing both
will cause problems with parallel make, because two builds will run next
to each other.

Also remove the redirection of the temp-install output into a log file.
This was appropriate when this was done from within pg_regress, but now
it's just a regular make run, and especially with the above changes this
will now take the place of running the "all" target before the test
suites.

problem report by Jeff Janes, patch in part by Michael Paquier
2015-04-29 20:34:22 -04:00
Andres Freund e0f26fc765 Correct replication origin's use of UINT16_MAX to PG_UINT16_MAX.
We can't rely on UINT16_MAX being present, which is why we introduced
PG_UINT16_MAX...

Buildfarm animal bowerbird via Andrew Gierth.
2015-04-30 00:19:36 +02:00
Robert Haas fe72c4c55b Update .gitignore for new rmgr, changed paths. 2015-04-29 15:53:00 -04:00
Robert Haas 9b6a0ce5f0 Remove enum-related special cases for catalog scans.
When this code was written, catalog scans were normally performed using
SnapshotNow, making special handling necessary here.  Now, however, all
catalog scans use MVCC snapshots, so we can change these cases to look
more like what we do for catalog scans elsewhere in the code.

Per discussion with Tom Lane and a reminder from Bruce Momjian.
2015-04-29 15:48:44 -04:00
Robert Haas ef3f9e642d Attempt to fix some compiler warnings. 2015-04-29 14:02:27 -04:00
Andrew Dunstan eb010637dd Enable transforms tests for python 2 on MSVC builds
Currently regression tests for python 3 are disabled on MSVC, and these
tests fail with python 3, too, so we have some work to do to enable
both. Meanwhile, all the buildfarm hosts seem to be building with python
2 anyway, so this at least gets us some coverage.

Original patch from Michael Paquier, significantly modified by me.
2015-04-29 13:49:24 -04:00
Andres Freund 5aa2350426 Introduce replication progress tracking infrastructure.
When implementing a replication solution ontop of logical decoding, two
related problems exist:
* How to safely keep track of replication progress
* How to change replication behavior, based on the origin of a row;
  e.g. to avoid loops in bi-directional replication setups

The solution to these problems, as implemented here, consist out of
three parts:

1) 'replication origins', which identify nodes in a replication setup.
2) 'replication progress tracking', which remembers, for each
   replication origin, how far replay has progressed in a efficient and
   crash safe manner.
3) The ability to filter out changes performed on the behest of a
   replication origin during logical decoding; this allows complex
   replication topologies. E.g. by filtering all replayed changes out.

Most of this could also be implemented in "userspace", e.g. by inserting
additional rows contain origin information, but that ends up being much
less efficient and more complicated.  We don't want to require various
replication solutions to reimplement logic for this independently. The
infrastructure is intended to be generic enough to be reusable.

This infrastructure also replaces the 'nodeid' infrastructure of commit
timestamps. It is intended to provide all the former capabilities,
except that there's only 2^16 different origins; but now they integrate
with logical decoding. Additionally more functionality is accessible via
SQL.  Since the commit timestamp infrastructure has also been introduced
in 9.5 (commit 73c986add) changing the API is not a problem.

For now the number of origins for which the replication progress can be
tracked simultaneously is determined by the max_replication_slots
GUC. That GUC is not a perfect match to configure this, but there
doesn't seem to be sufficient reason to introduce a separate new one.

Bumps both catversion and wal page magic.

Author: Andres Freund, with contributions from Petr Jelinek and Craig Ringer
Reviewed-By: Heikki Linnakangas, Petr Jelinek, Robert Haas, Steve Singer
Discussion: 20150216002155.GI15326@awork2.anarazel.de,
    20140923182422.GA15776@alap3.anarazel.de,
    20131114172632.GE7522@alap2.anarazel.de
2015-04-29 19:30:53 +02:00
Robert Haas c6e96a2f98 psql: Improve tab completion for ALTER FOREIGN TABLE.
Etsuro Fujita
2015-04-29 12:49:10 -04:00
Bruce Momjian 9b43d73b3f to_char(): have format 'OF' only show the leading negative sign
Previously both hours and minutes displayed as negative.

Report by David Pozsar
2015-04-28 21:02:57 -04:00
Bruce Momjian f19d8f14c7 pg_basebackup: canonicalize old and new tablespace paths
This avoids problems with double-slash-specified paths.

Patch by Ian Barwick
2015-04-28 20:12:10 -04:00
Bruce Momjian 33cb8ff6aa Warn about tablespace creation in PGDATA
Also add warning to pg_upgrade

Report by Josh Berkus
2015-04-28 17:35:12 -04:00
Tom Lane 290713e31a Fix another test for RELKIND_RELATION that should allow foreign tables now.
I thought I'd gone through all of these before, but a fresh review found
this one too.  (Perhaps it would be better to just delete this test and
let the failure occur later, but for the moment I'll preserve the logic.)

The case that this was rejecting is like
	CREATE FOREIGN TABLE ft (f1 int ...) ...;
	CREATE TABLE c1 (UNIQUE(f1)) INHERITS(ft);
2015-04-28 12:34:35 -07:00
Tom Lane ad9f08f706 Fix ATSimpleRecursion() to allow recursion from a foreign table.
This is necessary in view of the changes to allow foreign tables to be
full members of inheritance hierarchies, but I (tgl) unaccountably missed
it in commit cb1ca4d800.

Noted by Amit Langote, patch by Etsuro Fujita
2015-04-28 12:25:00 -07:00
Alvaro Herrera d3821e70c9 Code review for multixact bugfix
Reword messages, rename a confusingly named function.

Per Robert Haas.
2015-04-28 14:52:29 -03:00
Andrew Dunstan cbf9f0ec31 Fix MSVC builds for contrib transforms modules.
With this patch the MSVC build and installation will work correctly with
the transforms. However the python transform tests for hstore and ltree
are still disabled pending some further adjustments.

Michael Paquier with some tweaks from me.
2015-04-28 11:47:08 -04:00
Alvaro Herrera b69bf30b9b Protect against multixact members wraparound
Multixact member files are subject to early wraparound overflow and
removal: if the average multixact size is above a certain threshold (see
note below) the protections against offset overflow are not enough:
during multixact truncation at checkpoint time, some
pg_multixact/members files would be removed because the server considers
them to be old and not needed anymore.  This leads to loss of files that
are critical to interpret existing tuples's Xmax values.

To protect against this, since we don't have enough info in pg_control
and we can't modify it in old branches, we maintain shared memory state
about the oldest value that we need to keep; we use this during new
multixact creation to abort if an old still-needed file would get
overwritten.  This value is kept up to date by checkpoints, which makes
it not completely accurate but should be good enough.  We start emitting
warnings sometime earlier, so that the eventual multixact-shutdown
doesn't take DBAs completely by surprise (more precisely: once 20
members SLRU segments are remaining before shutdown.)

On troublesome average multixact size: The threshold size depends on the
multixact freeze parameters. The oldest age is related to the greater of
multixact_freeze_table_age and multixact_freeze_min_age: anything
older than that should be removed promptly by autovacuum.  If autovacuum
is keeping up with multixact freezing, the troublesome multixact average
size is
	(2^32-1) / Max(freeze table age, freeze min age)
or around 28 members per multixact.  Having an average multixact size
larger than that will eventually cause new multixact data to overwrite
the data area for older multixacts.  (If autovacuum is not able to keep
up, or there are errors in vacuuming, the actual maximum is
multixact_freeeze_max_age instead, at which point multixact generation
is stopped completely.  The default value for this limit is 400 million,
which means that the multixact size that would cause trouble is about 10
members).

Initial bug report by Timothy Garnett, bug #12990
Backpatch to 9.3, where the problem was introduced.

Authors: Álvaro Herrera, Thomas Munro
Reviews: Thomas Munro, Amit Kapila, Robert Haas, Kevin Grittner
2015-04-28 11:32:53 -03:00
Andres Freund dfbaed4597 Use a fd opened for read/write when syncing slots during startup.
Some operating systems, including the reporter's windows, return EBADFD
or similar when fsync() is invoked on a O_RDONLY file descriptor.
Unfortunately RestoreSlotFromDisk() does exactly that; which causes
failures after restarts in at least some scenarios.

If you hit the bug the error message will be something like
ERROR: could not fsync file "pg_replslot/$name/state": Bad file descriptor

Simply use O_RDWR instead of O_RDONLY when opening the relevant file
descriptor to fix the bug.  Unfortunately I have no way of verifying the
fix, but we've seen similar problems in the past.

This bug goes back to 9.4 where slots were introduced. Backpatch
accordingly.

Reported-By: Patrice Drolet
Bug: #13143:
Discussion: 20150424101006.2556.60897@wrigleys.postgresql.org
2015-04-28 00:17:43 +02:00
Stephen Frost dcbf5948e1 Improve qual pushdown for RLS and SB views
The original security barrier view implementation, on which RLS is
built, prevented all non-leakproof functions from being pushed down to
below the view, even when the function was not receiving any data from
the view.  This optimization improves on that situation by, instead of
checking strictly for non-leakproof functions, it checks for Vars being
passed to non-leakproof functions and allows functions which do not
accept arguments or whose arguments are not from the current query level
(eg: constants can be particularly useful) to be pushed down.

As discussed, this does mean that a function which is pushed down might
gain some idea that there are rows meeting a certain criteria based on
the number of times the function is called, but this isn't a
particularly new issue and the documentation in rules.sgml already
addressed similar covert-channel risks.  That documentation is updated
to reflect that non-leakproof functions may be pushed down now, if
they meet the above-described criteria.

Author: Dean Rasheed, with a bit of rework to make things clearer,
along with comment and documentation updates from me.
2015-04-27 12:29:42 -04:00
Andrew Dunstan 06ca28d5ab Fix vcbuild failures and chkpass dependency caused by 854adb8
Switching the Windows build scripts to use forward slashes instead of
backslashes has caused a couple of issues in VC builds:
- The file tree list was not correctly generated, build script
  generating vcproj file missing tree dependencies when listing items in
  Filter.
- VC builds do not accept file paths with forward slashes, perhaps it
  could be possible to use a Condition but it seems safer to simply
  enforce the file paths to use backslashes in the vcproj files.
- chkpass had an unneeded dependency with libpgport and libpgcommon to
  make build succeed but actually it is not necessary as crypt.c is
  already listed for this project and should be replaced with a fake name
  as it is a unique file.

Michael Paquier
2015-04-27 10:56:04 -04:00
Andres Freund 2e3ca04e2e Also correct therefor to therefore.
Since both forms are arguably legal I wasn't sure about changing
this. But then Tom argued for 'therefore'...

Author: Dmitriy Olshevskiy
Discussion: 34789.1430067832@sss.pgh.pa.us
2015-04-26 19:05:39 +02:00
Andres Freund 6aab1f45ac Fix various typos and grammar errors in comments.
Author: Dmitriy Olshevskiy
Discussion: 553D00A6.4090205@bk.ru
2015-04-26 18:42:31 +02:00
Andres Freund 9fe1d9ac68 Fix possible division by zero in pg_xlogdump.
When displaying stats it was possible that a floating point division by
zero occured when no FPIs were issued for a type of record.

Author: Abhijit Menon-Sen
Discussion: 20150417091811.GA14008@toroid.org
2015-04-26 18:02:32 +02:00
Peter Eisentraut cac7658205 Add transforms feature
This provides a mechanism for specifying conversions between SQL data
types and procedural languages.  As examples, there are transforms
for hstore and ltree for PL/Perl and PL/Python.

reviews by Pavel Stěhule and Andres Freund
2015-04-26 10:33:14 -04:00
Tom Lane 0bd11d9711 Add comments warning against generalizing default_with_oids.
pg_dump has historically assumed that default_with_oids affects only plain
tables and not other relkinds.  Conceivably we could make it apply to some
newly invented relkind if we did so from the get-go, but changing the
behavior for existing object types will break existing dump scripts.
Add code comments warning about this interaction.

Also, make sure that default_with_oids doesn't cause parse_utilcmd.c to
think that CREATE FOREIGN TABLE will create an OID column.  I think this is
only a latent bug right now, since we don't allow UNIQUE/PKEY constraints
in CREATE FOREIGN TABLE, but it's better to be consistent and future-proof.
2015-04-25 21:38:06 -04:00
Andrew Dunstan 04f1542d39 Try to unbreak some MSVC builds following forward slash change.
Michael Paquier.
2015-04-25 21:28:02 -04:00
Bruce Momjian 764ce22af3 Revert: Honor OID status of CREATE LIKE'd tables
Reverts d992f8a896

Report by Tom Lane
2015-04-25 21:10:48 -04:00
Peter Eisentraut ee8d392765 Don't overwrite EXTRA_INSTALL
The temp-install target sets EXTRA_INSTALL to install the current
directory.  But when doing so, it should append instead of overwrite,
otherwise settings of EXTRA_INSTALL from a makefile won't take effect.
This would cause the earthdistance test to fail when called directly,
because it would miss installing the cube module.
2015-04-25 21:00:39 -04:00
Tom Lane 3cf8686014 Prevent improper reordering of antijoins vs. outer joins.
An outer join appearing within the RHS of an antijoin can't commute with
the antijoin, but somehow I missed teaching make_outerjoininfo() about
that.  In Teodor Sigaev's recent trouble report, this manifests as a
"could not find RelOptInfo for given relids" error within eqjoinsel();
but I think silently wrong query results are possible too, if the planner
misorders the joins and doesn't happen to trigger any internal consistency
checks.  It's broken as far back as we had antijoins, so back-patch to all
supported branches.
2015-04-25 16:44:27 -04:00
Peter Eisentraut 854adb8371 Replace backslashes by forward slashes in MSVC build code
This makes it possible to run some stages of these build scripts on
non-Windows systems.  That way, we can more easily test whether file
moves or makefile changes might break the MSVC build.

Peter Eisentraut and Michael Paquier
2015-04-25 08:58:01 -04:00
Stephen Frost 410cbfd6dd Fix file comment for test_rls_hooks.c
The file-level comment wasn't updated when it was copied from the shared
memory queue test module.  Fixed.

Noted by Dean Rasheed.
2015-04-24 20:44:53 -04:00
Stephen Frost e89bd02f58 Perform RLS WITH CHECK before constraints, etc
The RLS capability is built on top of the WITH CHECK OPTION
system which was added for auto-updatable views, however, unlike
WCOs on views (which are mandated by the SQL spec to not fire until
after all other constraints and checks are done), it makes much more
sense for RLS checks to happen earlier than constraint and uniqueness
checks.

This patch reworks the structure which holds the WCOs a bit to be
explicitly either VIEW or RLS checks and the RLS-related checks are
done prior to the constraint and uniqueness checks.  This also allows
better error reporting as we are now reporting when a violation is due
to a WITH CHECK OPTION and when it's due to an RLS policy violation,
which was independently noted by Craig Ringer as being confusing.

The documentation is also updated to include a paragraph about when RLS
WITH CHECK handling is performed, as there have been a number of
questions regarding that and the documentation was previously silent on
the matter.

Author: Dean Rasheed, with some kabitzing and comment changes by me.
2015-04-24 20:34:26 -04:00
Noah Misch c8aa893862 Remove obsolete -I options from ECPG library compilation.
The MSVC build system already omitted these.
2015-04-24 19:29:09 -04:00
Noah Misch bcd7e8897c Remove superfluous -DFRONTEND.
The majority practice is to add -DFRONTEND in directories building files
that are, at other times, built for the backend.  Some directories
lacking that property added a noise -DFRONTEND in one build system.
Remove the excess flags, for consistency.
2015-04-24 19:29:05 -04:00
Noah Misch 151e74719b Build every ECPG library with -DFRONTEND.
Each of the libraries incorporates src/port files, which often check
FRONTEND.  Build systems disagreed on whether to build libpgtypes this
way.  Only libecpg incorporates files that rely on it today.  Back-patch
to 9.0 (all supported versions) to forestall surprises.
2015-04-24 19:29:02 -04:00
Tom Lane 732b33f8ae Fix up .gitignore and cleanup actions in some src/test/ subdirectories.
examples/, locale/, and thread/ lacked .gitignore files and were also
not connected up to top-level "make clean" etc.  This had escaped notice
because none of those directories are built in normal scenarios.  Still,
they have working Makefiles, so if someone does a "make" in one of these
directories it would be good if (a) git doesn't bleat about the product
files and (b) cleaning up removes them.

This is a longstanding oversight, but since this behavior is probably
only of interest to developers, there seems no need for back-patching.

Michael Paquier and Tom Lane
2015-04-24 17:13:06 -04:00
Tom Lane 70d44dd9de Fix obsolete comment in set_rel_size().
The cross-reference to set_append_rel_pathlist() was obsoleted by
commit e2fa76d80b, which split what
had been set_rel_pathlist() and child routines into two sets of
functions.  But I (tgl) evidently missed updating this comment.

Back-patch to 9.2 to avoid unnecessary divergence among branches.

Amit Langote
2015-04-24 15:18:07 -04:00
Heikki Linnakangas 61a553a091 Add comments explaining how unique and exclusion constraints are enforced. 2015-04-24 21:13:28 +03:00
Peter Eisentraut 9ba978c8cc Fix misspellings
Amit Langote and Thom Brown
2015-04-24 12:00:49 -04:00
Stephen Frost cb087ec03b Copy the relation name for error reporting in WCOs
In get_row_security_policies(), we need to make a copy of the relation
name when building the WithCheckOptions structure, since
RelationGetRelationName just returns a pointer into the local Relation
structure.  The relation name in the WCO structure is only used for
error reporting.

Pointed out by Robert and Christian Ullrich, who noted that the
buildfarm members with -DCLOBBER_CACHE_ALWAYS were failing.
2015-04-24 09:38:10 -04:00
Heikki Linnakangas 62420ae7d6 Move functions related to index maintenance to separate source file.
There is enough code here to deserve a file of their own, not be buried
in the middle of execUtils.c.
2015-04-24 09:33:23 +03:00
Heikki Linnakangas 2c47fe16a7 Fix deadlock at startup, if max_prepared_transactions is too small.
When the startup process recovers transactions by scanning pg_twophase
directory, it should clear MyLockedGxact after it's done processing each
transaction. Like we do during normal operation, at PREPARE TRANSACTION.
Otherwise, if the startup process exits due to an error, it will try to
clear the locking_backend field of the last recovered transaction. That's
usually harmless, but if the error happens in MarkAsPreparing, while
holding TwoPhaseStateLock, the shmem-exit hook will try to acquire
TwoPhaseStateLock again, and deadlock with itself.

This fixes bug #13128 reported by Grant McAlister. The bug was introduced
by commit bb38fb0d, so backpatch to all supported versions like that
commit.
2015-04-23 21:39:35 +03:00
Peter Eisentraut 2aa0fb032e Fix shell error on Solaris
Apparently, the Bourne shell on Solaris doesn't like "for" loops with an
empty list, so have "make" skip the loop in that case.
2015-04-23 13:09:18 -04:00
Peter Eisentraut dcae5facca Improve speed of make check-world
Before, make check-world would create a new temporary installation for
each test suite, which is slow and wasteful.  Instead, we now create one
test installation that is used by all test suites that are part of a
make run.

The management of the temporary installation is removed from pg_regress
and handled in the makefiles.  This allows for better control, and
unifies the code with that of test suites not run through pg_regress.

review and msvc support by Michael Paquier <michael.paquier@gmail.com>

more review by Fabien Coelho <coelho@cri.ensmp.fr>
2015-04-23 08:59:52 -04:00
Alvaro Herrera 50a16e30eb Use the right type OID after creating a shell type
Commit a2e35b53c3 neglected to update the type OID to use further
down in DefineType when TypeShellMake was changed to return
ObjectAddress instead of OID (it got it right in DefineRange, however.)
This resulted in an internal error message being issued when looking up
I/O functions.

Author: Michael Paquier

Also add Asserts() to a couple of other places to ensure that the type
OID being used is as expected.
2015-04-22 16:23:02 -03:00
Stephen Frost 450fa1b5ba Fix installcheck for test_rls_hooks
As pointed out by the buildfarm, test_rls_hooks wasn't functioning
properly with a clean installcheck.  test_rls_hooks needs to explicitly
load the library with the hooks in it, to allow installcheck to work;
using the --temp-config doesn't help since that isn't used when running
installcheck and it isn't exactly fair to the buildfarm to modify the
installed config prior to calling installcheck.

Also, have test_rls_hooks clean up after itself.
2015-04-22 12:43:57 -04:00
Stephen Frost 0bf22e0c8b RLS fixes, new hooks, and new test module
In prepend_row_security_policies(), defaultDeny was always true, so if
there were any hook policies, the RLS policies on the table would just
get discarded.  Fixed to start off with defaultDeny as false and then
properly set later if we detect that only the default deny policy exists
for the internal policies.

The infinite recursion detection in fireRIRrules() didn't properly
manage the activeRIRs list in the case of WCOs, so it would incorrectly
report infinite recusion if the same relation with RLS appeared more
than once in the rtable, for example "UPDATE t ... FROM t ...".

Further, the RLS expansion code in fireRIRrules() was handling RLS in
the main loop through the rtable, which lead to RTEs being visited twice
if they contained sublink subqueries, which
prepend_row_security_policies() attempted to handle by exiting early if
the RTE already had securityQuals.  That doesn't work, however, since
if the query involved a security barrier view on top of a table with
RLS, the RTE would already have securityQuals (from the view) by the
time fireRIRrules() was invoked, and so the table's RLS policies would
be ignored.  This is fixed in fireRIRrules() by handling RLS in a
separate loop at the end, after dealing with any other sublink
subqueries, thus ensuring that each RTE is only visited once for RLS
expansion.

The inheritance planner code didn't correctly handle non-target
relations with RLS, which would get turned into subqueries during
planning. Thus an update of the form "UPDATE t1 ... FROM t2 ..." where
t1 has inheritance and t2 has RLS quals would fail.  Fix by making sure
to copy in and update the securityQuals when they exist for non-target
relations.

process_policies() was adding WCOs to non-target relations, which is
unnecessary, and could lead to a lot of wasted time in the rewriter and
the planner. Fix by only adding WCO policies when working on the result
relation.  Also in process_policies, we should be copying the USING
policies to the WITH CHECK policies on a per-policy basis, fix by moving
the copying up into the per-policy loop.

Lastly, as noted by Dean, we were simply adding policies returned by the
hook provided to the list of quals being AND'd, meaning that they would
actually restrict records returned and there was no option to have
internal policies and hook-based policies work together permissively (as
all internal policies currently work).  Instead, explicitly add support
for both permissive and restrictive policies by having a hook for each
and combining the results appropriately.  To ensure this is all done
correctly, add a new test module (test_rls_hooks) to test the various
combinations of internal, permissive, and restrictive hook policies.

Largely from Dean Rasheed (thanks!):

CAEZATCVmFUfUOwwhnBTcgi6AquyjQ0-1fyKd0T3xBWJvn+xsFA@mail.gmail.com

Author: Dean Rasheed, though I added the new hooks and test module.
2015-04-22 12:01:06 -04:00
Stephen Frost 4ccc5bd28e Pull in tableoid for inheiritance with rowMarks
As noted by Etsuro Fujita [1] and Dean Rasheed[2],
cb1ca4d800 changed ExecBuildAuxRowMark()
to always look for the tableoid in the target list, but didn't also
change preprocess_targetlist() to always include the tableoid.  This
resulted in errors with soon-to-be-added RLS with inheritance tests,
and errors when using inheritance with foreign tables.

Authors: Etsuro Fujita and Dean Rasheed (independently)

Minor word-smithing on the comments by me.

[1] 552CF0B6.8010006@lab.ntt.co.jp
[2] CAEZATCVmFUfUOwwhnBTcgi6AquyjQ0-1fyKd0T3xBWJvn+xsFA@mail.gmail.com
2015-04-22 11:29:35 -04:00
Heikki Linnakangas 54a16df010 Make the pg_rewind regression tests more robust on slow systems.
There were a couple of hard-coded sleeps in the tests: to wait for standby
to catch up with master, and to wait for promotion with "pg_ctl promote"
to complete. Instead of a fixed, hard-coded sleep, poll the server with a
query once a second. This isn't ideal either, and I wish we had a better
solution for real-world applications too, but this should fix the
immediate problem.

Patch by Michael Paquier, with some editing by me.
2015-04-22 14:33:57 +03:00
Andres Freund cef939c347 Rename pg_replication_slot's new active_in to active_pid.
In d811c037ce active_in was added but discussion since showed that
active_pid is preferred as a name.

Discussion: CAMsr+YFKgZca5_7_ouaMWxA5PneJC9LNViPzpDHusaPhU9pA7g@mail.gmail.com
2015-04-22 09:43:40 +02:00
Heikki Linnakangas 4d930eee89 Don't leave 'tmp_check' directory behind in pg_rewind regression tests. 2015-04-22 10:14:44 +03:00
Peter Eisentraut b0a738f428 Move pg_xlogdump from contrib/ to src/bin/
Reviewed-by: Michael Paquier <michael.paquier@gmail.com>
2015-04-21 19:03:49 -04:00
Heikki Linnakangas 060a1224af Add missing installcheck target to pg_rewind's Makefile
Michael Paquier
2015-04-21 14:09:25 +03:00
Andres Freund d811c037ce Add 'active_in' column to pg_replication_slots.
Right now it is visible whether a replication slot is active in any
session, but not in which.  Adding the active_in column, containing the
pid of the backend having acquired the slot, makes it much easier to
associate pg_replication_slots entries with the corresponding
pg_stat_replication/pg_stat_activity row.

This should have been done from the start, but I (Andres) dropped the
ball there somehow.

Author: Craig Ringer, revised by me Discussion:
CAMsr+YFKgZca5_7_ouaMWxA5PneJC9LNViPzpDHusaPhU9pA7g@mail.gmail.com
2015-04-21 11:51:06 +02:00
Peter Eisentraut 528c2e44ab Move pg_test_timing from contrib/ to src/bin/
Reviewed-by: Michael Paquier <michael.paquier@gmail.com>
2015-04-20 21:30:12 -04:00
Bruce Momjian d992f8a896 Honor OID status of CREATE LIKE'd tables
Previously, tables created by CREATE LIKE never had OIDs.

Report by Tom Lane
2015-04-20 16:11:25 -04:00
Peter Eisentraut 00882d9e5c Move pg_test_fsync from contrib/ to src/bin/
Reviewed-by: Michael Paquier <michael.paquier@gmail.com>
2015-04-19 22:20:49 -04:00
Bruce Momjian f92fc4c95d pg_upgrade: binary_upgrade_create_empty_extension() is strict
Was broken by commit 30982be4e5.

Patch by Jeff Janes
2015-04-17 20:08:42 -04:00
Stephen Frost ab6d1cd26e Fix typo in relcache's equalPolicy()
The USING policies were not being checked for differences as the same
policy was being passed in to both sides of the equal().  This could
result in backends not realizing that a policy had been changed, if
none of the other attributes had been changed.

Fix by passing to equal() the policy1 and policy2 using quals for
comparison.

No need to back-patch as this is not yet released.  Noticed while
testing changes to RLS proposed by Dean Rasheed.
2015-04-17 16:37:11 -04:00
Alvaro Herrera 4cb7d671fd Add new target modulescheck in vcregress.pl
This allows an MSVC build to run regression tests related to modules in
src/test/modules.

Author: Michael Paquier
Reviewed by: Andrew Dunstan
2015-04-16 23:39:52 -03:00
Alvaro Herrera 22d005323f MSVC: install src/test/modules together with contrib
These modules have to be installed so that the testing module can access
them.  (We don't have that yet, but will soon have it.)

Author: Michael Paquier
Reviewed by: Andrew Dunstan
2015-04-16 16:40:14 -03:00
Heikki Linnakangas e2999abcd1 Fix assertion failure in logical decoding.
Logical decoding set SnapshotData's regd_count field to avoid the
snapshot manager from prematurely freeing snapshots that are generated
by the decoding system. That was always an abuse of the field, as it was
never supposed to be used outside the snapshot manager. Commit 94028691
made snapshot manager's tracking of the snapshots smarter, and that scheme
fell apart. The snapshot manager got confused and hit the assertion, when
a snapshot that was marked with regd_count==1 was not found in the heap,
where the snapshot manager tracks registered the snapshots.

To fix, don't abuse the regd_count field like that. Logical decoding still
abuses the active_count field for similar purposes, but that's currently
harmless.

The assertion failure was first reported by Michael Paquier
2015-04-16 21:50:07 +03:00
Alvaro Herrera 90898af30b MSVC: Include modules of src/test/modules in build
commit_ts, being only a module used for test purposes, is ignored in the
process for now.

Author: Michael Paquier
Reviewed by: Andrew Dunstan
2015-04-16 15:17:26 -03:00
Heikki Linnakangas b5e384e374 Add missing newlines to error messages. 2015-04-16 09:18:00 +03:00
Heikki Linnakangas b5e560c246 Error out in pg_rewind if lstat() fails.
A "file not found" is expected if the source server is running, so don't
complain about that. But any other error is definitely not expected.
2015-04-15 23:13:32 +03:00
Heikki Linnakangas 41457fcf97 Minor cleanup of pg_rewind.
Update comments and function names to use the terms "source" and "target"
consistently. Some places were calling them remote and local instead, which
was confusing.

Fix incorrect comment in extractPageInfo on database creation record - it
was wrong on what happens for databases created in the target that don't
exist in source.
2015-04-15 22:52:00 +03:00
Heikki Linnakangas 0d8a22a9ac Shut down test servers after pg_rewind regression tests.
Now that the test servers are initialized twice in each .pl script,
the single END block is not enough to stop them. Add a new clean_rewind_test
function that is called at the end of each test.

Michael Paquier
2015-04-15 19:54:38 +03:00
Heikki Linnakangas 3d80a1e0e3 Fix logic to skip checkpoint if no records have been inserted.
After the WAL format changes, the calculation of the size of a checkpoint
record became incorrect. Instead of trying to fix the math, check that the
previous record, i.e. the xl_prev value that we'd write for the next
record, matches the last checkpoint's redo pointer. That way it's not
dependent on the size of the checkpoint record at all.

The old logic was actually slightly wrong all along: if the previous
checkpoint record crossed a page boundary, the page headers threw off the
record size calculation, and the checkpoint was not skipped. The new
checkpoint would not cross a page boundary, so this only resulted in at
most one extra checkpoint after the system became idle. The new logic fixes
that. (It's not worth fixing in backbranches).

However, it makes some sense to try to keep the latest checkpoint contained
fully in a page, or at least in a single WAL segment, just on general
robustness grounds. If something goes awfully wrong, it's more likely that
you can recover the latest WAL segment, than the last two WAL segments. So
I added an extra check that the checkpoint is not skipped if the previous
checkpoint crossed a WAL segment.

Reported by Jeff Janes.
2015-04-15 17:21:04 +03:00
Peter Eisentraut 9fa8b0ee90 Move pg_upgrade from contrib/ to src/bin/
Reviewed-by: Michael Paquier <michael.paquier@gmail.com>
2015-04-14 19:26:38 -04:00
Peter Eisentraut 30982be4e5 Integrate pg_upgrade_support module into backend
Previously, these functions were created in a schema "binary_upgrade",
which was deleted after pg_upgrade was finished.  Because we don't want
to keep that schema around permanently, move them to pg_catalog but
rename them with a binary_upgrade_... prefix.

The provided functions are only small wrappers around global variables
that were added specifically for pg_upgrade use, so keeping the module
separate does not create any modularity.

The functions still check that they are only called in binary upgrade
mode, so it is not possible to call these during normal operation.

Reviewed-by: Michael Paquier <michael.paquier@gmail.com>
2015-04-14 19:26:37 -04:00
Heikki Linnakangas 936546dcbc Optimize pg_comp_crc32c_sse42 routine slightly, and also use it on x86.
Eliminate the separate 'len' variable from the loops, and also use the 4
byte instruction. This shaves off a few more cycles. Even though this
routine that uses the special SSE 4.2 instructions is much faster than a
generic routine, it's still a hot spot, so let's make it as fast as
possible.

Change the configure test to not test _mm_crc32_u64. That variant is only
available in the 64-bit x86-64 architecture, not in 32-bit x86. Modify
pg_comp_crc32c_sse42 so that it only uses _mm_crc32_u64 on x86-64. With
these changes, the SSE accelerated CRC-32C implementation can also be used
on 32-bit x86 systems.

This also fixes the 32-bit MSVC build.
2015-04-14 23:58:16 +03:00
Heikki Linnakangas b73e7a0716 Oops, fix misspelled #endif
I hope this fixes the Windows builfarm failures.
2015-04-14 22:00:52 +03:00
Alvaro Herrera 0a52fafce4 Fix typo in comment
SLRU_SEGMENTS_PER_PAGE -> SLRU_PAGES_PER_SEGMENT

I introduced this ancient typo in subtrans.c and later propagated it to
multixact.c.  I fixed the latter in f741300c, but only back to 9.3;
backpatch to all supported branches for consistency.
2015-04-14 12:12:18 -03:00