Commit Graph

1877 Commits

Author SHA1 Message Date
Bruce Momjian 1d25779284 Update copyright via script for 2017 2017-01-03 13:48:53 -05:00
Peter Eisentraut 2e254130d1 Make more use of RoleSpec struct
Most code was casting this through a generic Node.  By declaring
everything as RoleSpec appropriately, we can remove a bunch of casts and
ad-hoc node type checking.

Reviewed-by: Alvaro Herrera <alvherre@2ndquadrant.com>
2016-12-29 10:49:39 -05:00
Robert Haas 2ac3ef7a01 Fix tuple routing in cases where tuple descriptors don't match.
The previous coding failed to work correctly when we have a
multi-level partitioned hierarchy where tables at successive levels
have different attribute numbers for the partition key attributes.  To
fix, have each PartitionDispatch object store a standalone
TupleTableSlot initialized with the TupleDesc of the corresponding
partitioned table, along with a TupleConversionMap to map tuples from
the its parent's rowtype to own rowtype.  After tuple routing chooses
a leaf partition, we must use the leaf partition's tuple descriptor,
not the root table's.  To that end, a dedicated TupleTableSlot for
tuple routing is now allocated in EState.

Amit Langote
2016-12-22 17:36:37 -05:00
Peter Eisentraut 1753b1b027 Add pg_sequence system catalog
Move sequence metadata (start, increment, etc.) into a proper system
catalog instead of storing it in the sequence heap object.  This
separates the metadata from the sequence data.  Sequence metadata is now
operated on transactionally by DDL commands, whereas previously
rollbacks of sequence-related DDL commands would be ignored.

Reviewed-by: Andreas Karlsson <andreas@proxel.se>
2016-12-20 08:28:18 -05:00
Robert Haas 7cd0fd655d Invalid parent's relcache after CREATE TABLE .. PARTITION OF.
Otherwise, subsequent commands in the same transaction see the wrong
partition descriptor.

Amit Langote.  Reported by Tomas Vondra and David Fetter.  Reviewed
by me.

Discussion: http://postgr.es/m/22dd313b-d7fd-22b5-0787-654845c8f849%402ndquadrant.com
Discussion: http://postgr.es/m/20161215090916.GB20659%40fetter.org
2016-12-19 22:53:30 -05:00
Robert Haas a25665088d Fix bugs in RelationGetPartitionDispatchInfo.
The previous coding was not quite right for cases involving multiple
levels of partitioning.

Amit Langote
2016-12-13 11:29:08 -05:00
Robert Haas 4b9a98e154 Clean up code, comments, and formatting for table partitioning.
Amit Langote, plus pgindent-ing by me.  Inspired in part by review
comments from Tomas Vondra.
2016-12-13 10:59:14 -05:00
Peter Eisentraut a924c327e2 Add support for temporary replication slots
This allows creating temporary replication slots that are removed
automatically at the end of the session or on error.

From: Petr Jelinek <petr.jelinek@2ndquadrant.com>
2016-12-12 08:38:17 -05:00
Robert Haas ab4575dcf1 Silence compiler warning.
Per report from Stephen Frost.
2016-12-08 14:55:47 -05:00
Robert Haas fa0f466d53 Log the creation of an init fork unconditionally.
Previously, it was thought that this only needed to be done for the
benefit of possible standbys, so wal_level = minimal skipped it.
But that's not safe, because during crash recovery we might replay
XLOG_DBASE_CREATE or XLOG_TBLSPC_CREATE record which recursively
removes the directory that contains the new init fork.  So log it
always.

The user-visible effect of this bug is that if you create a database
or tablespace, then create an unlogged table, then crash without
checkpointing, then restart, accessing the table will fail, because
the it won't have been properly reset.  This commit fixes that.

Michael Paquier, per a report from Konstantin Knizhnik.  Wording of
the comments per a suggestion from me.
2016-12-08 14:12:08 -05:00
Robert Haas f0e44751d7 Implement table partitioning.
Table partitioning is like table inheritance and reuses much of the
existing infrastructure, but there are some important differences.
The parent is called a partitioned table and is always empty; it may
not have indexes or non-inherited constraints, since those make no
sense for a relation with no data of its own.  The children are called
partitions and contain all of the actual data.  Each partition has an
implicit partitioning constraint.  Multiple inheritance is not
allowed, and partitioning and inheritance can't be mixed.  Partitions
can't have extra columns and may not allow nulls unless the parent
does.  Tuples inserted into the parent are automatically routed to the
correct partition, so tuple-routing ON INSERT triggers are not needed.
Tuple routing isn't yet supported for partitions which are foreign
tables, and it doesn't handle updates that cross partition boundaries.

Currently, tables can be range-partitioned or list-partitioned.  List
partitioning is limited to a single column, but range partitioning can
involve multiple columns.  A partitioning "column" can be an
expression.

Because table partitioning is less general than table inheritance, it
is hoped that it will be easier to reason about properties of
partitions, and therefore that this will serve as a better foundation
for a variety of possible optimizations, including query planner
optimizations.  The tuple routing based which this patch does based on
the implicit partitioning constraints is an example of this, but it
seems likely that many other useful optimizations are also possible.

Amit Langote, reviewed and tested by Robert Haas, Ashutosh Bapat,
Amit Kapila, Rajkumar Raghuwanshi, Corey Huinker, Jaime Casanova,
Rushabh Lathia, Erik Rijkers, among others.  Minor revisions by me.
2016-12-07 13:17:55 -05:00
Stephen Frost 093129c9d9 Add support for restrictive RLS policies
We have had support for restrictive RLS policies since 9.5, but they
were only available through extensions which use the appropriate hooks.
This adds support into the grammer, catalog, psql and pg_dump for
restrictive RLS policies, thus reducing the cases where an extension is
necessary.

In passing, also move away from using "AND"d and "OR"d in comments.
As pointed out by Alvaro, it's not really appropriate to attempt
to make verbs out of "AND" and "OR", so reword those comments which
attempted to.

Reviewed By: Jeevan Chalke, Dean Rasheed
Discussion: https://postgr.es/m/20160901063404.GY4028@tamriel.snowman.net
2016-12-05 15:50:55 -05:00
Tom Lane b3427dade1 Delete deleteWhatDependsOn() in favor of more performDeletion() flag bits.
deleteWhatDependsOn() had grown an uncomfortably large number of
assumptions about what it's used for.  There are actually only two minor
differences between what it does and what a regular performDeletion() call
can do, so let's invent additional bits in performDeletion's existing flags
argument that specify those behaviors, and get rid of deleteWhatDependsOn()
as such.  (We'd probably have done it this way from the start, except that
performDeletion didn't originally have a flags argument, IIRC.)

Also, add a SKIP_EXTENSIONS flag bit that prevents ever recursing to an
extension, and use that when dropping temporary objects at session end.
This provides a more general solution to the problem addressed in a hacky
way in commit 08dd23cec: if an extension script creates temp objects and
forgets to remove them again, the whole extension went away when its
contained temp objects were deleted.  The previous solution only covered
temp relations, but this solves it for all object types.

These changes require minor additions in dependency.c to pass the flags
to subroutines that previously didn't get them, but it's still a net
savings of code, and it seems cleaner than before.

Having done this, revert the special-case code added in 08dd23cec that
prevented addition of pg_depend records for temp table extension
membership, because that caused its own oddities: dropping an extension
that had created such a table didn't automatically remove the table,
leading to a failure if the table had another dependency on the extension
(such as use of an extension data type), or to a duplicate-name failure if
you then tried to recreate the extension.  But we keep the part that
prevents the pg_temp_nnn schema from becoming an extension member; we never
want that to happen.  Add a regression test case covering these behaviors.

Although this fixes some arguable bugs, we've heard few field complaints,
and any such problems are easily worked around by explicitly dropping temp
objects at the end of extension scripts (which seems like good practice
anyway).  So I won't risk a back-patch.

Discussion: https://postgr.es/m/e51f4311-f483-4dd0-1ccc-abec3c405110@BlueTreble.com
2016-12-02 14:57:55 -05:00
Tom Lane 182db07040 Fix test about ignoring extension dependencies during extension scripts.
Commit 08dd23cec introduced an exception to the rule that extension member
objects can only be dropped as part of dropping the whole extension,
intending to allow such drops while running the extension's own creation or
update scripts.  However, the exception was only applied at the outermost
recursion level, because it was modeled on a pre-existing check to ignore
dependencies on objects listed in pendingObjects.  Bug #14434 from Philippe
Beaudoin shows that this is inadequate: in some cases we can reach an
extension member object by recursion from another one.  (The bug concerns
the serial-sequence case; I'm not sure if there are other cases, but there
might well be.)

To fix, revert 08dd23cec's changes to findDependentObjects() and instead
apply the creating_extension exception regardless of stack level.

Having seen this example, I'm a bit suspicious that the pendingObjects
logic is also wrong and such cases should likewise be allowed at any
recursion level.  However, changing that would interact in subtle ways
with the recursion logic (at least it would need to be moved to after the
recursing-from check).  Given that the code's been like that a long time,
I'll refrain from touching it without a clear example showing it's wrong.

Back-patch to all active branches.  In HEAD and 9.6, where suitable
test infrastructure exists, add a regression test case based on the
bug report.

Report: <20161125151448.6529.33039@wrigleys.postgresql.org>
Discussion: <13224.1480177514@sss.pgh.pa.us>
2016-11-26 13:31:35 -05:00
Peter Eisentraut 67dc4ccbb2 Add pg_sequences view
Like pg_tables, pg_views, and others, this view contains information
about sequences in a way that is independent of the system catalog
layout but more comprehensive than the information schema.

To help implement the view, add a new internal function
pg_sequence_last_value() to return the last value of a sequence.  This
is kept separate from pg_sequence_parameters() to separate querying
run-time state from catalog-like information.

Reviewed-by: Andreas Karlsson <andreas@proxel.se>
2016-11-18 14:59:03 -05:00
Tom Lane 3cca13cbfc Fix another bug in merging of inherited CHECK constraints.
It's not good for an inherited child constraint to be marked connoinherit;
that would result in the constraint not propagating to grandchild tables,
if any are created later.  The code mostly prevented this from happening
but there was one case that was missed.

This is somewhat related to commit e55a946a8, which also tightened checks
on constraint merging.  Hence, back-patch to 9.2 like that one.  This isn't
so much because there's a concrete feature-related reason to stop there,
as to avoid having more distinct behaviors than we have to in this area.

Amit Langote

Discussion: <b28ee774-7009-313d-dd55-5bdd81242c41@lab.ntt.co.jp>
2016-10-13 17:05:14 -04:00
Tom Lane e55a946a81 Fix two bugs in merging of inherited CHECK constraints.
Historically, we've allowed users to add a CHECK constraint to a child
table and then add an identical CHECK constraint to the parent.  This
results in "merging" the two constraints so that the pre-existing
child constraint ends up with both conislocal = true and coninhcount > 0.
However, if you tried to do it in the other order, you got a duplicate
constraint error.  This is problematic for pg_dump, which needs to issue
separated ADD CONSTRAINT commands in some cases, but has no good way to
ensure that the constraints will be added in the required order.
And it's more than a bit arbitrary, too.  The goal of complaining about
duplicated ADD CONSTRAINT commands can be served if we reject the case of
adding a constraint when the existing one already has conislocal = true;
but if it has conislocal = false, let's just make the ADD CONSTRAINT set
conislocal = true.  In this way, either order of adding the constraints
has the same end result.

Another problem was that the code allowed creation of a parent constraint
marked convalidated that is merged with a child constraint that is
!convalidated.  In this case, an inheritance scan of the parent table could
emit some rows violating the constraint condition, which would be an
unexpected result given the marking of the parent constraint as validated.
Hence, forbid merging of constraints in this case.  (Note: valid child and
not-valid parent seems fine, so continue to allow that.)

Per report from Benedikt Grundmann.  Back-patch to 9.2 where we introduced
possibly-not-valid check constraints.  The second bug obviously doesn't
apply before that, and I think the first doesn't either, because pg_dump
only gets into this situation when dealing with not-valid constraints.

Report: <CADbMkNPT-Jz5PRSQ4RbUASYAjocV_KHUWapR%2Bg8fNvhUAyRpxA%40mail.gmail.com>
Discussion: <22108.1475874586@sss.pgh.pa.us>
2016-10-08 19:29:27 -04:00
Alvaro Herrera b82d5a2c7c Silence compiler warnings
Reported by Peter Eisentraut.  Coding suggested by Tom Lane.
2016-09-28 19:31:58 -03:00
Tom Lane fdc9186f7e Replace the built-in GIN array opclasses with a single polymorphic opclass.
We had thirty different GIN array opclasses sharing the same operators and
support functions.  That still didn't cover all the built-in types, nor
did it cover arrays of extension-added types.  What we want is a single
polymorphic opclass for "anyarray".  There were two missing features needed
to make this possible:

1. We have to be able to declare the index storage type as ANYELEMENT
when the opclass is declared to index ANYARRAY.  This just takes a few
more lines in index_create().  Although this currently seems of use only
for GIN, there's no reason to make index_create() restrict it to that.

2. We have to be able to identify the proper GIN compare function for
the index storage type.  This patch proceeds by making the compare function
optional in GIN opclass definitions, and specifying that the default btree
comparison function for the index storage type will be looked up when the
opclass omits it.  Again, that seems pretty generically useful.

Since the comparison function lookup is done in initGinState(), making
use of the second feature adds an additional cache lookup to GIN index
access setup.  It seems unlikely that that would be very noticeable given
the other costs involved, but maybe at some point we should consider
making GinState data persist longer than it now does --- we could keep it
in the index relcache entry, perhaps.

Rather fortuitously, we don't seem to need to do anything to get this
change to play nice with dump/reload or pg_upgrade scenarios: the new
opclass definition is automatically selected to replace existing index
definitions, and the on-disk data remains compatible.  Also, if a user has
created a custom opclass definition for a non-builtin type, this doesn't
break that, since CREATE INDEX will prefer an exact match to opcintype
over a match to ANYARRAY.  However, if there's anyone out there with
handwritten DDL that explicitly specifies _bool_ops or one of the other
replaced opclass names, they'll need to adjust that.

Tom Lane, reviewed by Enrique Meneses

Discussion: <14436.1470940379@sss.pgh.pa.us>
2016-09-26 14:52:44 -04:00
Tom Lane a4c35ea1c2 Improve parser's and planner's handling of set-returning functions.
Teach the parser to reject misplaced set-returning functions during parse
analysis using p_expr_kind, in much the same way as we do for aggregates
and window functions (cf commit eaccfded9).  While this isn't complete
(it misses nesting-based restrictions), it's much better than the previous
error reporting for such cases, and it allows elimination of assorted
ad-hoc expression_returns_set() error checks.  We could add nesting checks
later if it seems important to catch all cases at parse time.

There is one case the parser will now throw error for although previous
versions allowed it, which is SRFs in the tlist of an UPDATE.  That never
behaved sensibly (since it's ill-defined which generated row should be
used to perform the update) and it's hard to see why it should not be
treated as an error.  It's a release-note-worthy change though.

Also, add a new Query field hasTargetSRFs reporting whether there are
any SRFs in the targetlist (including GROUP BY/ORDER BY expressions).
The parser can now set that basically for free during parse analysis,
and we can use it in a number of places to avoid expression_returns_set
searches.  (There will be more such checks soon.)  In some places, this
allows decontorting the logic since it's no longer expensive to check for
SRFs in the tlist --- so I made the checks parallel to the handling of
hasAggs/hasWindowFuncs wherever it seemed appropriate.

catversion bump because adding a Query field changes stored rules.

Andres Freund and Tom Lane

Discussion: <24639.1473782855@sss.pgh.pa.us>
2016-09-13 13:54:24 -04:00
Tom Lane 0ab9c56d0f Support renaming an existing value of an enum type.
Not much to be said about this patch: it does what it says on the tin.

In passing, rename AlterEnumStmt.skipIfExists to skipIfNewValExists
to clarify what it actually does.  In the discussion of this patch
we considered supporting other similar options, such as IF EXISTS
on the type as a whole or IF NOT EXISTS on the target name.  This
patch doesn't actually add any such feature, but it might happen later.

Dagfinn Ilmari Mannsåker, reviewed by Emre Hasegeli

Discussion: <CAO=2mx6uvgPaPDf-rHqG8=1MZnGyVDMQeh8zS4euRyyg4D35OQ@mail.gmail.com>
2016-09-07 16:11:56 -04:00
Peter Eisentraut 49eb0fd097 Add location field to DefElem
Add a location field to the DefElem struct, used to parse many utility
commands.  Update various error messages to supply error position
information.

To propogate the error position information in a more systematic way,
create a ParseState in standard_ProcessUtility() and pass that to
interested functions implementing the utility commands.  This seems
better than passing the query string and then reassembling a parse state
ad hoc, which violates the encapsulation of the ParseState type.

Reviewed-by: Pavel Stehule <pavel.stehule@gmail.com>
2016-09-06 12:00:00 -04:00
Tom Lane 65a588b4c3 Try to fix portability issue in enum renumbering (again).
The hack embodied in commit 4ba61a487 no longer works after today's change
to allow DatumGetFloat4/Float4GetDatum to be inlined (commit 14cca1bf8).
Probably what's happening is that the faulty compilers are deciding that
the now-inlined assignment is a no-op and so they're not required to
round to float4 width.

We had a bunch of similar issues earlier this year in the degree-based
trig functions, and eventually settled on using volatile intermediate
variables as the least ugly method of forcing recalcitrant compilers
to do what the C standard says (cf commit 82311bcdd).  Let's see if
that method works here.

Discussion: <4640.1472664476@sss.pgh.pa.us>
2016-08-31 13:58:01 -04:00
Tom Lane ea268cdc9a Add macros to make AllocSetContextCreate() calls simpler and safer.
I found that half a dozen (nearly 5%) of our AllocSetContextCreate calls
had typos in the context-sizing parameters.  While none of these led to
especially significant problems, they did create minor inefficiencies,
and it's now clear that expecting people to copy-and-paste those calls
accurately is not a great idea.  Let's reduce the risk of future errors
by introducing single macros that encapsulate the common use-cases.
Three such macros are enough to cover all but two special-purpose contexts;
those two calls can be left as-is, I think.

While this patch doesn't in itself improve matters for third-party
extensions, it doesn't break anything for them either, and they can
gradually adopt the simplified notation over time.

In passing, change TopMemoryContext to use the default allocation
parameters.  Formerly it could only be extended 8K at a time.  That was
probably reasonable when this code was written; but nowadays we create
many more contexts than we did then, so that it's not unusual to have a
couple hundred K in TopMemoryContext, even without considering various
dubious code that sticks other things there.  There seems no good reason
not to let it use growing blocks like most other contexts.

Back-patch to 9.6, mostly because that's still close enough to HEAD that
it's easy to do so, and keeping the branches in sync can be expected to
avoid some future back-patching pain.  The bugs fixed by these changes
don't seem to be significant enough to justify fixing them further back.

Discussion: <21072.1472321324@sss.pgh.pa.us>
2016-08-27 17:50:38 -04:00
Tom Lane 8299471c37 Use LEFT JOINs in some system views in case referenced row doesn't exist.
In particular, left join to pg_authid so that rows in pg_stat_activity
don't disappear if the session's owning user has been dropped.
Also convert a few joins to pg_database to left joins, in the same spirit,
though that case might be harder to hit.  We were doing this in other
views already, so it was a bit inconsistent that these views didn't.

Oskari Saarenmaa, with some further tweaking by me

Discussion: <56E87CD8.60007@ohmu.fi>
2016-08-19 17:13:47 -04:00
Tom Lane cf9b0fea5f Implement regexp_match(), a simplified alternative to regexp_matches().
regexp_match() is like regexp_matches(), but it disallows the 'g' flag
and in consequence does not need to return a set.  Instead, it returns
a simple text array value, or NULL if there's no match.  Previously people
usually got that behavior with a sub-select, but this way is considerably
more efficient.

Documentation adjusted so that regexp_match() is presented first and then
regexp_matches() is introduced as a more complicated version.  This is
a bit historically revisionist but seems pedagogically better.

Still TODO: extend contrib/citext to support this function.

Emre Hasegeli, reviewed by David Johnston

Discussion: <CAE2gYzy42sna2ME_e3y1KLQ-4UBrB-eVF0SWn8QG39sQSeVhEw@mail.gmail.com>
2016-08-17 18:33:01 -04:00
Tom Lane ca9112a424 Stamp HEAD as 10devel.
This is a good bit more complicated than the average new-version stamping
commit, because it includes various adjustments in pursuit of changing
from three-part to two-part version numbers.  It's likely some further
work will be needed around that change; but this is enough to get through
the regression tests, at least in Unix builds.

Peter Eisentraut and Tom Lane
2016-08-15 13:49:49 -04:00
Tom Lane ed0097e4f9 Add SQL-accessible functions for inspecting index AM properties.
Per discussion, we should provide such functions to replace the lost
ability to discover AM properties by inspecting pg_am (cf commit
65c5fcd35).  The added functionality is also meant to displace any code
that was looking directly at pg_index.indoption, since we'd rather not
believe that the bit meanings in that field are part of any client API
contract.

As future-proofing, define the SQL API to not assume that properties that
are currently AM-wide or index-wide will remain so unless they logically
must be; instead, expose them only when inquiring about a specific index
or even specific index column.  Also provide the ability for an index
AM to override the behavior.

In passing, document pg_am.amtype, overlooked in commit 473b93287.

Andrew Gierth, with kibitzing by me and others

Discussion: <87mvl5on7n.fsf@news-spur.riddles.org.uk>
2016-08-13 18:31:14 -04:00
Peter Eisentraut d8710f18f4 Correct column name in information schema
Although the standard has routines.result_cast_character_set_name, given
the naming of the surrounding columns, we concluded that this must have
been a mistake and that result_cast_char_set_name was intended, so
change the implementation.  The documentation was already using the new
name.

found by Clément Prévost <prevostclement@gmail.com>
2016-08-07 21:56:13 -04:00
Peter Eisentraut 6a9e09c49e Add missing casts in information schema
From: Clément Prévost <prevostclement@gmail.com>
2016-08-03 14:41:01 -04:00
Fujii Masao 60d50769b7 Rename pg_stat_wal_receiver.conn_info to conninfo.
Per discussion on pgsql-hackers, conninfo is better as the column name
because it's more commonly used in PostgreSQL.

Catalog version bumped due to the change of pg_proc.

Author: Michael Paquier
2016-07-07 12:59:39 +09:00
Alvaro Herrera 9ed551e0a4 Add conninfo to pg_stat_wal_receiver
Commit b1a9bad9e7 introduced a stats view to provide insight into the
running WAL receiver, but neglected to include the connection string in
it, as reported by Michaël Paquier.  This commit fixes that omission.
(Any security-sensitive information is not disclosed).

While at it, close the mild security hole that we were exposing the
password in the connection string in shared memory.  This isn't
user-accessible, but it still looks like a good idea to avoid having the
cleartext password in memory.

Author: Michaël Paquier, Álvaro Herrera
Review by: Vik Fearing

Discussion: https://www.postgresql.org/message-id/CAB7nPqStg4M561obo7ryZ5G+fUydG4v1Ajs1xZT1ujtu+woRag@mail.gmail.com
2016-06-29 16:57:17 -04:00
Tom Lane f8ace5477e Fix type-safety problem with parallel aggregate serial/deserialization.
The original specification for this called for the deserialization function
to have signature "deserialize(serialtype) returns transtype", which is a
security violation if transtype is INTERNAL (which it always would be in
practice) and serialtype is not (which ditto).  The patch blithely overrode
the opr_sanity check for that, which was sloppy-enough work in itself,
but the indisputable reason this cannot be allowed to stand is that CREATE
FUNCTION will reject such a signature and thus it'd be impossible for
extensions to create parallelizable aggregates.

The minimum fix to make the signature type-safe is to add a second, dummy
argument of type INTERNAL.  But to lock it down a bit more and make misuse
of INTERNAL-accepting functions less likely, let's get rid of the ability
to specify a "serialtype" for an aggregate and just say that the only
useful serialtype is BYTEA --- which, in practice, is the only interesting
value anyway, due to the usefulness of the send/recv infrastructure for
this purpose.  That means we only have to allow "serialize(internal)
returns bytea" and "deserialize(bytea, internal) returns internal" as
the signatures for these support functions.

In passing fix bogus signature of int4_avg_combine, which I found thanks
to adding an opr_sanity check on combinefunc signatures.

catversion bump due to removing pg_aggregate.aggserialtype and adjusting
signatures of assorted built-in functions.

David Rowley and Tom Lane

Discussion: <27247.1466185504@sss.pgh.pa.us>
2016-06-22 16:52:41 -04:00
Tom Lane 9bc3332372 Improve error message annotation for GRANT/REVOKE on untrusted PLs.
The annotation for "ERROR: language "foo" is not trusted" used to say
"HINT: Only superusers can use untrusted languages", which was fairly
poorly thought out.  For one thing, it's not a hint about what to do,
but a statement of fact, which makes it errdetail.  But also, this
fails to clarify things much, because there's a missing step in the
chain of reasoning.  I think it's more useful to say "GRANT and REVOKE
are not allowed on untrusted languages, because only superusers can use
untrusted languages".

It's been like this for a long time, but given the lack of previous
complaints, I don't think this is worth back-patching.

Discussion: <1417.1466289901@sss.pgh.pa.us>
2016-06-18 19:38:59 -04:00
Robert Haas 71d05a2c7b pg_visibility: Add pg_truncate_visibility_map function.
This requires some core changes as well so that we can properly
WAL-log the truncation.  Specifically, it changes the format of the
XLOG_SMGR_TRUNCATE WAL record, so bump XLOG_PAGE_MAGIC.

Patch by me, reviewed but not fully endorsed by Andres Freund.
2016-06-17 17:37:30 -04:00
Tom Lane 783cb6e48b Fix multiple minor infelicities in aclchk.c error reports.
pg_type_aclmask reported the wrong type's OID when complaining that
it could not find a type's typelem.  It also failed to provide a
suitable errcode when the initially given OID doesn't exist (which
is a user-facing error, since that OID can be user-specified).
pg_foreign_data_wrapper_aclmask and pg_foreign_server_aclmask likewise
lacked errcode specifications.  Trivial cosmetic adjustments too.

The wrong-type-OID problem was reported by Petru-Florin Mihancea in
bug #14186; the other issues noted by me while reading the code.
These errors all seem to be aboriginal in the respective routines, so
back-patch as necessary.

Report: <20160613163159.5798.52928@wrigleys.postgresql.org>
2016-06-13 13:53:10 -04:00
Kevin Grittner 13761bccb1 Rename local variable for consistency.
Pointed out by Robert Haas.
2016-06-10 11:24:01 -05:00
Kevin Grittner bf9a60ee33 Fix interaction between CREATE INDEX and "snapshot too old".
Since indexes are created without valid LSNs, an index created
while a snapshot older than old_snapshot_threshold existed could
cause queries to return incorrect results when those old snapshots
were used, if any relevant rows had been subject to early pruning
before the index was built.  Prevent usage of a newly created index
until all such snapshots are released, for relations where this can
happen.

Questions about the interaction of "snapshot too old" with index
creation were initially raised by Andres Freund.

Reviewed by Robert Haas.
2016-06-10 09:25:31 -05:00
Tom Lane cae1c788b9 Improve the situation for parallel query versus temp relations.
Transmit the leader's temp-namespace state to workers.  This is important
because without it, the workers do not really have the same search path
as the leader.  For example, there is no good reason (and no extant code
either) to prevent a worker from executing a temp function that the
leader created previously; but as things stood it would fail to find the
temp function, and then either fail or execute the wrong function entirely.

We still prohibit a worker from creating a temp namespace on its own.
In effect, a worker can only see the session's temp namespace if the leader
had created it before starting the worker, which seems like the right
semantics.

Also, transmit the leader's BackendId to workers, and arrange for workers
to use that when determining the physical file path of a temp relation
belonging to their session.  While the original intent was to prevent such
accesses entirely, there were a number of holes in that, notably in places
like dbsize.c which assume they can safely access temp rels of other
sessions anyway.  We might as well get this right, as a small down payment
on someday allowing workers to access the leader's temp tables.  (With
this change, directly using "MyBackendId" as a relation or buffer backend
ID is deprecated; you should use BackendIdForTempRelations() instead.
I left a couple of such uses alone though, as they're not going to be
reachable in parallel workers until we do something about localbuf.c.)

Move the thou-shalt-not-access-thy-leader's-temp-tables prohibition down
into localbuf.c, which is where it actually matters, instead of having it
in relation_open().  This amounts to recognizing that access to temp
tables' catalog entries is perfectly safe in a worker, it's only the data
in local buffers that is problematic.

Having done all that, we can get rid of the test in has_parallel_hazard()
that says that use of a temp table's rowtype is unsafe in parallel workers.
That test was unduly expensive, and if we really did need such a
prohibition, that was not even close to being a bulletproof guard for it.
(For example, any user-defined function executed in a parallel worker
might have attempted such access.)
2016-06-09 20:16:11 -04:00
Robert Haas 4bc424b968 pgindent run for 9.6 2016-06-09 18:02:36 -04:00
Greg Stark e1623c3959 Fix various common mispellings.
Mostly these are just comments but there are a few in documentation
and a handful in code and tests. Hopefully this doesn't cause too much
unnecessary pain for backpatching. I relented from some of the most
common like "thru" for that reason. The rest don't seem numerous
enough to cause problems.

Thanks to Kevin Lyda's tool https://pypi.python.org/pypi/misspellings
2016-06-03 16:08:45 +01:00
Peter Eisentraut 9b7bfc3a88 sql_features: Fix typos
This makes the feature names match the SQL standard.

From: Alexander Law <exclusion@gmail.com>
2016-05-13 21:24:54 -04:00
Tom Lane 26e66184d6 Fix assorted missing infrastructure for ON CONFLICT.
subquery_planner() failed to apply expression preprocessing to the
arbiterElems and arbiterWhere fields of an OnConflictExpr.  No doubt the
theory was that this wasn't necessary because we don't actually try to
execute those expressions; but that's wrong, because it results in failure
to match to index expressions or index predicates that are changed at all
by preprocessing.  Per bug #14132 from Reynold Smith.

Also add pullup_replace_vars processing for onConflictWhere.  Perhaps
it's impossible to have a subquery reference there, but I'm not exactly
convinced; and even if true today it's a failure waiting to happen.

Also add some comments to other places where one or another field of
OnConflictExpr is intentionally ignored, with explanation as to why it's
okay to do so.

Also, catalog/dependency.c failed to record any dependency on the named
constraint in ON CONFLICT ON CONSTRAINT, allowing such a constraint to
be dropped while rules exist that depend on it, and allowing pg_dump to
dump such a rule before the constraint it refers to.  The normal execution
path managed to error out reasonably for a dangling constraint reference,
but ruleutils.c dumped core; so in addition to fixing the omission, add
a protective check in ruleutils.c, since we can't retroactively add a
dependency in existing databases.

Back-patch to 9.5 where this code was introduced.

Report: <20160510190350.2608.48667@wrigleys.postgresql.org>
2016-05-11 16:20:23 -04:00
Tom Lane 1a2c17f8e2 Fix pg_upgrade to not fail when new-cluster TOAST rules differ from old.
This patch essentially reverts commit 4c6780fd17, in favor of a much
simpler solution for the case where the new cluster would choose to create
a TOAST table but the old cluster doesn't have one: just don't create a
TOAST table.

The existing code failed in at least two different ways if the situation
arose: (1) ALTER TABLE RESET didn't grab an exclusive lock, so that the
lock sanity check in create_toast_table failed; (2) pg_upgrade did not
provide a pg_type OID for the new toast table, so that the crosscheck in
TypeCreate failed.  While both these problems were introduced by later
patches, they show that the hack being used to cause TOAST table creation
is overwhelmingly fragile (and untested).  I also note that before the
TypeCreate crosscheck was added, the code would have resulted in assigning
an indeterminate pg_type OID to the toast table, possibly causing a later
OID conflict in that catalog; so that it didn't really work even when
committed.

If we simply don't create a TOAST table, there will only be a problem if
the code tries to store a tuple that's wider than a page, and field
compression isn't sufficient to get it under a page.  Given that the TOAST
creation threshold is intended to be about a quarter of a page, it's very
hard to believe that cross-version differences in the do-we-need-a-toast-
table heuristic could result in an observable problem.  So let's just
follow the old version's conclusion about whether a TOAST table is needed.

(If we ever do change needs_toast_table() so much that this conclusion
doesn't apply, we can devise a solution at that time, and hopefully do
it in a less klugy way than 4c6780fd17 did.)

Back-patch to 9.3, like the previous patch.

Discussion: <8110.1462291671@sss.pgh.pa.us>
2016-05-06 22:05:56 -04:00
Stephen Frost a89505fd21 Remove various special checks around default roles
Default roles really should be like regular roles, for the most part.
This removes a number of checks that were trying to make default roles
extra special by not allowing them to be used as regular roles.

We still prevent users from creating roles in the "pg_" namespace or
from altering roles which exist in that namespace via ALTER ROLE, as
we can't preserve such changes, but otherwise the roles are very much
like regular roles.

Based on discussion with Robert and Tom.
2016-05-06 14:06:50 -04:00
Robert Haas 9888b34fdb Fix more things to be parallel-safe.
Conversion functions were previously marked as parallel-unsafe, since
that is the default, but in fact they are safe.  Parallel-safe
functions defined in pg_proc.h and redefined in system_views.sql were
ending up as parallel-unsafe because the redeclarations were not
marked PARALLEL SAFE.  While editing system_views.sql, mark ts_debug()
parallel safe also.

Andreas Karlsson
2016-05-03 14:36:38 -04:00
Robert Haas 37d0c2cb1a Fix parallel safety markings for pg_start_backup.
Commit 7117685461 made pg_start_backup
parallel-restricted rather than parallel-safe, because it now relies
on backend-private state that won't be synchronized with the parallel
worker.  However, it didn't update pg_proc.h.  Separately, Andreas
Karlsson observed that system_views.sql neglected to reiterate the
parallel-safety markings whe redefining various functions, including
this one; so add a PARALLEL RESTRICTED declaration there to match
the new value in pg_proc.h.
2016-05-02 10:42:34 -04:00
Kevin Grittner a343e223a5 Revert no-op changes to BufferGetPage()
The reverted changes were intended to force a choice of whether any
newly-added BufferGetPage() calls needed to be accompanied by a
test of the snapshot age, to support the "snapshot too old"
feature.  Such an accompanying test is needed in about 7% of the
cases, where the page is being used as part of a scan rather than
positioning for other purposes (such as DML or vacuuming).  The
additional effort required for back-patching, and the doubt whether
the intended benefit would really be there, have indicated it is
best just to rely on developers to do the right thing based on
comments and existing usage, as we do with many other conventions.

This change should have little or no effect on generated executable
code.

Motivated by the back-patching pain of Tom Lane and Robert Haas
2016-04-20 08:31:19 -05:00
Stephen Frost 99f2f3c19a In recordExtensionInitPriv(), keep the scan til we're done with it
For reasons of sheer brain fade, we (I) was calling systable_endscan()
immediately after systable_getnext() and expecting the tuple returned
by systable_getnext() to still be valid.

That's clearly wrong.  Move the systable_endscan() down below the tuple
usage.

Discovered initially by Pavel Stehule and then also by Alvaro.

Add a regression test based on Alvaro's testing.
2016-04-15 21:57:15 -04:00
Stephen Frost 293007898d Reserve the "pg_" namespace for roles
This will prevent users from creating roles which begin with "pg_" and
will check for those roles before allowing an upgrade using pg_upgrade.

This will allow for default roles to be provided at initdb time.

Reviews by José Luis Tallón and Robert Haas
2016-04-08 16:56:27 -04:00
Kevin Grittner 8b65cf4c5e Modify BufferGetPage() to prepare for "snapshot too old" feature
This patch is a no-op patch which is intended to reduce the chances
of failures of omission once the functional part of the "snapshot
too old" patch goes in.  It adds parameters for snapshot, relation,
and an enum to specify whether the snapshot age check needs to be
done for the page at this point.  This initial patch passes NULL
for the first two new parameters and BGP_NO_SNAPSHOT_TEST for the
third.  The follow-on patch will change the places where the test
needs to be made.
2016-04-08 14:30:10 -05:00
Teodor Sigaev 8b99edefca Revert CREATE INDEX ... INCLUDING ...
It's not ready yet, revert two commits
690c543550 - unstable test output
386e3d7609 - patch itself
2016-04-08 21:52:13 +03:00
Teodor Sigaev 386e3d7609 CREATE INDEX ... INCLUDING (column[, ...])
Now indexes (but only B-tree for now) can contain "extra" column(s) which
doesn't participate in index structure, they are just stored in leaf
tuples. It allows to use index only scan by using single index instead
of two or more indexes.

Author: Anastasia Lubennikova with minor editorializing by me
Reviewers: David Rowley, Peter Geoghegan, Jeff Janes
2016-04-08 19:45:59 +03:00
Stephen Frost 1574783b4c Use GRANT system to manage access to sensitive functions
Now that pg_dump will properly dump out any ACL changes made to
functions which exist in pg_catalog, switch to using the GRANT system
to manage access to those functions.

This means removing 'if (!superuser()) ereport()' checks from the
functions themselves and then REVOKEing EXECUTE right from 'public' for
these functions in system_views.sql.

Reviews by Alexander Korotkov, Jose Luis Tallon
2016-04-06 21:45:32 -04:00
Stephen Frost 23f34fa4ba In pg_dump, include pg_catalog and extension ACLs, if changed
Now that all of the infrastructure exists, add in the ability to
dump out the ACLs of the objects inside of pg_catalog or the ACLs
for objects which are members of extensions, but only if they have
been changed from their original values.

The original values are tracked in pg_init_privs.  When pg_dump'ing
9.6-and-above databases, we will dump out the ACLs for all objects
in pg_catalog and the ACLs for all extension members, where the ACL
has been changed from the original value which was set during either
initdb or CREATE EXTENSION.

This should not change dumps against pre-9.6 databases.

Reviews by Alexander Korotkov, Jose Luis Tallon
2016-04-06 21:45:32 -04:00
Stephen Frost 6c268df127 Add new catalog called pg_init_privs
This new catalog holds the privileges which the system was
initialized with at initdb time, along with any permissions set
by extensions at CREATE EXTENSION time.  This allows pg_dump
(and any other similar use-cases) to detect when the privileges
set on initdb-created or extension-created objects have been
changed from what they were set to at initdb/extension-creation
time and handle those changes appropriately.

Reviews by Alexander Korotkov, Jose Luis Tallon
2016-04-06 21:45:32 -04:00
Teodor Sigaev 0b62fd036e Add jsonb_insert
It inserts a new value into an jsonb array at arbitrary position or
a new key to jsonb object.

Author: Dmitry Dolgov
Reviewers: Petr Jelinek, Vitaly Burovoy, Andrew Dunstan
2016-04-06 19:25:00 +03:00
Alvaro Herrera f2fcad27d5 Support ALTER THING .. DEPENDS ON EXTENSION
This introduces a new dependency type which marks an object as depending
on an extension, such that if the extension is dropped, the object
automatically goes away; and also, if the database is dumped, the object
is included in the dump output.  Currently the grammar supports this for
indexes, triggers, materialized views and functions only, although the
utility code is generic so adding support for more object types is a
matter of touching the parser rules only.

Author: Abhijit Menon-Sen
Reviewed-by: Alexander Korotkov, Álvaro Herrera
Discussion: http://www.postgresql.org/message-id/20160115062649.GA5068@toroid.org
2016-04-05 18:38:54 -03:00
Robert Haas 41ea0c2376 Fix parallel-safety code for parallel aggregation.
has_parallel_hazard() was ignoring the proparallel markings for
aggregates, which is no good.  Fix that.  There was no way to mark
an aggregate as actually being parallel-safe, either, so add a
PARALLEL option to CREATE AGGREGATE.

Patch by me, reviewed by David Rowley.
2016-04-05 16:06:15 -04:00
Magnus Hagander 7117685461 Implement backup API functions for non-exclusive backups
Previously non-exclusive backups had to be done using the replication protocol
and pg_basebackup. With this commit it's now possible to make them using
pg_start_backup/pg_stop_backup as well, as long as the backup program can
maintain a persistent connection to the database.

Doing this, backup_label and tablespace_map are returned as results from
pg_stop_backup() instead of being written to the data directory. This makes
the server safe from a crash during an ongoing backup, which can be a problem
with exclusive backups.

The old syntax of the functions remain and work exactly as before, but since the
new syntax is safer this should eventually be deprecated and removed.

Only reference documentation is included. The main section on backup still needs
to be rewritten to cover this, but since that is already scheduled for a separate
large rewrite, it's not included in this patch.

Reviewed by David Steele and Amit Kapila
2016-04-05 20:03:49 +02:00
Robert Haas 5fe5a2cee9 Allow aggregate transition states to be serialized and deserialized.
This is necessary infrastructure for supporting parallel aggregation
for aggregates whose transition type is "internal".  Such values
can't be passed between cooperating processes, because they are
just pointers.

David Rowley, reviewed by Tomas Vondra and by me.
2016-03-29 15:04:05 -04:00
Tom Lane c94959d411 Fix DROP OPERATOR to reset oprcom/oprnegate links to the dropped operator.
This avoids leaving dangling links in pg_operator; which while fairly
harmless are also unsightly.

While we're at it, simplify OperatorUpd, which went through
heap_modify_tuple for no very good reason considering it had already made
a tuple copy it could just scribble on.

Roma Sokolov, reviewed by Tomas Vondra, additional hacking by Robert Haas
and myself.
2016-03-25 12:33:16 -04:00
Alvaro Herrera 473b932870 Support CREATE ACCESS METHOD
This enables external code to create access methods.  This is useful so
that extensions can add their own access methods which can be formally
tracked for dependencies, so that DROP operates correctly.  Also, having
explicit support makes pg_dump work correctly.

Currently only index AMs are supported, but we expect different types to
be added in the future.

Authors: Alexander Korotkov, Petr Jelínek
Reviewed-By: Teodor Sigaev, Petr Jelínek, Jim Nasby
Commitfest-URL: https://commitfest.postgresql.org/9/353/
Discussion: https://www.postgresql.org/message-id/CAPpHfdsXwZmojm6Dx+TJnpYk27kT4o7Ri6X_4OSWcByu1Rm+VA@mail.gmail.com
2016-03-23 23:01:35 -03:00
Teodor Sigaev 3187d6de0e Introduce parse_ident()
SQL-layer function to split qualified identifier into array parts.

Author: Pavel Stehule with minor editorization by me and Jim Nasby
2016-03-18 18:16:14 +03:00
Robert Haas c16dc1aca5 Add simple VACUUM progress reporting.
There's a lot more that could be done here yet - in particular, this
reports only very coarse-grained information about the index vacuuming
phase - but even as it stands, the new pg_stat_progress_vacuum can
tell you quite a bit about what a long-running vacuum is actually
doing.

Amit Langote and Robert Haas, based on earlier work by Vinayak Pokale
and Rahila Syed.
2016-03-15 13:32:56 -04:00
Tom Lane 364a9f47ab Refactor pull_var_clause's API to make it less tedious to extend.
In commit 1d97c19a0f and later c1d9579dd8, we extended
pull_var_clause's API by adding enum-type arguments.  That's sort of a pain
to maintain, though, because it means every time we add a new behavior we
must touch every last one of the call sites, even if there's a reasonable
default behavior that most of them could use.  Let's switch over to using a
bitmask of flags, instead; that seems more maintainable and might save a
nanosecond or two as well.  This commit changes no behavior in itself,
though I'm going to follow it up with one that does add a new behavior.

In passing, remove flatten_tlist(), which has not been used since 9.1
and would otherwise need the same API changes.

Removing these enums means that optimizer/tlist.h no longer needs to
depend on optimizer/var.h.  Changing that caused a number of C files to
need addition of #include "optimizer/var.h" (probably we can thank old
runs of pgrminclude for that); but on balance it seems like a good change
anyway.
2016-03-10 15:53:07 -05:00
Robert Haas 53be0b1add Provide much better wait information in pg_stat_activity.
When a process is waiting for a heavyweight lock, we will now indicate
the type of heavyweight lock for which it is waiting.  Also, you can
now see when a process is waiting for a lightweight lock - in which
case we will indicate the individual lock name or the tranche, as
appropriate - or for a buffer pin.

Amit Kapila, Ildus Kurbangaliev, reviewed by me.  Lots of helpful
discussion and suggestions by many others, including Alexander
Korotkov, Vladimir Borodin, and many others.
2016-03-10 12:44:09 -05:00
Robert Haas a892234f83 Change the format of the VM fork to add a second bit per page.
The new bit indicates whether every tuple on the page is already frozen.
It is cleared only when the all-visible bit is cleared, and it can be
set only when we vacuum a page and find that every tuple on that page is
both visible to every transaction and in no need of any future
vacuuming.

A future commit will use this new bit to optimize away full-table scans
that would otherwise be triggered by XID wraparound considerations.  A
page which is merely all-visible must still be scanned in that case, but
a page which is all-frozen need not be.  This commit does not attempt
that optimization, although that optimization is the goal here.  It
seems better to get the basic infrastructure in place first.

Per discussion, it's very desirable for pg_upgrade to automatically
migrate existing VM forks from the old format to the new format.  That,
too, will be handled in a follow-on patch.

Masahiko Sawada, reviewed by Kyotaro Horiguchi, Fujii Masao, Amit
Kapila, Simon Riggs, Andres Freund, and others, and substantially
revised by me.
2016-03-01 21:49:41 -05:00
Joe Conway a5c43b8869 Add new system view, pg_config
Move and refactor the underlying code for the pg_config client
application to src/common in support of sharing it with a new
system information SRF called pg_config() which makes the same
information available via SQL. Additionally wrap the SRF with a
new system view, as called pg_config.

Patch by me with extensive input and review by Michael Paquier
and additional review by Alvaro Herrera.
2016-02-17 09:12:06 -08:00
Robert Haas f1f5ec1efa Reuse abbreviated keys in ordered [set] aggregates.
When processing ordered aggregates following a sort that could make use
of the abbreviated key optimization, only call the equality operator to
compare successive pairs of tuples when their abbreviated keys were not
equal.

Peter Geoghegan, reviewd by Andreas Karlsson and by me.
2016-02-17 15:40:00 +05:30
Tom Lane f144f73242 Refactor check_functional_grouping() to use get_primary_key_attnos().
If we ever get around to allowing functional dependency to be proven
from other things besides simple primary keys, this code will need to
be rethought, but that was true anyway.  In the meantime, we might as
well not have two very-similar routines for scanning pg_constraint.

David Rowley, reviewed by Julien Rouhaud
2016-02-11 17:52:03 -05:00
Tom Lane d4c3a156cb Remove GROUP BY columns that are functionally dependent on other columns.
If a GROUP BY clause includes all columns of a non-deferred primary key,
as well as other columns of the same relation, those other columns are
redundant and can be dropped from the grouping; the pkey is enough to
ensure that each row of the table corresponds to a separate group.
Getting rid of the excess columns will reduce the cost of the sorting or
hashing needed to implement GROUP BY, and can indeed remove the need for
a sort step altogether.

This seems worth testing for since many query authors are not aware of
the GROUP-BY-primary-key exception to the rule about queries not being
allowed to reference non-grouped-by columns in their targetlists or
HAVING clauses.  Thus, redundant GROUP BY items are not uncommon.  Also,
we can make the test pretty cheap in most queries where it won't help
by not looking up a rel's primary key until we've found that at least
two of its columns are in GROUP BY.

David Rowley, reviewed by Julien Rouhaud
2016-02-11 17:34:59 -05:00
Tom Lane 72eee410d4 Move pg_constraint.h function declarations to new file pg_constraint_fn.h.
A pending patch requires exporting a function returning Bitmapset from
catalog/pg_constraint.c.  As things stand, that would mean including
nodes/bitmapset.h in pg_constraint.h, which might be hazardous for the
client-side includability of that header.  It's not entirely clear whether
any client-side code needs to include pg_constraint.h, but it seems prudent
to assume that there is some such code somewhere.  Therefore, split off the
function definitions into a new file pg_constraint_fn.h, similarly to what
we've done for some other catalog header files.
2016-02-11 15:51:28 -05:00
Robert Haas a7de3dc5c3 Support multi-stage aggregation.
Aggregate nodes now have two new modes: a "partial" mode where they
output the unfinalized transition state, and a "finalize" mode where
they accept unfinalized transition states rather than individual
values as input.

These new modes are not used anywhere yet, but they will be necessary
for parallel aggregation.  The infrastructure also figures to be
useful for cases where we want to aggregate local data and remote
data via the FDW interface, and want to bring back partial aggregates
from the remote side that can then be combined with locally generated
partial aggregates to produce the final value.  It may also be useful
even when neither FDWs nor parallelism are in play, as explained in
the comments in nodeAgg.c.

David Rowley and Simon Riggs, reviewed by KaiGai Kohei, Heikki
Linnakangas, Haribabu Kommi, and me.
2016-01-20 13:46:50 -05:00
Tom Lane 65c5fcd353 Restructure index access method API to hide most of it at the C level.
This patch reduces pg_am to just two columns, a name and a handler
function.  All the data formerly obtained from pg_am is now provided
in a C struct returned by the handler function.  This is similar to
the designs we've adopted for FDWs and tablesample methods.  There
are multiple advantages.  For one, the index AM's support functions
are now simple C functions, making them faster to call and much less
error-prone, since the C compiler can now check function signatures.
For another, this will make it far more practical to define index access
methods in installable extensions.

A disadvantage is that SQL-level code can no longer see attributes
of index AMs; in particular, some of the crosschecks in the opr_sanity
regression test are no longer possible from SQL.  We've addressed that
by adding a facility for the index AM to perform such checks instead.
(Much more could be done in that line, but for now we're content if the
amvalidate functions more or less replace what opr_sanity used to do.)
We might also want to expose some sort of reporting functionality, but
this patch doesn't do that.

Alexander Korotkov, reviewed by Petr Jelínek, and rather heavily
editorialized on by me.
2016-01-17 19:36:59 -05:00
Tom Lane 8d290c8ec6 Re-pgindent a few files.
In preparation for landing index AM interface changes.
2016-01-17 19:13:18 -05:00
Tom Lane 26d538dc93 Clean up some lack-of-STRICT issues in the core code, too.
A scan for missed proisstrict markings in the core code turned up
these functions:

brin_summarize_new_values
pg_stat_reset_single_table_counters
pg_stat_reset_single_function_counters
pg_create_logical_replication_slot
pg_create_physical_replication_slot
pg_drop_replication_slot

The first three of these take OID, so a null argument will normally look
like a zero to them, resulting in "ERROR: could not open relation with OID
0" for brin_summarize_new_values, and no action for the pg_stat_reset_XXX
functions.  The other three will dump core on a null argument, though this
is mitigated by the fact that they won't do so until after checking that
the caller is superuser or has rolreplication privilege.

In addition, the pg_logical_slot_get/peek[_binary]_changes family was
intentionally marked nonstrict, but failed to make nullness checks on all
the arguments; so again a null-pointer-dereference crash is possible but
only for superusers and rolreplication users.

Add the missing ARGISNULL checks to the latter functions, and mark the
former functions as strict in pg_proc.  Make that change in the back
branches too, even though we can't force initdb there, just so that
installations initdb'd in future won't have the issue.  Since none of these
bugs rise to the level of security issues (and indeed the pg_stat_reset_XXX
functions hardly misbehave at all), it seems sufficient to do this.

In addition, fix some order-of-operations oddities in the slot_get_changes
family, mostly cosmetic, but not the part that moves the function's last
few operations into the PG_TRY block.  As it stood, there was significant
risk for an error to exit without clearing historical information from
the system caches.

The slot_get_changes bugs go back to 9.4 where that code was introduced.
Back-patch appropriate subsets of the pg_proc changes into all active
branches, as well.
2016-01-09 16:58:32 -05:00
Alvaro Herrera b1a9bad9e7 pgstat: add WAL receiver status view & SRF
This new view provides insight into the state of a running WAL receiver
in a HOT standby node.
The information returned includes the PID of the WAL receiver process,
its status (stopped, starting, streaming, etc), start LSN and TLI, last
received LSN and TLI, timestamp of last message send and receipt, latest
end-of-WAL LSN and time, and the name of the slot (if any).

Access to the detailed data is only granted to superusers; others only
get the PID.

Author: Michael Paquier
Reviewer: Haribabu Kommi
2016-01-07 16:21:19 -03:00
Tom Lane 4bf87169cc Comment typo fix.
Per Amit Langote.
2016-01-06 11:06:51 -05:00
Bruce Momjian ee94300446 Update copyright for 2016
Backpatch certain files through 9.1
2016-01-02 13:33:40 -05:00
Tom Lane 0dab5ef39b Fix ALTER OPERATOR to update dependencies properly.
Fix an oversight in commit 321eed5f0f7563a0: replacing an operator's
selectivity functions needs to result in a corresponding update in
pg_depend.  We have a function that can handle that, but it was not
called by AlterOperator().

To fix this without enlarging pg_operator.h's #include list beyond
what clients can safely include, split off the function definitions
into a new file pg_operator_fn.h, similarly to what we've done for
some other catalog header files.  It's not entirely clear whether
any client-side code needs to include pg_operator.h, but it seems
prudent to assume that there is some such code somewhere.
2015-12-31 17:37:31 -05:00
Tom Lane 66d947b9d3 Adjust behavior of single-user -j mode for better initdb error reporting.
Previously, -j caused the entire input file to be read in and executed as
a single command string.  That's undesirable, not least because any error
causes the entire file to be regurgitated as the "failing query".  Some
experimentation suggests a better rule: end the command string when we see
a semicolon immediately followed by two newlines, ie, an empty line after
a query.  This serves nicely to break up the existing examples such as
information_schema.sql and system_views.sql.  A limitation is that it's
no longer possible to write such a sequence within a string literal or
multiline comment in a file meant to be read with -j; but there are no
instances of such a problem within the data currently used by initdb.
(If someone does make such a mistake in future, it'll be obvious because
they'll get an unterminated-literal or unterminated-comment syntax error.)
Other than that, there shouldn't be any negative consequences; you're not
forced to end statements that way, it's just a better idea in most cases.

In passing, remove src/include/tcop/tcopdebug.h, which is dead code
because it's not included anywhere, and hasn't been for more than
ten years.  One of the debug-support symbols it purported to describe
has been unreferenced for at least the same amount of time, and the
other is removed by this commit on the grounds that it was useless:
forcing -j mode all the time would have broken initdb.  The lack of
complaints about that, or about the missing inclusion, shows that
no one has tried to use TCOP_DONTUSENEWLINE in many years.
2015-12-17 19:34:15 -05:00
Alvaro Herrera 756e7b4c9d Rework internals of changing a type's ownership
This is necessary so that REASSIGN OWNED does the right thing with
composite types, to wit, that it also alters ownership of the type's
pg_class entry -- previously, the pg_class entry remained owned by the
original user, which caused later other failures such as the new owner's
inability to use ALTER TYPE to rename an attribute of the affected
composite.  Also, if the original owner is later dropped, the pg_class
entry becomes owned by a non-existant user which is bogus.

To fix, create a new routine AlterTypeOwner_oid which knows whether to
pass the request to ATExecChangeOwner or deal with it directly, and use
that in shdepReassignOwner rather than calling AlterTypeOwnerInternal
directly.  AlterTypeOwnerInternal is now simpler in that it only
modifies the pg_type entry and recurses to handle a possible array type;
higher-level tasks are handled by either AlterTypeOwner directly or
AlterTypeOwner_oid.

I took the opportunity to add a few more objects to the test rig for
REASSIGN OWNED, so that more cases are exercised.  Additional ones could
be added for superuser-only-ownable objects (such as FDWs and event
triggers) but I didn't want to push my luck by adding a new superuser to
the tests on a backpatchable bug fix.

Per bug #13666 reported by Chris Pacejo.

Backpatch to 9.5.

(I would back-patch this all the way back, except that it doesn't apply
cleanly in 9.4 and earlier because 59367fdf9 wasn't backpatched.  If we
decide that we need this in earlier branches too, we should backpatch
both.)
2015-12-17 14:25:41 -03:00
Robert Haas b648b70342 Speed up CREATE INDEX CONCURRENTLY's TID sort.
Encode TIDs as 64-bit integers to speed up comparisons.  This seems to
speed things up on all platforms, but is even more beneficial when
8-byte integers are passed by value.

Peter Geoghegan.  Design suggestions and review by Tom Lane.  Review
also by Simon Riggs and by me.
2015-12-16 15:23:45 -05:00
Robert Haas f27a6b15e6 Mark CHECK constraints declared NOT VALID valid if created with table.
FOREIGN KEY constraints have behaved this way for a long time, but for
some reason the behavior of CHECK constraints has been inconsistent up
until now.

Amit Langote and Amul Sul, with assorted tweaks by me.
2015-12-16 07:43:56 -05:00
Alvaro Herrera 8c1615531f For REASSIGN OWNED for foreign user mappings
As reported in bug #13809 by Alexander Ashurkov, the code for REASSIGN
OWNED hadn't gotten word about user mappings.  Deal with them in the
same way default ACLs do, which is to ignore them altogether; they are
handled just fine by DROP OWNED.  The other foreign object cases are
already handled correctly by both commands.

Also add a REASSIGN OWNED statement to foreign_data test to exercise the
foreign data objects.  (The changes are just before the "cleanup" phase,
so it shouldn't remove any existing live test.)

Reported by Alexander Ashurkov, then independently by Jaime Casanova.
2015-12-11 18:39:09 -03:00
Stephen Frost 833728d4c8 Handle policies during DROP OWNED BY
DROP OWNED BY handled GRANT-based ACLs but was not removing roles from
policies.  Fix that by having DROP OWNED BY remove the role specified
from the list of roles the policy (or policies) apply to, or the entire
policy (or policies) if it only applied to the role specified.

As with ACLs, the DROP OWNED BY caller must have permission to modify
the policy or a WARNING is thrown and no change is made to the policy.
2015-12-11 16:12:25 -05:00
Peter Eisentraut a351705d8a Improve some messages 2015-12-10 22:05:27 -05:00
Tom Lane 074c5cfbfb Fix handling of inherited check constraints in ALTER COLUMN TYPE (again).
The previous way of reconstructing check constraints was to do a separate
"ALTER TABLE ONLY tab ADD CONSTRAINT" for each table in an inheritance
hierarchy.  However, that way has no hope of reconstructing the check
constraints' own inheritance properties correctly, as pointed out in
bug #13779 from Jan Dirk Zijlstra.  What we should do instead is to do
a regular "ALTER TABLE", allowing recursion, at the topmost table that
has a particular constraint, and then suppress the work queue entries
for inherited instances of the constraint.

Annoyingly, we'd tried to fix this behavior before, in commit 5ed6546cf,
but we failed to notice that it wasn't reconstructing the pg_constraint
field values correctly.

As long as I'm touching pg_get_constraintdef_worker anyway, tweak it to
always schema-qualify the target table name; this seems like useful backup
to the protections installed by commit 5f173040.

In HEAD/9.5, get rid of get_constraint_relation_oids, which is now unused.
(I could alternatively have modified it to also return conislocal, but that
seemed like a pretty single-purpose API, so let's not pretend it has some
other use.)  It's unused in the back branches as well, but I left it in
place just in case some third-party code has decided to use it.

In HEAD/9.5, also rename pg_get_constraintdef_string to
pg_get_constraintdef_command, as the previous name did nothing to explain
what that entry point did differently from others (and its comment was
equally useless).  Again, that change doesn't seem like material for
back-patching.

I did a bit of re-pgindenting in tablecmds.c in HEAD/9.5, as well.

Otherwise, back-patch to all supported branches.
2015-11-20 14:55:47 -05:00
Robert Haas fea2b642fd Remove numbers from incorrectly-numbered list.
Reported by Andres Freund.
2015-11-19 16:45:13 -05:00
Robert Haas bc4996e61b Make ALTER .. SET SCHEMA do nothing, instead of throwing an ERROR.
This was already true for CREATE EXTENSION, but historically has not
been true for other object types.  Therefore, this is a backward
incompatibility.  Per discussion on pgsql-hackers, everyone seems to
agree that the new behavior is better.

Marti Raudsepp, reviewed by Haribabu Kommi and myself
2015-11-19 10:49:25 -05:00
Peter Eisentraut 5db837d3f2 Message improvements 2015-11-16 21:39:23 -05:00
Peter Eisentraut a8d585c091 Message style improvements
Message style, plurals, quoting, spelling, consistency with similar
messages
2015-10-28 20:38:36 -04:00
Stephen Frost 088c83363a ALTER TABLE .. FORCE ROW LEVEL SECURITY
To allow users to force RLS to always be applied, even for table owners,
add ALTER TABLE .. FORCE ROW LEVEL SECURITY.

row_security=off overrides FORCE ROW LEVEL SECURITY, to ensure pg_dump
output is complete (by default).

Also add SECURITY_NOFORCE_RLS context to avoid data corruption when
ALTER TABLE .. FORCE ROW SECURITY is being used. The
SECURITY_NOFORCE_RLS security context is used only during referential
integrity checks and is only considered in check_enable_rls() after we
have already checked that the current user is the owner of the relation
(which should always be the case during referential integrity checks).

Back-patch to 9.5 where RLS was added.
2015-10-04 21:05:08 -04:00
Robert Haas 7aea8e4f2d Determine whether it's safe to attempt a parallel plan for a query.
Commit 924bcf4f16 introduced a framework
for parallel computation in PostgreSQL that makes most but not all
built-in functions safe to execute in parallel mode.  In order to have
parallel query, we'll need to be able to determine whether that query
contains functions (either built-in or user-defined) that cannot be
safely executed in parallel mode.  This requires those functions to be
labeled, so this patch introduces an infrastructure for that.  Some
functions currently labeled as safe may need to be revised depending on
how pending issues related to heavyweight locking under paralllelism
are resolved.

Parallel plans can't be used except for the case where the query will
run to completion.  If portal execution were suspended, the parallel
mode restrictions would need to remain in effect during that time, but
that might make other queries fail.  Therefore, this patch introduces
a framework that enables consideration of parallel plans only when it
is known that the plan will be run to completion.  This probably needs
some refinement; for example, at bind time, we do not know whether a
query run via the extended protocol will be execution to completion or
run with a limited fetch count.  Having the client indicate its
intentions at bind time would constitute a wire protocol break.  Some
contexts in which parallel mode would be safe are not adjusted by this
patch; the default is not to try parallel plans except from call sites
that have been updated to say that such plans are OK.

This commit doesn't introduce any parallel paths or plans; it just
provides a way to determine whether they could potentially be used.
I'm committing it on the theory that the remaining parallel sequential
scan patches will also get committed to this release, hopefully in the
not-too-distant future.

Robert Haas and Amit Kapila.  Reviewed (in earlier versions) by Noah
Misch.
2015-09-16 15:38:47 -04:00
Peter Eisentraut b2ae8f1e35 Update SQL features list 2015-09-12 00:08:18 -04:00
Andres Freund 6fcd88511f Allow pg_create_physical_replication_slot() to reserve WAL.
When creating a physical slot it's often useful to immediately reserve
the current WAL position instead of only doing after the first feedback
message arrives. That e.g. allows slots to guarantee that all the WAL
for a base backup will be available afterwards.

Logical slots already have to reserve WAL during creation, so generalize
that logic into being usable for both physical and logical slots.

Catversion bump because of the new parameter.

Author: Gurjeet Singh
Reviewed-By: Andres Freund
Discussion: CABwTF4Wh_dBCzTU=49pFXR6coR4NW1ynb+vBqT+Po=7fuq5iCw@mail.gmail.com
2015-08-11 12:34:31 +02:00
Andres Freund 3f811c2d6f Add confirmed_flush column to pg_replication_slots.
There's no reason not to expose both restart_lsn and confirmed_flush
since they have rather distinct meanings. The former is the oldest WAL
still required and valid for both physical and logical slots, whereas
the latter is the location up to which a logical slot's consumer has
confirmed receiving data. Most of the time a slot will require older
WAL (i.e. restart_lsn) than the confirmed
position (i.e. confirmed_flush_lsn).

Author: Marko Tiikkaja, editorialized by me
Discussion: 559D110B.1020109@joh.to
2015-08-10 13:28:18 +02:00
Alvaro Herrera 2834855cb9 Fix BRIN to use SnapshotAny during summarization
For correctness of summarization results, it is critical that the
snapshot used during the summarization scan is able to see all tuples
that are live to all transactions -- including tuples inserted or
deleted by in-progress transactions.  Otherwise, it would be possible
for a transaction to insert a tuple, then idle for a long time while a
concurrent transaction executes summarization of the range: this would
result in the inserted value not being considered in the summary.
Previously we were trying to use a MVCC snapshot in conjunction with
adding a "placeholder" tuple in the index: the snapshot would see all
committed tuples, and the placeholder tuple would catch insertions by
any new inserters.  The hole is that prior insertions by transactions
that are still in progress by the time the MVCC snapshot was taken were
ignored.

Kevin Grittner reported this as a bogus error message during vacuum with
default transaction isolation mode set to repeatable read (because the
error report mentioned a function name not being invoked during), but
the problem is larger than that.

To fix, tweak IndexBuildHeapRangeScan to have a new mode that behaves
the way we need using SnapshotAny visibility rules.  This change
simplifies the BRIN code a bit, mainly by removing large comments that
were mistaken.  Instead, rely on the SnapshotAny semantics to provide
what it needs.  (The business about a placeholder tuple needs to remain:
that covers the case that a transaction inserts a a tuple in a page that
summarization already scanned.)

Discussion: https://www.postgresql.org/message-id/20150731175700.GX2441@postgresql.org

In passing, remove a couple of unused declarations from brin.h and
reword a comment to be proper English.  This part submitted by Kevin
Grittner.

Backpatch to 9.5, where BRIN was introduced.
2015-08-05 16:20:50 -03:00
Joe Conway f781a0f1d8 Create a pg_shdepend entry for each role in TO clause of policies.
CreatePolicy() and AlterPolicy() omit to create a pg_shdepend entry for
each role in the TO clause. Fix this by creating a new shared dependency
type called SHARED_DEPENDENCY_POLICY and assigning it to each role.

Reported by Noah Misch. Patch by me, reviewed by Alvaro Herrera.
Back-patch to 9.5 where RLS was introduced.
2015-07-28 16:01:53 -07:00
Joe Conway 7b4bfc87d5 Plug RLS related information leak in pg_stats view.
The pg_stats view is supposed to be restricted to only show rows
about tables the user can read. However, it sometimes can leak
information which could not otherwise be seen when row level security
is enabled. Fix that by not showing pg_stats rows to users that would
be subject to RLS on the table the row is related to. This is done
by creating/using the newly introduced SQL visible function,
row_security_active().

Along the way, clean up three call sites of check_enable_rls(). The second
argument of that function should only be specified as other than
InvalidOid when we are checking as a different user than the current one,
as in when querying through a view. These sites were passing GetUserId()
instead of InvalidOid, which can cause the function to return incorrect
results if the current user has the BYPASSRLS privilege and row_security
has been set to OFF.

Additionally fix a bug causing RI Trigger error messages to unintentionally
leak information when RLS is enabled, and other minor cleanup and
improvements. Also add WITH (security_barrier) to the definition of pg_stats.

Bumped CATVERSION due to new SQL functions and pg_stats view definition.

Back-patch to 9.5 where RLS was introduced. Reported by Yaroslav.
Patch by Joe Conway and Dean Rasheed with review and input by
Michael Paquier and Stephen Frost.
2015-07-28 13:21:22 -07:00
Tom Lane dd7a8f66ed Redesign tablesample method API, and do extensive code review.
The original implementation of TABLESAMPLE modeled the tablesample method
API on index access methods, which wasn't a good choice because, without
specialized DDL commands, there's no way to build an extension that can
implement a TSM.  (Raw inserts into system catalogs are not an acceptable
thing to do, because we can't undo them during DROP EXTENSION, nor will
pg_upgrade behave sanely.)  Instead adopt an API more like procedural
language handlers or foreign data wrappers, wherein the only SQL-level
support object needed is a single handler function identified by having
a special return type.  This lets us get rid of the supporting catalog
altogether, so that no custom DDL support is needed for the feature.

Adjust the API so that it can support non-constant tablesample arguments
(the original coding assumed we could evaluate the argument expressions at
ExecInitSampleScan time, which is undesirable even if it weren't outright
unsafe), and discourage sampling methods from looking at invisible tuples.
Make sure that the BERNOULLI and SYSTEM methods are genuinely repeatable
within and across queries, as required by the SQL standard, and deal more
honestly with methods that can't support that requirement.

Make a full code-review pass over the tablesample additions, and fix
assorted bugs, omissions, infelicities, and cosmetic issues (such as
failure to put the added code stanzas in a consistent ordering).
Improve EXPLAIN's output of tablesample plans, too.

Back-patch to 9.5 so that we don't have to support the original API
in production.
2015-07-25 14:39:00 -04:00
Tom Lane 434873806a Fix some oversights in BRIN patch.
Remove HeapScanDescData.rs_initblock, which wasn't being used for anything
in the final version of the patch.

Fix IndexBuildHeapScan so that it supports syncscan again; the patch
broke synchronous scanning for index builds by forcing rs_startblk
to zero even when the caller did not care about that and had asked
for syncscan.

Add some commentary and usage defenses to heap_setscanlimits().

Fix heapam so that asking for rs_numblocks == 0 does what you would
reasonably expect.  As coded it amounted to requesting a whole-table
scan, because those "--x <= 0" tests on an unsigned variable would
behave surprisingly.
2015-07-21 13:38:24 -04:00
Alvaro Herrera 149b1dd840 Fix omission of OCLASS_TRANSFORM in object_classes[]
This was forgotten in cac7658205 (and its fixup ad89a5d115).  Since it
seems way too easy to miss this, this commit also introduces a mechanism
to enforce that the array is consistent with the enum.

Problem reported independently by Robert Haas and Jaimin Pan.
Patches proposed by Jaimin Pan, Jim Nasby, Michael Paquier and myself,
though I didn't use any of these and instead went with a cleaner
approach suggested by Tom Lane.

Backpatch to 9.5.

Discussion:
https://www.postgresql.org/message-id/CA+Tgmoa6SgDaxW_n_7SEhwBAc=mniYga+obUj5fmw4rU9_mLvA@mail.gmail.com
https://www.postgresql.org/message-id/29788.1437411581@sss.pgh.pa.us
2015-07-21 13:20:53 +02:00
Tom Lane ac50f84866 Fix misuse of TextDatumGetCString().
"TextDatumGetCString(PG_GETARG_TEXT_P(x))" is formally wrong: a text*
is not a Datum.  Although this coding will accidentally fail to fail on
all known platforms, it risks leaking memory if a detoast step is needed,
unlike "TextDatumGetCString(PG_GETARG_DATUM(x))" which is what's used
elsewhere.  Make pg_get_object_address() fall in line with other uses.

Noted while reviewing two-arg current_setting() patch.
2015-07-02 17:02:08 -04:00
Alvaro Herrera ad89a5d115 Add transforms to pg_get_object_address and friends
This was missed when transforms were added by commit cac7658205.

Extracted from a larger patch
Author: Michael Paquier
2015-06-21 16:08:49 -03:00
Andrew Dunstan 37def42245 Rename jsonb_replace to jsonb_set and allow it to add new values
The function is given a fourth parameter, which defaults to true. When
this parameter is true, if the last element of the path is missing
in the original json, jsonb_set creates it in the result and assigns it
the new value. If it is false then the function does nothing unless all
elements of the path are present, including the last.

Based on some original code from Dmitry Dolgov, heavily modified by me.

Catalog version bumped.
2015-05-31 20:34:10 -04:00
Tom Lane 17b48a1a9f Rename pg_shdepend.c's typedef "objectType" to SharedDependencyObjectType.
The name objectType is widely used as a field name, and it's pure luck that
this conflict has not caused pgindent to go crazy before.  It messed up
pg_audit.c pretty good though.  Since pg_shdepend.c doesn't export this
typedef and only uses it in three places, changing that seems saner than
changing the field usages.

Back-patch because we're contemplating using the union of all branch
typedefs for future pgindent runs, so this won't fix anything if it
stays the same in back branches.
2015-05-24 13:03:45 -04:00
Bruce Momjian 807b9e0dff pgindent run for 9.5 2015-05-23 21:35:49 -04:00
Heikki Linnakangas 4fc72cc7bb Collection of typo fixes.
Use "a" and "an" correctly, mostly in comments. Two error messages were
also fixed (they were just elogs, so no translation work required). Two
function comments in pg_proc.h were also fixed. Etsuro Fujita reported one
of these, but I found a lot more with grep.

Also fix a few other typos spotted while grepping for the a/an typos.
For example, "consists out of ..." -> "consists of ...". Plus a "though"/
"through" mixup reported by Euler Taveira.

Many of these typos were in old code, which would be nice to backpatch to
make future backpatching easier. But much of the code was new, and I didn't
feel like crafting separate patches for each branch. So no backpatching.
2015-05-20 16:56:22 +03:00
Peter Eisentraut 0779f2ba2d Fix parse tree of DROP TRANSFORM and COMMENT ON TRANSFORM
The plain C string language name needs to be wrapped in makeString() so
that the parse tree is copyable.  This is detectable by
-DCOPY_PARSE_PLAN_TREES.  Add a test case for the COMMENT case.

Also make the quoting in the error messages more consistent.

discovered by Tom Lane
2015-05-18 22:55:14 -04:00
Andres Freund f3d3118532 Support GROUPING SETS, CUBE and ROLLUP.
This SQL standard functionality allows to aggregate data by different
GROUP BY clauses at once. Each grouping set returns rows with columns
grouped by in other sets set to NULL.

This could previously be achieved by doing each grouping as a separate
query, conjoined by UNION ALLs. Besides being considerably more concise,
grouping sets will in many cases be faster, requiring only one scan over
the underlying data.

The current implementation of grouping sets only supports using sorting
for input. Individual sets that share a sort order are computed in one
pass. If there are sets that don't share a sort order, additional sort &
aggregation steps are performed. These additional passes are sourced by
the previous sort step; thus avoiding repeated scans of the source data.

The code is structured in a way that adding support for purely using
hash aggregation or a mix of hashing and sorting is possible. Sorting
was chosen to be supported first, as it is the most generic method of
implementation.

Instead of, as in an earlier versions of the patch, representing the
chain of sort and aggregation steps as full blown planner and executor
nodes, all but the first sort are performed inside the aggregation node
itself. This avoids the need to do some unusual gymnastics to handle
having to return aggregated and non-aggregated tuples from underlying
nodes, as well as having to shut down underlying nodes early to limit
memory usage.  The optimizer still builds Sort/Agg node to describe each
phase, but they're not part of the plan tree, but instead additional
data for the aggregation node. They're a convenient and preexisting way
to describe aggregation and sorting.  The first (and possibly only) sort
step is still performed as a separate execution step. That retains
similarity with existing group by plans, makes rescans fairly simple,
avoids very deep plans (leading to slow explains) and easily allows to
avoid the sorting step if the underlying data is sorted by other means.

A somewhat ugly side of this patch is having to deal with a grammar
ambiguity between the new CUBE keyword and the cube extension/functions
named cube (and rollup). To avoid breaking existing deployments of the
cube extension it has not been renamed, neither has cube been made a
reserved keyword. Instead precedence hacking is used to make GROUP BY
cube(..) refer to the CUBE grouping sets feature, and not the function
cube(). To actually group by a function cube(), unlikely as that might
be, the function name has to be quoted.

Needs a catversion bump because stored rules may change.

Author: Andrew Gierth and Atri Sharma, with contributions from Andres Freund
Reviewed-By: Andres Freund, Noah Misch, Tom Lane, Svenne Krap, Tomas
    Vondra, Erik Rijkers, Marti Raudsepp, Pavel Stehule
Discussion: CAOeZVidmVRe2jU6aMk_5qkxnB7dfmPROzM7Ur8JPW5j8Y5X-Lw@mail.gmail.com
2015-05-16 03:46:31 +02:00
Simon Riggs 1e98fa0bf8 SQLStandard feature T613 Sampling now Supported 2015-05-15 15:51:31 -04:00
Simon Riggs f6d208d6e5 TABLESAMPLE, SQL Standard and extensible
Add a TABLESAMPLE clause to SELECT statements that allows
user to specify random BERNOULLI sampling or block level
SYSTEM sampling. Implementation allows for extensible
sampling functions to be written, using a standard API.
Basic version follows SQLStandard exactly. Usable
concrete use cases for the sampling API follow in later
commits.

Petr Jelinek

Reviewed by Michael Paquier and Simon Riggs
2015-05-15 14:37:10 -04:00
Fujii Masao ecd222e770 Support VERBOSE option in REINDEX command.
When this option is specified, a progress report is printed as each index
is reindexed.

Per discussion, we agreed on the following syntax for the extensibility of
the options.

    REINDEX (flexible options) { INDEX | ... } name

Sawada Masahiko.
Reviewed by Robert Haas, Fabrízio Mello, Alvaro Herrera, Kyotaro Horiguchi,
Jim Nasby and me.

Discussion: CAD21AoA0pK3YcOZAFzMae+2fcc3oGp5zoRggDyMNg5zoaWDhdQ@mail.gmail.com
2015-05-15 20:09:57 +09:00
Peter Eisentraut d02f16470f Replace some appendStringInfo* calls with more appropriate variants
Author: David Rowley <dgrowleyml@gmail.com>
2015-05-11 20:38:55 -04:00
Alvaro Herrera b488c580ae Allow on-the-fly capture of DDL event details
This feature lets user code inspect and take action on DDL events.
Whenever a ddl_command_end event trigger is installed, DDL actions
executed are saved to a list which can be inspected during execution of
a function attached to ddl_command_end.

The set-returning function pg_event_trigger_ddl_commands can be used to
list actions so captured; it returns data about the type of command
executed, as well as the affected object.  This is sufficient for many
uses of this feature.  For the cases where it is not, we also provide a
"command" column of a new pseudo-type pg_ddl_command, which is a
pointer to a C structure that can be accessed by C code.  The struct
contains all the info necessary to completely inspect and even
reconstruct the executed command.

There is no actual deparse code here; that's expected to come later.
What we have is enough infrastructure that the deparsing can be done in
an external extension.  The intention is that we will add some deparsing
code in a later release, as an in-core extension.

A new test module is included.  It's probably insufficient as is, but it
should be sufficient as a starting point for a more complete and
future-proof approach.

Authors: Álvaro Herrera, with some help from Andres Freund, Ian Barwick,
Abhijit Menon-Sen.

Reviews by Andres Freund, Robert Haas, Amit Kapila, Michael Paquier,
Craig Ringer, David Steele.
Additional input from Chris Browne, Dimitri Fontaine, Stephen Frost,
Petr Jelínek, Tom Lane, Jim Nasby, Steven Singer, Pavel Stěhule.

Based on original work by Dimitri Fontaine, though I didn't use his
code.

Discussion:
  https://www.postgresql.org/message-id/m2txrsdzxa.fsf@2ndQuadrant.fr
  https://www.postgresql.org/message-id/20131108153322.GU5809@eldon.alvh.no-ip.org
  https://www.postgresql.org/message-id/20150215044814.GL3391@alvh.no-ip.org
2015-05-11 19:14:31 -03:00
Andrew Dunstan cb9fa802b3 Add new OID alias type regnamespace
Catalog version bumped

Kyotaro HORIGUCHI
2015-05-09 13:36:52 -04:00
Andrew Dunstan 0c90f6769d Add new OID alias type regrole
The new type has the scope of whole the database cluster so it doesn't
behave the same as the existing OID alias types which have database
scope,
concerning object dependency. To avoid confusion constants of the new
type are prohibited from appearing where dependencies are made involving
it.

Also, add a note to the docs about possible MVCC violation and
optimization issues, which are general over the all reg* types.

Kyotaro Horiguchi
2015-05-09 13:06:49 -04:00
Stephen Frost a97e0c3354 Add pg_file_settings view and function
The function and view added here provide a way to look at all settings
in postgresql.conf, any #include'd files, and postgresql.auto.conf
(which is what backs the ALTER SYSTEM command).

The information returned includes the configuration file name, line
number in that file, sequence number indicating when the parameter is
loaded (useful to see if it is later masked by another definition of the
same parameter), parameter name, and what it is set to at that point.
This information is updated on reload of the server.

This is unfiltered, privileged, information and therefore access is
restricted to superusers through the GRANT system.

Author: Sawada Masahiko, various improvements by me.
Reviewers: David Steele
2015-05-08 19:09:26 -04:00
Andres Freund 168d5805e4 Add support for INSERT ... ON CONFLICT DO NOTHING/UPDATE.
The newly added ON CONFLICT clause allows to specify an alternative to
raising a unique or exclusion constraint violation error when inserting.
ON CONFLICT refers to constraints that can either be specified using a
inference clause (by specifying the columns of a unique constraint) or
by naming a unique or exclusion constraint.  DO NOTHING avoids the
constraint violation, without touching the pre-existing row.  DO UPDATE
SET ... [WHERE ...] updates the pre-existing tuple, and has access to
both the tuple proposed for insertion and the existing tuple; the
optional WHERE clause can be used to prevent an update from being
executed.  The UPDATE SET and WHERE clauses have access to the tuple
proposed for insertion using the "magic" EXCLUDED alias, and to the
pre-existing tuple using the table name or its alias.

This feature is often referred to as upsert.

This is implemented using a new infrastructure called "speculative
insertion". It is an optimistic variant of regular insertion that first
does a pre-check for existing tuples and then attempts an insert.  If a
violating tuple was inserted concurrently, the speculatively inserted
tuple is deleted and a new attempt is made.  If the pre-check finds a
matching tuple the alternative DO NOTHING or DO UPDATE action is taken.
If the insertion succeeds without detecting a conflict, the tuple is
deemed inserted.

To handle the possible ambiguity between the excluded alias and a table
named excluded, and for convenience with long relation names, INSERT
INTO now can alias its target table.

Bumps catversion as stored rules change.

Author: Peter Geoghegan, with significant contributions from Heikki
    Linnakangas and Andres Freund. Testing infrastructure by Jeff Janes.
Reviewed-By: Heikki Linnakangas, Andres Freund, Robert Haas, Simon Riggs,
    Dean Rasheed, Stephen Frost and many others.
2015-05-08 05:43:10 +02:00
Robert Haas 924bcf4f16 Create an infrastructure for parallel computation in PostgreSQL.
This does four basic things.  First, it provides convenience routines
to coordinate the startup and shutdown of parallel workers.  Second,
it synchronizes various pieces of state (e.g. GUCs, combo CID
mappings, transaction snapshot) from the parallel group leader to the
worker processes.  Third, it prohibits various operations that would
result in unsafe changes to that state while parallelism is active.
Finally, it propagates events that would result in an ErrorResponse,
NoticeResponse, or NotifyResponse message being sent to the client
from the parallel workers back to the master, from which they can then
be sent on to the client.

Robert Haas, Amit Kapila, Noah Misch, Rushabh Lathia, Jeevan Chalke.
Suggestions and review from Andres Freund, Heikki Linnakangas, Noah
Misch, Simon Riggs, Euler Taveira, and Jim Nasby.
2015-04-30 15:02:14 -04:00
Andres Freund 5aa2350426 Introduce replication progress tracking infrastructure.
When implementing a replication solution ontop of logical decoding, two
related problems exist:
* How to safely keep track of replication progress
* How to change replication behavior, based on the origin of a row;
  e.g. to avoid loops in bi-directional replication setups

The solution to these problems, as implemented here, consist out of
three parts:

1) 'replication origins', which identify nodes in a replication setup.
2) 'replication progress tracking', which remembers, for each
   replication origin, how far replay has progressed in a efficient and
   crash safe manner.
3) The ability to filter out changes performed on the behest of a
   replication origin during logical decoding; this allows complex
   replication topologies. E.g. by filtering all replayed changes out.

Most of this could also be implemented in "userspace", e.g. by inserting
additional rows contain origin information, but that ends up being much
less efficient and more complicated.  We don't want to require various
replication solutions to reimplement logic for this independently. The
infrastructure is intended to be generic enough to be reusable.

This infrastructure also replaces the 'nodeid' infrastructure of commit
timestamps. It is intended to provide all the former capabilities,
except that there's only 2^16 different origins; but now they integrate
with logical decoding. Additionally more functionality is accessible via
SQL.  Since the commit timestamp infrastructure has also been introduced
in 9.5 (commit 73c986add) changing the API is not a problem.

For now the number of origins for which the replication progress can be
tracked simultaneously is determined by the max_replication_slots
GUC. That GUC is not a perfect match to configure this, but there
doesn't seem to be sufficient reason to introduce a separate new one.

Bumps both catversion and wal page magic.

Author: Andres Freund, with contributions from Petr Jelinek and Craig Ringer
Reviewed-By: Heikki Linnakangas, Petr Jelinek, Robert Haas, Steve Singer
Discussion: 20150216002155.GI15326@awork2.anarazel.de,
    20140923182422.GA15776@alap3.anarazel.de,
    20131114172632.GE7522@alap2.anarazel.de
2015-04-29 19:30:53 +02:00
Andres Freund 6aab1f45ac Fix various typos and grammar errors in comments.
Author: Dmitriy Olshevskiy
Discussion: 553D00A6.4090205@bk.ru
2015-04-26 18:42:31 +02:00
Peter Eisentraut cac7658205 Add transforms feature
This provides a mechanism for specifying conversions between SQL data
types and procedural languages.  As examples, there are transforms
for hstore and ltree for PL/Perl and PL/Python.

reviews by Pavel Stěhule and Andres Freund
2015-04-26 10:33:14 -04:00
Andres Freund cef939c347 Rename pg_replication_slot's new active_in to active_pid.
In d811c037ce active_in was added but discussion since showed that
active_pid is preferred as a name.

Discussion: CAMsr+YFKgZca5_7_ouaMWxA5PneJC9LNViPzpDHusaPhU9pA7g@mail.gmail.com
2015-04-22 09:43:40 +02:00
Andres Freund d811c037ce Add 'active_in' column to pg_replication_slots.
Right now it is visible whether a replication slot is active in any
session, but not in which.  Adding the active_in column, containing the
pid of the backend having acquired the slot, makes it much easier to
associate pg_replication_slots entries with the corresponding
pg_stat_replication/pg_stat_activity row.

This should have been done from the start, but I (Andres) dropped the
ball there somehow.

Author: Craig Ringer, revised by me Discussion:
CAMsr+YFKgZca5_7_ouaMWxA5PneJC9LNViPzpDHusaPhU9pA7g@mail.gmail.com
2015-04-21 11:51:06 +02:00
Peter Eisentraut 30982be4e5 Integrate pg_upgrade_support module into backend
Previously, these functions were created in a schema "binary_upgrade",
which was deleted after pg_upgrade was finished.  Because we don't want
to keep that schema around permanently, move them to pg_catalog but
rename them with a binary_upgrade_... prefix.

The provided functions are only small wrappers around global variables
that were added specifically for pg_upgrade use, so keeping the module
separate does not create any modularity.

The functions still check that they are only called in binary upgrade
mode, so it is not possible to call these during normal operation.

Reviewed-by: Michael Paquier <michael.paquier@gmail.com>
2015-04-14 19:26:37 -04:00
Magnus Hagander 9029f4b374 Add system view pg_stat_ssl
This view shows information about all connections, such as if the
connection is using SSL, which cipher is used, and which client
certificate (if any) is used.

Reviews by Alex Shulgin, Heikki Linnakangas, Andres Freund & Michael Paquier
2015-04-12 19:07:46 +02:00
Andres Freund 06d36fa40c Fix typo in eb68379c3.
I'd accidentally missed to rename PG_FORCE_NULL to BKI_FORCE_NULL in one
place.

Author: Jeevan Chalke
Discussion: CAM2+6=VPoow5PqgqiTjPX4QNeokb7op8aD_8Zg3QnHZMvvU0GQ@mail.gmail.com
2015-04-09 13:29:22 +02:00
Fujii Masao 026fafde91 Fix typo in comment. 2015-04-08 20:55:43 +09:00
Alvaro Herrera e9a077cad3 pg_event_trigger_dropped_objects: add is_temp column
It now also reports temporary objects dropped that are local to the
backend.  Previously we weren't reporting any temp objects because it
was deemed unnecessary; but as it turns out, it is necessary if we want
to keep close track of DDL command execution inside one session.  Temp
objects are reported as living in schema pg_temp, which works because
such a schema-qualification always refers to the temp objects of the
current session.
2015-04-06 11:40:55 -03:00
Alvaro Herrera 70dc2db7f1 Fix object identities for pg_conversion objects
This was already fixed in 0d906798f, but I failed to update the
array-formatted case.  This is not backpatched, since this only affects
the code path introduced by commit a676201490.
2015-04-06 11:15:13 -03:00
Alvaro Herrera bdc3d7fa23 Return ObjectAddress in many ALTER TABLE sub-routines
Since commit a2e35b53c3, most CREATE and ALTER commands return the
ObjectAddress of the affected object.  This is useful for event triggers
to try to figure out exactly what happened.  This patch extends this
idea a bit further to cover ALTER TABLE as well: an auxiliary
ObjectAddress is returned for each of several subcommands of ALTER
TABLE.  This makes it possible to decode with precision what happened
during execution of any ALTER TABLE command; for instance, which
constraint was added by ALTER TABLE ADD CONSTRAINT, or which parent got
dropped from the parents list by ALTER TABLE NO INHERIT.

As with the previous patch, there is no immediate user-visible change
here.

This is all really just continuing what c504513f83 started.

Reviewed by Stephen Frost.
2015-03-25 17:17:56 -03:00
Alvaro Herrera b3196e65f5 Fix bug for array-formatted identities of user mappings
I failed to realize that server names reported in the object args array
would get quoted, which is wrong; remove that, making sure that it's
only quoted in the string-formatted identity.

This bug was introduced by my commit cf34e373, which was backpatched,
but since object name/args arrays are new in commit a676201490, there
is no need to backpatch this any further.
2015-03-25 14:28:34 -03:00
Tom Lane cb1ca4d800 Allow foreign tables to participate in inheritance.
Foreign tables can now be inheritance children, or parents.  Much of the
system was already ready for this, but we had to fix a few things of
course, mostly in the area of planner and executor handling of row locks.

As side effects of this, allow foreign tables to have NOT VALID CHECK
constraints (and hence to accept ALTER ... VALIDATE CONSTRAINT), and to
accept ALTER SET STORAGE and ALTER SET WITH/WITHOUT OIDS.  Continuing to
disallow these things would've required bizarre and inconsistent special
cases in inheritance behavior.  Since foreign tables don't enforce CHECK
constraints anyway, a NOT VALID one is a complete no-op, but that doesn't
mean we shouldn't allow it.  And it's possible that some FDWs might have
use for SET STORAGE or SET WITH OIDS, though doubtless they will be no-ops
for most.

An additional change in support of this is that when a ModifyTable node
has multiple target tables, they will all now be explicitly identified
in EXPLAIN output, for example:

 Update on pt1  (cost=0.00..321.05 rows=3541 width=46)
   Update on pt1
   Foreign Update on ft1
   Foreign Update on ft2
   Update on child3
   ->  Seq Scan on pt1  (cost=0.00..0.00 rows=1 width=46)
   ->  Foreign Scan on ft1  (cost=100.00..148.03 rows=1170 width=46)
   ->  Foreign Scan on ft2  (cost=100.00..148.03 rows=1170 width=46)
   ->  Seq Scan on child3  (cost=0.00..25.00 rows=1200 width=46)

This was done mainly to provide an unambiguous place to attach "Remote SQL"
fields, but it is useful for inherited updates even when no foreign tables
are involved.

Shigeru Hanada and Etsuro Fujita, reviewed by Ashutosh Bapat and Kyotaro
Horiguchi, some additional hacking by me
2015-03-22 13:53:21 -04:00
Bruce Momjian 1c7087af42 Add TOAST table to pg_shseclabel for long label use
Report by Andres Freund
2015-03-21 22:14:49 -04:00
Alvaro Herrera a190738457 Fix out-of-array-bounds compiler warning
Since the array length check is using a post-increment operator, the
compiler complains that there's a potential write to one element beyond
the end of the array.  This is not possible currently: the only path to
this function is through pg_get_object_address(), which already verifies
that the input array is no more than two elements in length.  Still, a
bug is a bug.

No idea why my compiler doesn't complain about this ...

Pointed out by Dead Rasheed and Peter Eisentraut
2015-03-16 22:35:45 -03:00
Alvaro Herrera a61fd5334e Support opfamily members in get_object_address
In the spirit of 890192e99a and 4464303405f: have get_object_address
understand individual pg_amop and pg_amproc objects.  There is no way to
refer to such objects directly in the grammar -- rather, they are almost
always considered an integral part of the opfamily that contains them.
(The only case that deals with them individually is ALTER OPERATOR
FAMILY ADD/DROP, which carries the opfamily address separately and thus
does not need it to be part of each added/dropped element's address.)
In event triggers it becomes possible to become involved with individual
amop/amproc elements, and this commit enables pg_get_object_address to
do so as well.

To make the overall coding simpler, this commit also slightly changes
the get_object_address representation for opclasses and opfamilies:
instead of having the AM name in the objargs array, I moved it as the
first element of the objnames array.  This enables the new code to use
objargs for the type names used by pg_amop and pg_amproc.

Reviewed by: Stephen Frost
2015-03-16 12:06:34 -03:00
Alvaro Herrera 4464303405 Support default ACLs in get_object_address
In the spirit of 890192e99a, this time add support for the things
living in the pg_default_acl catalog.  These are not really "objects",
but they show up as such in event triggers.

There is no "DROP DEFAULT PRIVILEGES" or similar command, so it doesn't
look like the new representation given would be useful anywhere else, so
I didn't try to use it outside objectaddress.c.  (That might be a bug in
itself, but that would be material for another commit.)

Reviewed by Stephen Frost.
2015-03-11 19:23:47 -03:00
Alvaro Herrera 890192e99a Support user mappings in get_object_address
Since commit 72dd233d3e we were trying to obtain object addressing
information in sql_drop event triggers, but that caused failures when
the drops involved user mappings.  This addition enables that to work
again.  Naturally, pg_get_object_address can work with these objects
now, too.

I toyed with the idea of removing DropUserMappingStmt as a node and
using DropStmt instead in the DropUserMappingStmt grammar production,
but that didn't go very well: for one thing the messages thrown by the
specific code are specialized (you get "server not found" if you specify
the wrong server, instead of a generic "user mapping for ... not found"
which you'd get it we were to merge this with RemoveObjects --- unless
we added even more special cases).  For another thing, it would require
to pass RoleSpec nodes through the objname/objargs representation used
by RemoveObjects, which works in isolation, but gets messy when
pg_get_object_address is involved.  So I dropped this part for now.

Reviewed by Stephen Frost.
2015-03-11 17:04:27 -03:00
Alvaro Herrera e3f1c24b99 Fix crasher bugs in previous commit
ALTER DEFAULT PRIVILEGES was trying to decode the list of roles in the
FOR clause as a list of names rather than of RoleSpecs; and the IN
clause in CREATE ROLE was doing the same thing.  This was evidenced by
crashes on some buildfarm machines, though on my platform this doesn't
cause a failure by mere chance; I can reproduce the failures only by
adding some padding in struct RoleSpecs.

Fix by dereferencing those lists as being of RoleSpecs, not string
Values.
2015-03-09 17:00:43 -03:00
Alvaro Herrera 31eae6028e Allow CURRENT/SESSION_USER to be used in certain commands
Commands such as ALTER USER, ALTER GROUP, ALTER ROLE, GRANT, and the
various ALTER OBJECT / OWNER TO, as well as ad-hoc clauses related to
roles such as the AUTHORIZATION clause of CREATE SCHEMA, the FOR clause
of CREATE USER MAPPING, and the FOR ROLE clause of ALTER DEFAULT
PRIVILEGES can now take the keywords CURRENT_USER and SESSION_USER as
user specifiers in place of an explicit user name.

This commit also fixes some quite ugly handling of special standards-
mandated syntax in CREATE USER MAPPING, which in particular would fail
to work in presence of a role named "current_user".

The special role specifiers PUBLIC and NONE also have more consistent
handling now.

Also take the opportunity to add location tracking to user specifiers.

Authors: Kyotaro Horiguchi.  Heavily reworked by Álvaro Herrera.
Reviewed by: Rushabh Lathia, Adam Brightwell, Marti Raudsepp.
2015-03-09 15:41:54 -03:00
Peter Eisentraut bb8582abf3 Remove rolcatupdate
This role attribute is an ancient PostgreSQL feature, but could only be
set by directly updating the system catalogs, and it doesn't have any
clearly defined use.

Author: Adam Brightwell <adam.brightwell@crunchydatasolutions.com>
2015-03-06 23:42:38 -05:00
Alvaro Herrera cf34e373fc Fix user mapping object description
We were using "user mapping for user XYZ" as description for user mappings, but
that's ambiguous because users can have mappings on multiple foreign
servers; therefore change it to "for user XYZ on server UVW" instead.
Object identities for user mappings are also updated in the same way, in
branches 9.3 and above.

The incomplete description string was introduced together with the whole
SQL/MED infrastructure by commit cae565e503 of 8.4 era, so backpatch all
the way back.
2015-03-05 18:03:16 -03:00
Alvaro Herrera bf22d2707a Silence warning in non-assert-enabled build
An OID return value was being used only for a (rather pointless) assert.
Silence by removing the variable and the assert.

Per note from Peter Geoghegan
2015-03-05 15:38:37 -03:00
Alvaro Herrera a2e35b53c3 Change many routines to return ObjectAddress rather than OID
The changed routines are mostly those that can be directly called by
ProcessUtilitySlow; the intention is to make the affected object
information more precise, in support for future event trigger changes.
Originally it was envisioned that the OID of the affected object would
be enough, and in most cases that is correct, but upon actually
implementing the event trigger changes it turned out that ObjectAddress
is more widely useful.

Additionally, some command execution routines grew an output argument
that's an object address which provides further info about the executed
command.  To wit:

* for ALTER DOMAIN / ADD CONSTRAINT, it corresponds to the address of
  the new constraint

* for ALTER OBJECT / SET SCHEMA, it corresponds to the address of the
  schema that originally contained the object.

* for ALTER EXTENSION {ADD, DROP} OBJECT, it corresponds to the address
  of the object added to or dropped from the extension.

There's no user-visible change in this commit, and no functional change
either.

Discussion: 20150218213255.GC6717@tamriel.snowman.net
Reviewed-By: Stephen Frost, Andres Freund
2015-03-03 14:10:50 -03:00
Alvaro Herrera 6f9d799047 Add comment for "is_internal" parameter
This was missed in my commit f4c4335 of 9.3 vintage, so backpatch to
that.
2015-03-03 14:05:05 -03:00
Andres Freund eb68379c38 Allow forcing nullness of columns during bootstrap.
Bootstrap determines whether a column is null based on simple builtin
rules. Those work surprisingly well, but nonetheless a few existing
columns aren't set correctly. Additionally there is at least one patch
sent to hackers where forcing the nullness of a column would be helpful.

The boostrap format has gained FORCE [NOT] NULL for this, which will be
emitted by genbki.pl when BKI_FORCE_(NOT_)?NULL is specified for a
column in a catalog header.

This patch doesn't change the marking of any existing columns.

Discussion: 20150215170014.GE15326@awork2.anarazel.de
2015-02-21 22:31:54 +01:00
Tom Lane e1a11d9311 Use FLEXIBLE_ARRAY_MEMBER for HeapTupleHeaderData.t_bits[].
This requires changing quite a few places that were depending on
sizeof(HeapTupleHeaderData), but it seems for the best.

Michael Paquier, some adjustments by me
2015-02-21 15:13:06 -05:00
Tom Lane 09d8d110a6 Use FLEXIBLE_ARRAY_MEMBER in a bunch more places.
Replace some bogus "x[1]" declarations with "x[FLEXIBLE_ARRAY_MEMBER]".
Aside from being more self-documenting, this should help prevent bogus
warnings from static code analyzers and perhaps compiler misoptimizations.

This patch is just a down payment on eliminating the whole problem, but
it gets rid of a lot of easy-to-fix cases.

Note that the main problem with doing this is that one must no longer rely
on computing sizeof(the containing struct), since the result would be
compiler-dependent.  Instead use offsetof(struct, lastfield).  Autoconf
also warns against spelling that offsetof(struct, lastfield[0]).

Michael Paquier, review and additional fixes by me.
2015-02-20 00:11:42 -05:00
Alvaro Herrera 9c7dd35019 Fix opclass/opfamily identity strings
The original representation uses "opcname for amname", which is good
enough; but if we replace "for" with "using", we can apply the returned
identity directly in a DROP command, as in

DROP OPERATOR CLASS opcname USING amname

This slightly simplifies code using object identities to programatically
execute commands on these kinds of objects.

Note backwards-incompatible change:
The previous representation dates back to 9.3 when object identities
were introduced by commit f8348ea3, but we don't want to change the
behavior on released branches unnecessarily and so this is not
backpatched.
2015-02-18 14:44:27 -03:00
Alvaro Herrera 0d906798f6 Fix object identities for pg_conversion objects
We were neglecting to schema-qualify them.

Backpatch to 9.3, where object identities were introduced as a concept
by commit f8348ea32e.
2015-02-18 14:28:11 -03:00
Heikki Linnakangas 57fe246890 Fix reference-after-free when waiting for another xact due to constraint.
If an insertion or update had to wait for another transaction to finish,
because there was another insertion with conflicting key in progress,
we would pass a just-free'd item pointer to XactLockTableWait().

All calls to XactLockTableWait() and MultiXactIdWait() had similar issues.
Some passed a pointer to a buffer in the buffer cache, after already
releasing the lock. The call in EvalPlanQualFetch had already released the
pin too. All but the call in execUtils.c would merely lead to reporting a
bogus ctid, however (or an assertion failure, if enabled).

All the callers that passed HeapTuple->t_data->t_ctid were slightly bogus
anyway: if the tuple was updated (again) in the same transaction, its ctid
field would point to the next tuple in the chain, not the tuple itself.

Backpatch to 9.4, where the 'ctid' argument to XactLockTableWait was added
(in commit f88d4cfc)
2015-02-04 16:00:34 +02:00
Stephen Frost c7cf9a2433 Add usebypassrls to pg_user and pg_shadow
The row level security patches didn't add the 'usebypassrls' columns to
the pg_user and pg_shadow views on the belief that they were deprecated,
but we havn't actually said they are and therefore we should include it.

This patch corrects that, adds missing documentation for rolbypassrls
into the system catalog page for pg_authid, along with the entries for
pg_user and pg_shadow, and cleans up a few other uses of 'row-level'
cases to be 'row level' in the docs.

Pointed out by Amit Kapila.

Catalog version bump due to system view changes.
2015-01-28 21:47:15 -05:00
Tom Lane fd496129d1 Clean up some mess in row-security patches.
Fix unsafe coding around PG_TRY in RelationBuildRowSecurity: can't change
a variable inside PG_TRY and then use it in PG_CATCH without marking it
"volatile".  In this case though it seems saner to avoid that by doing
a single assignment before entering the TRY block.

I started out just intending to fix that, but the more I looked at the
row-security code the more distressed I got.  This patch also fixes
incorrect construction of the RowSecurityPolicy cache entries (there was
not sufficient care taken to copy pass-by-ref data into the cache memory
context) and a whole bunch of sloppiness around the definition and use of
pg_policy.polcmd.  You can't use nulls in that column because initdb will
mark it NOT NULL --- and I see no particular reason why a null entry would
be a good idea anyway, so changing initdb's behavior is not the right
answer.  The internal value of '\0' wouldn't be suitable in a "char" column
either, so after a bit of thought I settled on using '*' to represent ALL.
Chasing those changes down also revealed that somebody wasn't paying
attention to what the underlying values of ACL_UPDATE_CHR etc really were,
and there was a great deal of lackadaiscalness in the catalogs.sgml
documentation for pg_policy and pg_policies too.

This doesn't pretend to be a complete code review for the row-security
stuff, it just fixes the things that were in my face while dealing with
the bugs in RelationBuildRowSecurity.
2015-01-24 16:16:22 -05:00
Heikki Linnakangas e922a13058 Spell the X072 feature correctly, was missing "with".
Also use lower-case for a few more features, to be consistent with the
others and with the SQL spec.
2015-01-13 16:08:55 +02:00
Bruce Momjian 4baaf863ec Update copyright for 2015
Backpatch certain files through 9.0
2015-01-06 11:43:47 -05:00
Tom Lane a486841eb1 Print more information about getObjectIdentityParts() failures.
This might help us debug what's happening on some buildfarm members.

In passing, reduce the message from ereport to elog --- it doesn't seem
like this should be a user-facing case, so not worth translating.
2014-12-31 14:44:43 -05:00
Alvaro Herrera ba66c9d068 Add missing pstrdup calls
The one for the OCLASS_COLLATION case was noticed by
CLOBBER_CACHE_ALWAYS buildfarm members; the others I spotted by manual
code inspection.

Also remove a redundant check.
2014-12-31 13:19:40 -03:00
Alvaro Herrera a676201490 Add pg_identify_object_as_address
This function returns object type and objname/objargs arrays, which can
be passed to pg_get_object_address.  This is especially useful because
the textual representation can be copied to a remote server in order to
obtain the corresponding OID-based address.  In essence, this function
is the inverse of recently added pg_get_object_address().

Catalog version bumped due to the addition of the new function.

Also add docs to pg_get_object_address.
2014-12-30 15:41:50 -03:00
Alvaro Herrera 3f88672a4e Use TypeName to represent type names in certain commands
In COMMENT, DROP, SECURITY LABEL, and the new pg_get_object_address
function, we were representing types as a list of names, same as other
objects; but types are special objects that require their own
representation to be totally accurate.  In the original COMMENT code we
had a note about fixing it which was lost in the course of c10575ff00.
Change all those places to use TypeName instead, as suggested by that
comment.

Right now the original coding doesn't cause any bugs, so no backpatch.
It is more problematic for proposed future code that operate with object
addresses from the SQL interface; type details such as array-ness are
lost when working with the degraded representation.

Thanks to Petr Jelínek and Dimitri Fontaine for offlist help on finding
a solution to a shift/reduce grammar conflict.
2014-12-30 13:57:23 -03:00
Tom Lane 9a11df1449 Remove duplicate assignment in new pg_get_object_address() function.
Noted by Coverity.
2014-12-28 12:03:32 -05:00
Alvaro Herrera 6630420fc9 Restrict name list len for domain constraints
This avoids an ugly-looking "cache lookup failure" message.

Ugliness pointed out by Andres Freund.
2014-12-26 14:31:37 -03:00
Alvaro Herrera a609d96778 Revert "Use a bitmask to represent role attributes"
This reverts commit 1826987a46.

The overall design was deemed unacceptable, in discussion following the
previous commit message; we might find some parts of it still
salvageable, but I don't want to be on the hook for fixing it, so let's
wait until we have a new patch.
2014-12-23 15:35:49 -03:00
Alvaro Herrera d7ee82e50f Add SQL-callable pg_get_object_address
This allows access to get_object_address from SQL, which is useful to
obtain OID addressing information from data equivalent to that emitted
by the parser.  This is necessary infrastructure of a project to let
replication systems propagate object dropping events to remote servers,
where the schema might be different than the server originating the
DROP.

This patch also adds support for OBJECT_DEFAULT to get_object_address;
that is, it is now possible to refer to a column's default value.

Catalog version bumped due to the new function.

Reviewed by Stephen Frost, Heikki Linnakangas, Robert Haas, Andres
Freund, Abhijit Menon-Sen, Adam Brightwell.
2014-12-23 15:31:29 -03:00
Alvaro Herrera 1826987a46 Use a bitmask to represent role attributes
The previous representation using a boolean column for each attribute
would not scale as well as we want to add further attributes.

Extra auxilliary functions are added to go along with this change, to
make up for the lost convenience of access of the old representation.

Catalog version bumped due to change in catalogs and the new functions.

Author: Adam Brightwell, minor tweaks by Álvaro
Reviewed by: Stephen Frost, Andres Freund, Álvaro Herrera
2014-12-23 10:22:09 -03:00
Alvaro Herrera 7eca575d1c get_object_address: separate domain constraints from table constraints
Apart from enabling comments on domain constraints, this enables a
future project to replicate object dropping to remote servers: with the
current mechanism there's no way to distinguish between the two types of
constraints, so there's no way to know what to drop.

Also added support for the domain constraint comments in psql's \dd and
pg_dump.

Catalog version bumped due to the change in ObjectType enum.
2014-12-23 09:06:44 -03:00
Alvaro Herrera 0ee98d1cbf pg_event_trigger_dropped_objects: add behavior flags
Add "normal" and "original" flags as output columns to the
pg_event_trigger_dropped_objects() function.  With this it's possible to
distinguish which objects, among those listed, need to be explicitely
referenced when trying to replicate a deletion.

This is necessary so that the list of objects can be pruned to the
minimum necessary to replicate the DROP command in a remote server that
might have slightly different schema (for instance, TOAST tables and
constraints with different names and such.)

Catalog version bumped due to change of function definition.

Reviewed by: Abhijit Menon-Sen, Stephen Frost, Heikki Linnakangas,
Robert Haas.
2014-12-19 15:00:45 -03:00
Tom Lane fc2ac1fb41 Allow CHECK constraints to be placed on foreign tables.
As with NOT NULL constraints, we consider that such constraints are merely
reports of constraints that are being enforced by the remote server (or
other underlying storage mechanism).  Their only real use is to allow
planner optimizations, for example in constraint-exclusion checks.  Thus,
the code changes here amount to little more than removal of the error that
was formerly thrown for applying CHECK to a foreign table.

(In passing, do a bit of cleanup of the ALTER FOREIGN TABLE reference page,
which had accumulated some weird decisions about ordering etc.)

Shigeru Hanada and Etsuro Fujita, reviewed by Kyotaro Horiguchi and
Ashutosh Bapat.
2014-12-17 17:00:53 -05:00
Tom Lane 96d66bcfc6 Improve performance of OverrideSearchPathMatchesCurrent().
This function was initially coded on the assumption that it would not be
performance-critical, but that turns out to be wrong in workloads that
are heavily dependent on the speed of plpgsql functions.  Speed it up by
hard-coding the comparison rules, thereby avoiding palloc/pfree traffic
from creating and immediately freeing an OverrideSearchPath object.
Per report from Scott Marlowe.
2014-11-28 12:37:27 -05:00
Stephen Frost 143b39c185 Rename pg_rowsecurity -> pg_policy and other fixes
As pointed out by Robert, we should really have named pg_rowsecurity
pg_policy, as the objects stored in that catalog are policies.  This
patch fixes that and updates the column names to start with 'pol' to
match the new catalog name.

The security consideration for COPY with row level security, also
pointed out by Robert, has also been addressed by remembering and
re-checking the OID of the relation initially referenced during COPY
processing, to make sure it hasn't changed under us by the time we
finish planning out the query which has been built.

Robert and Alvaro also commented on missing OCLASS and OBJECT entries
for POLICY (formerly ROWSECURITY or POLICY, depending) in various
places.  This patch fixes that too, which also happens to add the
ability to COMMENT on policies.

In passing, attempt to improve the consistency of messages, comments,
and documentation as well.  This removes various incarnations of
'row-security', 'row-level security', 'Row-security', etc, in favor
of 'policy', 'row level security' or 'row_security' as appropriate.

Happy Thanksgiving!
2014-11-27 01:15:57 -05:00
Stephen Frost 25976710df Add int64 -> int8 mapping to genbki
Per discussion with Tom and Andrew, 64bit integers are no longer a
problem for the catalogs, so go ahead and add the mapping from the C
int64 type to the int8 SQL identification to allow using them.

Patch by Adam Brightwell
2014-11-25 12:12:19 -05:00
Heikki Linnakangas 2c03216d83 Revamp the WAL record format.
Each WAL record now carries information about the modified relation and
block(s) in a standardized format. That makes it easier to write tools that
need that information, like pg_rewind, prefetching the blocks to speed up
recovery, etc.

There's a whole new API for building WAL records, replacing the XLogRecData
chains used previously. The new API consists of XLogRegister* functions,
which are called for each buffer and chunk of data that is added to the
record. The new API also gives more control over when a full-page image is
written, by passing flags to the XLogRegisterBuffer function.

This also simplifies the XLogReadBufferForRedo() calls. The function can dig
the relation and block number from the WAL record, so they no longer need to
be passed as arguments.

For the convenience of redo routines, XLogReader now disects each WAL record
after reading it, copying the main data part and the per-block data into
MAXALIGNed buffers. The data chunks are not aligned within the WAL record,
but the redo routines can assume that the pointers returned by XLogRecGet*
functions are. Redo routines are now passed the XLogReaderState, which
contains the record in the already-disected format, instead of the plain
XLogRecord.

The new record format also makes the fixed size XLogRecord header smaller,
by removing the xl_len field. The length of the "main data" portion is now
stored at the end of the WAL record, and there's a separate header after
XLogRecord for it. The alignment padding at the end of XLogRecord is also
removed. This compansates for the fact that the new format would otherwise
be more bulky than the old format.

Reviewed by Andres Freund, Amit Kapila, Michael Paquier, Alvaro Herrera,
Fujii Masao.
2014-11-20 18:46:41 +02:00
Alvaro Herrera 0f9692b40d Fix relpersistence setting in reindex_index
Buildfarm members with CLOBBER_CACHE_ALWAYS advised us that commit
85b506bbfc was mistaken in setting the relpersistence value of the
index directly in the relcache entry, within reindex_index.  The reason
for the failure is that an invalidation message that comes after mucking
with the relcache entry directly, but before writing it to the catalogs,
would cause the entry to become rebuilt in place from catalogs with the
old contents, losing the update.

Fix by passing the correct persistence value to
RelationSetNewRelfilenode instead; this routine also writes the updated
tuple to pg_class, avoiding the problem.  Suggested by Tom Lane.
2014-11-17 11:23:35 -03:00
Alvaro Herrera 85b506bbfc Get rid of SET LOGGED indexes persistence kludge
This removes ATChangeIndexesPersistence() introduced by f41872d0c1
which was too ugly to live for long.  Instead, the correct persistence
marking is passed all the way down to reindex_index, so that the
transient relation built to contain the index relfilenode can
get marked correctly right from the start.

Author: Fabrízio de Royes Mello
Review and editorialization by Michael Paquier
                                     and Álvaro Herrera
2014-11-15 01:19:49 -03:00
Tom Lane 2edfc021c6 Fix dependency searching for case where column is visited before table.
When the recursive search in dependency.c visits a column and then later
visits the whole table containing the column, it needs to propagate the
drop-context flags for the table to the existing target-object entry for
the column.  Otherwise we might refuse the DROP (if not CASCADE) on the
incorrect grounds that there was no automatic drop pathway to the column.
Remarkably, this has not been reported before, though it's possible at
least when an extension creates both a datatype and a table using that
datatype.

Rather than just marking the column as allowed to be dropped, it might
seem good to skip the DROP COLUMN step altogether, since the later DROP
of the table will surely get the job done.  The problem with that is that
the datatype would then be dropped before the table (since the whole
situation occurred because we visited the datatype, and then recursed to
the dependent column, before visiting the table).  That seems pretty risky,
and the case is rare enough that it doesn't seem worth expending a lot of
effort or risk to make the drops happen in a safe order.  So we just play
dumb and delete the column separately according to the existing drop
ordering rules.

Per report from Petr Jelinek, though this is different from his proposed
patch.

Back-patch to 9.1, where extensions were introduced.  There's currently
no evidence that such cases can arise before 9.1, and in any case we would
also need to back-patch cb5c2ba2d8 to 9.0
if we wanted to back-patch this.
2014-11-11 17:00:11 -05:00
Alvaro Herrera 7516f52594 BRIN: Block Range Indexes
BRIN is a new index access method intended to accelerate scans of very
large tables, without the maintenance overhead of btrees or other
traditional indexes.  They work by maintaining "summary" data about
block ranges.  Bitmap index scans work by reading each summary tuple and
comparing them with the query quals; all pages in the range are returned
in a lossy TID bitmap if the quals are consistent with the values in the
summary tuple, otherwise not.  Normal index scans are not supported
because these indexes do not store TIDs.

As new tuples are added into the index, the summary information is
updated (if the block range in which the tuple is added is already
summarized) or not; in the latter case, a subsequent pass of VACUUM or
the brin_summarize_new_values() function will create the summary
information.

For data types with natural 1-D sort orders, the summary info consists
of the maximum and the minimum values of each indexed column within each
page range.  This type of operator class we call "Minmax", and we
supply a bunch of them for most data types with B-tree opclasses.
Since the BRIN code is generalized, other approaches are possible for
things such as arrays, geometric types, ranges, etc; even for things
such as enum types we could do something different than minmax with
better results.  In this commit I only include minmax.

Catalog version bumped due to new builtin catalog entries.

There's more that could be done here, but this is a good step forwards.

Loosely based on ideas from Simon Riggs; code mostly by Álvaro Herrera,
with contribution by Heikki Linnakangas.

Patch reviewed by: Amit Kapila, Heikki Linnakangas, Robert Haas.
Testing help from Jeff Janes, Erik Rijkers, Emanuel Calvo.

PS:
  The research leading to these results has received funding from the
  European Union's Seventh Framework Programme (FP7/2007-2013) under
  grant agreement n° 318633.
2014-11-07 16:38:14 -03:00
Heikki Linnakangas 2076db2aea Move the backup-block logic from XLogInsert to a new file, xloginsert.c.
xlog.c is huge, this makes it a little bit smaller, which is nice. Functions
related to putting together the WAL record are in xloginsert.c, and the
lower level stuff for managing WAL buffers and such are in xlog.c.

Also move the definition of XLogRecord to a separate header file. This
causes churn in the #includes of all the files that write WAL records, and
redo routines, but it avoids pulling in xlog.h into most places.

Reviewed by Michael Paquier, Alvaro Herrera, Andres Freund and Amit Kapila.
2014-11-06 13:55:36 +02:00
Fujii Masao 08309aaf74 Implement IF NOT EXIST for CREATE INDEX.
Fabrízio de Royes Mello, reviewed by Marti Raudsepp, Adam Brightwell and me.
2014-11-06 18:48:33 +09:00
Tom Lane a00d468e65 Remove obsolete commentary.
Since we got rid of non-MVCC catalog scans, the fourth reason given for
using a non-transactional update in index_update_stats() is obsolete.
The other three are still good, so we're not going to change the code,
but fix the comment.
2014-10-28 18:36:02 -04:00
Alvaro Herrera 7b1c2a0f20 Split builtins.h to a new header ruleutils.h
The new header contains many prototypes for functions in ruleutils.c
that are not exposed to the SQL level.

Reviewed by Andres Freund and Michael Paquier.
2014-10-08 18:10:47 -03:00
Stephen Frost 78d72563ef Fix CreatePolicy, pg_dump -v; psql and doc updates
Peter G pointed out that valgrind was, rightfully, complaining about
CreatePolicy() ending up copying beyond the end of the parsed policy
name.  Name is a fixed-size type and we need to use namein (through
DirectFunctionCall1()) to flush out the entire array before we pass
it down to heap_form_tuple.

Michael Paquier pointed out that pg_dump --verbose was missing a
newline and Fabrízio de Royes Mello further pointed out that the
schema was also missing from the messages, so fix those also.

Also, based on an off-list comment from Kevin, rework the psql \d
output to facilitate copy/pasting into a new CREATE or ALTER POLICY
command.

Lastly, improve the pg_policies view and update the documentation for
it, along with a few other minor doc corrections based on an off-list
discussion with Adam Brightwell.
2014-10-03 16:31:53 -04:00
Stephen Frost c8a026e4f1 Revert 95d737ff to add 'ignore_nulls'
Per discussion, revert the commit which added 'ignore_nulls' to
row_to_json.  This capability would be better added as an independent
function rather than being bolted on to row_to_json.  Additionally,
the implementation didn't address complex JSON objects, and so was
incomplete anyway.

Pointed out by Tom and discussed with Andrew and Robert.
2014-09-29 13:32:22 -04:00
Stephen Frost 6550b901fe Code review for row security.
Buildfarm member tick identified an issue where the policies in the
relcache for a relation were were being replaced underneath a running
query, leading to segfaults while processing the policies to be added
to a query.  Similar to how TupleDesc RuleLocks are handled, add in a
equalRSDesc() function to check if the policies have actually changed
and, if not, swap back the rsdesc field (using the original instead of
the temporairly built one; the whole structure is swapped and then
specific fields swapped back).  This now passes a CLOBBER_CACHE_ALWAYS
for me and should resolve the buildfarm error.

In addition to addressing this, add a new chapter in Data Definition
under Privileges which explains row security and provides examples of
its usage, change \d to always list policies (even if row security is
disabled- but note that it is disabled, or enabled with no policies),
rework check_role_for_policy (it really didn't need the entire policy,
but it did need to be using has_privs_of_role()), and change the field
in pg_class to relrowsecurity from relhasrowsecurity, based on
Heikki's suggestion.  Also from Heikki, only issue SET ROW_SECURITY in
pg_restore when talking to a 9.5+ server, list Bypass RLS in \du, and
document --enable-row-security options for pg_dump and pg_restore.

Lastly, fix a number of minor whitespace and typo issues from Heikki,
Dimitri, add a missing #include, per Peter E, fix a few minor
variable-assigned-but-not-used and resource leak issues from Coverity
and add tab completion for role attribute bypassrls as well.
2014-09-24 16:32:22 -04:00
Stephen Frost 491c029dbc Row-Level Security Policies (RLS)
Building on the updatable security-barrier views work, add the
ability to define policies on tables to limit the set of rows
which are returned from a query and which are allowed to be added
to a table.  Expressions defined by the policy for filtering are
added to the security barrier quals of the query, while expressions
defined to check records being added to a table are added to the
with-check options of the query.

New top-level commands are CREATE/ALTER/DROP POLICY and are
controlled by the table owner.  Row Security is able to be enabled
and disabled by the owner on a per-table basis using
ALTER TABLE .. ENABLE/DISABLE ROW SECURITY.

Per discussion, ROW SECURITY is disabled on tables by default and
must be enabled for policies on the table to be used.  If no
policies exist on a table with ROW SECURITY enabled, a default-deny
policy is used and no records will be visible.

By default, row security is applied at all times except for the
table owner and the superuser.  A new GUC, row_security, is added
which can be set to ON, OFF, or FORCE.  When set to FORCE, row
security will be applied even for the table owner and superusers.
When set to OFF, row security will be disabled when allowed and an
error will be thrown if the user does not have rights to bypass row
security.

Per discussion, pg_dump sets row_security = OFF by default to ensure
that exports and backups will have all data in the table or will
error if there are insufficient privileges to bypass row security.
A new option has been added to pg_dump, --enable-row-security, to
ask pg_dump to export with row security enabled.

A new role capability, BYPASSRLS, which can only be set by the
superuser, is added to allow other users to be able to bypass row
security using row_security = OFF.

Many thanks to the various individuals who have helped with the
design, particularly Robert Haas for his feedback.

Authors include Craig Ringer, KaiGai Kohei, Adam Brightwell, Dean
Rasheed, with additional changes and rework by me.

Reviewers have included all of the above, Greg Smith,
Jeff McCormick, and Robert Haas.
2014-09-19 11:18:35 -04:00
Stephen Frost 95d737ff45 Add 'ignore_nulls' option to row_to_json
Provide an option to skip NULL values in a row when generating a JSON
object from that row with row_to_json.  This can reduce the size of the
JSON object in cases where columns are NULL without really reducing the
information in the JSON object.

This also makes row_to_json into a single function with default values,
rather than having multiple functions.  In passing, change array_to_json
to also be a single function with default values (we don't add an
'ignore_nulls' option yet- it's not clear that there is a sensible
use-case there, and it hasn't been asked for in any case).

Pavel Stehule
2014-09-11 21:23:51 -04:00
Bruce Momjian a7ae1dcf49 pg_upgrade: prevent automatic oid assignment
Prevent automatic oid assignment when in binary upgrade mode.  Also
throw an error when contrib/pg_upgrade_support functions are called when
not in binary upgrade mode.

This prevent automatically-assigned oids from conflicting with later
pre-assigned oids coming from the old cluster.  It also makes sure oids
are preserved in call important cases.
2014-08-25 22:19:05 -04:00
Bruce Momjian 73fe87503f rename macro isTempOrToastNamespace to isTempOrTempToastNamespace
Done for clarity
2014-08-25 21:28:19 -04:00
Bruce Momjian 4c6780fd17 pg_upgrade: prevent oid conflicts with new-cluster TOAST tables
Previously, TOAST tables only required in the new cluster could cause
oid conflicts if they were auto-numbered and a later conflicting oid had
to be assigned.

Backpatch through 9.3
2014-08-07 14:56:13 -04:00
Peter Eisentraut 0e819c5e98 Update SQL features list 2014-07-21 00:42:50 -04:00
Alvaro Herrera 6bdf4b9c7d Fix REASSIGN OWNED for text search objects
Trying to reassign objects owned by a user that had text search
dictionaries or configurations used to fail with:
ERROR:  unexpected classid 3600
or
ERROR:  unexpected classid 3602

Fix by adding cases for those object types in a switch in pg_shdepend.c.

Both REASSIGN OWNED and text search objects go back all the way to 8.1,
so backpatch to all supported branches.  In 9.3 the alter-owner code was
made generic, so the required change in recent branches is pretty
simple; however, for 9.2 and older ones we need some additional
reshuffling to enable specifying objects by OID rather than name.

Text search templates and parsers are not owned objects, so there's no
change required for them.

Per bug #9749 reported by Michal Novotný
2014-07-15 13:24:07 -04:00
Peter Eisentraut d90ad5d8ab Small spelling fix 2014-07-15 08:45:27 -04:00
Tom Lane a749a23d7a Remove use_json_as_text options from json_to_record/json_populate_record.
The "false" case was really quite useless since all it did was to throw
an error; a definition not helped in the least by making it the default.
Instead let's just have the "true" case, which emits nested objects and
arrays in JSON syntax.  We might later want to provide the ability to
emit sub-objects in Postgres record or array syntax, but we'd be best off
to drive that off a check of the target field datatype, not a separate
argument.

For the functions newly added in 9.4, we can just remove the flag arguments
outright.  We can't do that for json_populate_record[set], which already
existed in 9.3, but we can ignore the argument and always behave as if it
were "true".  It helps that the flag arguments were optional and not
documented in any useful fashion anyway.
2014-06-29 13:50:58 -04:00
Noah Misch d098b236f3 Fix typos in comments. 2014-06-11 19:50:29 -04:00
Andres Freund f0c108560b Consistently spell a replication slot's name as slot_name.
Previously there's been a mix between 'slotname' and 'slot_name'. It's
not nice to be unneccessarily inconsistent in a new feature. As a post
beta1 initdb now is required in the wake of eeca4cd35e, fix the
inconsistencies.
Most the changes won't affect usage of replication slots because the
majority of changes is around function parameter names. The prominent
exception to that is that the recovery.conf parameter
'primary_slotname' is now named 'primary_slot_name'.
2014-06-05 16:29:20 +02:00
Andres Freund 621a99a666 Fix longstanding bug in HeapTupleSatisfiesVacuum().
HeapTupleSatisfiesVacuum() didn't properly discern between
DELETE_IN_PROGRESS and INSERT_IN_PROGRESS for rows that have been
inserted in the current transaction and deleted in a aborted
subtransaction of the current backend. At the very least that caused
problems for CLUSTER and CREATE INDEX in transactions that had
aborting subtransactions producing rows, leading to warnings like:
WARNING:  concurrent delete in progress within table "..."
possibly in an endless, uninterruptible, loop.

Instead of treating *InProgress xmins the same as *IsCurrent ones,
treat them as being distinct like the other visibility routines. As
implemented this separatation can cause a behaviour change for rows
that have been inserted and deleted in another, still running,
transaction. HTSV will now return INSERT_IN_PROGRESS instead of
DELETE_IN_PROGRESS for those. That's both, more in line with the other
visibility routines and arguably more correct. The latter because a
INSERT_IN_PROGRESS will make callers look at/wait for xmin, instead of
xmax.
The only current caller where that's possibly worse than the old
behaviour is heap_prune_chain() which now won't mark the page as
prunable if a row has concurrently been inserted and deleted. That's
harmless enough.

As a cautionary measure also insert a interrupt check before the gotos
in IndexBuildHeapScan() that lead to the uninterruptible loop. There
are other possible causes, like a row that several sessions try to
update and all fail, for repeated loops and the cost of doing so in
the retry case is low.

As this bug goes back all the way to the introduction of
subtransactions in 573a71a5da backpatch to all supported releases.

Reported-By: Sandro Santilli
2014-06-04 21:36:19 +02:00
Bruce Momjian 0a78320057 pgindent run for 9.4
This includes removing tabs after periods in C comments, which was
applied to back branches, so this change should not effect backpatching.
2014-05-06 12:12:18 -04:00
Tom Lane 2d00190495 Rationalize common/relpath.[hc].
Commit a730183926 created rather a mess by
putting dependencies on backend-only include files into include/common.
We really shouldn't do that.  To clean it up:

* Move TABLESPACE_VERSION_DIRECTORY back to its longtime home in
catalog/catalog.h.  We won't consider this symbol part of the FE/BE API.

* Push enum ForkNumber from relfilenode.h into relpath.h.  We'll consider
relpath.h as the source of truth for fork numbers, since relpath.c was
already partially serving that function, and anyway relfilenode.h was
kind of a random place for that enum.

* So, relfilenode.h now includes relpath.h rather than vice-versa.  This
direction of dependency is fine.  (That allows most, but not quite all,
of the existing explicit #includes of relpath.h to go away again.)

* Push forkname_to_number from catalog.c to relpath.c, just to centralize
fork number stuff a bit better.

* Push GetDatabasePath from catalog.c to relpath.c; it was rather odd
that the previous commit didn't keep this together with relpath().

* To avoid needing relfilenode.h in common/, redefine the underlying
function (now called GetRelationPath) as taking separate OID arguments,
and make the APIs using RelFileNode or RelFileNodeBackend into macro
wrappers.  (The macros have a potential multiple-eval risk, but none of
the existing call sites have an issue with that; one of them had such a
risk already anyway.)

* Fix failure to follow the directions when "init" fork type was added;
specifically, the errhint in forkname_to_number wasn't updated, and neither
was the SGML documentation for pg_relation_size().

* Fix tablespace-path-too-long check in CreateTableSpace() to account for
fork-name component of maximum-length pathnames.  This requires putting
FORKNAMECHARS into a header file, but it was rather useless (and
actually unreferenced) where it was.

The last couple of items are potentially back-patchable bug fixes,
if anyone is sufficiently excited about them; but personally I'm not.

Per a gripe from Christoph Berg about how include/common wasn't
self-contained.
2014-04-30 17:30:50 -04:00
Tom Lane 39b0c7681e Record the proper typmod for an index expression column.
We should use exprTypmod() to extract the typmod of the expression,
instead of just blindly storing -1.  This seems to have been an aboriginal
oversight in commit fc8d970cbc which
introduced general-expression indexes.  The consequences are only cosmetic
at present, since the index machinery doesn't really look at typmod for
index columns; but still it seems best to describe the column type as
precisely as we can.  Per off-list complaint from Thomas Fanghaenel.
2014-04-26 12:22:09 -04:00
Tom Lane f0fedfe82c Allow polymorphic aggregates to have non-polymorphic state data types.
Before 9.4, such an aggregate couldn't be declared, because its final
function would have to have polymorphic result type but no polymorphic
argument, which CREATE FUNCTION would quite properly reject.  The
ordered-set-aggregate patch found a workaround: allow the final function
to be declared as accepting additional dummy arguments that have types
matching the aggregate's regular input arguments.  However, we failed
to notice that this problem applies just as much to regular aggregates,
despite the fact that we had a built-in regular aggregate array_agg()
that was known to be undeclarable in SQL because its final function
had an illegal signature.  So what we should have done, and what this
patch does, is to decouple the extra-dummy-arguments behavior from
ordered-set aggregates and make it generally available for all aggregate
declarations.  We have to put this into 9.4 rather than waiting till
later because it slightly alters the rules for declaring ordered-set
aggregates.

The patch turned out a bit bigger than I'd hoped because it proved
necessary to record the extra-arguments option in a new pg_aggregate
column.  I'd thought we could just look at the final function's pronargs
at runtime, but that didn't work well for variadic final functions.
It's probably just as well though, because it simplifies life for pg_dump
to record the option explicitly.

While at it, fix array_agg() to have a valid final-function signature,
and add an opr_sanity test to notice future deviations from polymorphic
consistency.  I also marked the percentile_cont() aggregates as not
needing extra arguments, since they don't.
2014-04-23 19:17:41 -04:00
Alvaro Herrera 83ab8e32f2 Fix object identities for text search objects
We were neglecting to schema-qualify them.

Backpatch to 9.3, where object identities were introduced as a concept
by commit f8348ea32e.
2014-04-16 18:25:44 -03:00
Tom Lane a9d9acbf21 Create infrastructure for moving-aggregate optimization.
Until now, when executing an aggregate function as a window function
within a window with moving frame start (that is, any frame start mode
except UNBOUNDED PRECEDING), we had to recalculate the aggregate from
scratch each time the frame head moved.  This patch allows an aggregate
definition to include an alternate "moving aggregate" implementation
that includes an inverse transition function for removing rows from
the aggregate's running state.  As long as this can be done successfully,
runtime is proportional to the total number of input rows, rather than
to the number of input rows times the average frame length.

This commit includes the core infrastructure, documentation, and regression
tests using user-defined aggregates.  Follow-on commits will update some
of the built-in aggregates to use this feature.

David Rowley and Florian Pflug, reviewed by Dean Rasheed; additional
hacking by me
2014-04-12 12:03:30 -04:00
Robert Haas 0886fc6a5c Add new to_reg* functions for error-free OID lookups.
These functions won't throw an error if the object doesn't exist,
or if (for functions and operators) there's more than one matching
object.

Yugo Nagata and Nozomi Anzai, reviewed by Amit Khandekar, Marti
Raudsepp, Amit Kapila, and me.
2014-04-08 10:27:56 -04:00
Simon Riggs e5550d5fec Reduce lock levels of some ALTER TABLE cmds
VALIDATE CONSTRAINT

CLUSTER ON
SET WITHOUT CLUSTER

ALTER COLUMN SET STATISTICS
ALTER COLUMN SET ()
ALTER COLUMN RESET ()

All other sub-commands use AccessExclusiveLock

Simon Riggs and Noah Misch

Reviews by Robert Haas and Andres Freund
2014-04-06 11:13:43 -04:00
Andrew Dunstan f9c6d72cbf Cleanup around json_to_record/json_to_recordset
Set function parameter names and defaults. Add jsonb versions (which the
code already provided for so the actual new code is trivial). Add jsonb
regression tests and docs.

Bump catalog version (which I apparently forgot to do when jsonb was
committed).
2014-03-26 10:18:24 -04:00
Andrew Dunstan d9134d0a35 Introduce jsonb, a structured format for storing json.
The new format accepts exactly the same data as the json type. However, it is
stored in a format that does not require reparsing the orgiginal text in order
to process it, making it much more suitable for indexing and other operations.
Insignificant whitespace is discarded, and the order of object keys is not
preserved. Neither are duplicate object keys kept - the later value for a given
key is the only one stored.

The new type has all the functions and operators that the json type has,
with the exception of the json generation functions (to_json, json_agg etc.)
and with identical semantics. In addition, there are operator classes for
hash and btree indexing, and two classes for GIN indexing, that have no
equivalent in the json type.

This feature grew out of previous work by Oleg Bartunov and Teodor Sigaev, which
was intended to provide similar facilities to a nested hstore type, but which
in the end proved to have some significant compatibility issues.

Authors: Oleg Bartunov,  Teodor Sigaev, Peter Geoghegan and Andrew Dunstan.
Review: Andres Freund
2014-03-23 16:40:19 -04:00
Alvaro Herrera f88d4cfc9d Setup error context callback for transaction lock waits
With this in place, a session blocking behind another one because of
tuple locks will get a context line mentioning the relation name, tuple
TID, and operation being done on tuple.  For example:

LOG:  process 11367 still waiting for ShareLock on transaction 717 after 1000.108 ms
DETAIL:  Process holding the lock: 11366. Wait queue: 11367.
CONTEXT:  while updating tuple (0,2) in relation "foo"
STATEMENT:  UPDATE foo SET value = 3;

Most usefully, the new line is displayed by log entries due to
log_lock_waits, although of course it will be printed by any other log
message as well.

Author: Christian Kruse, some tweaks by Álvaro Herrera
Reviewed-by: Amit Kapila, Andres Freund, Tom Lane, Robert Haas
2014-03-19 15:10:36 -03:00
Tom Lane d70cf811f7 During index build, check and elog (not just Assert) for broken HOT chain.
The recently-fixed bug in WAL replay could result in not finding a parent
tuple for a heap-only tuple.  The existing code would either Assert or
generate an invalid index entry, neither of which is desirable.  Throw a
regular error instead.
2014-03-17 12:36:11 -04:00
Bruce Momjian 5024044a20 C comments: improve description of relfilenode uniqueness
Report by Antonin Houska
2014-03-08 12:20:30 -05:00
Alvaro Herrera 84df54b22e Constructors for interval, timestamp, timestamptz
Author: Pavel Stěhule, editorialized somewhat by Álvaro Herrera
Reviewed-by: Tomáš Vondra, Marko Tiikkaja
With input from Fabrízio de Royes Mello, Jim Nasby
2014-03-04 15:09:43 -03:00
Robert Haas b89e151054 Introduce logical decoding.
This feature, building on previous commits, allows the write-ahead log
stream to be decoded into a series of logical changes; that is,
inserts, updates, and deletes and the transactions which contain them.
It is capable of handling decoding even across changes to the schema
of the effected tables.  The output format is controlled by a
so-called "output plugin"; an example is included.  To make use of
this in a real replication system, the output plugin will need to be
modified to produce output in the format appropriate to that system,
and to perform filtering.

Currently, information can be extracted from the logical decoding
system only via SQL; future commits will add the ability to stream
changes via walsender.

Andres Freund, with review and other contributions from many other
people, including Álvaro Herrera, Abhijit Menon-Sen, Peter Gheogegan,
Kevin Grittner, Robert Haas, Heikki Linnakangas, Fujii Masao, Abhijit
Menon-Sen, Michael Paquier, Simon Riggs, Craig Ringer, and Steve
Singer.
2014-03-03 16:32:18 -05:00
Robert Haas dd1a3bccca Show xid and xmin in pg_stat_activity and pg_stat_replication.
Christian Kruse, reviewed by Andres Freund and myself, with further
minor adjustments by me.
2014-02-25 12:34:04 -05:00
Robert Haas 6f289c2b7d Switch various builtin functions to use pg_lsn instead of text.
The functions in slotfuncs.c don't exist in any released version,
but the changes to xlogfuncs.c represent backward-incompatibilities.
Per discussion, we're hoping that the queries using these functions
are few enough and simple enough that this won't cause too much
breakage for users.

Michael Paquier, reviewed by Andres Freund and further modified
by me.
2014-02-19 11:37:43 -05:00
Robert Haas 5f173040e3 Avoid repeated name lookups during table and index DDL.
If the name lookups come to different conclusions due to concurrent
activity, we might perform some parts of the DDL on a different table
than other parts.  At least in the case of CREATE INDEX, this can be
used to cause the permissions checks to be performed against a
different table than the index creation, allowing for a privilege
escalation attack.

This changes the calling convention for DefineIndex, CreateTrigger,
transformIndexStmt, transformAlterTableStmt, CheckIndexCompatible
(in 9.2 and newer), and AlterTable (in 9.1 and older).  In addition,
CheckRelationOwnership is removed in 9.2 and newer and the calling
convention is changed in older branches.  A field has also been added
to the Constraint node (FkConstraint in 8.4).  Third-party code calling
these functions or using the Constraint node will require updating.

Report by Andres Freund.  Patch by Robert Haas and Andres Freund,
reviewed by Tom Lane.

Security: CVE-2014-0062
2014-02-17 09:33:31 -05:00
Noah Misch 537cbd35c8 Prevent privilege escalation in explicit calls to PL validators.
The primary role of PL validators is to be called implicitly during
CREATE FUNCTION, but they are also normal functions that a user can call
explicitly.  Add a permissions check to each validator to ensure that a
user cannot use explicit validator calls to achieve things he could not
otherwise achieve.  Back-patch to 8.4 (all supported versions).
Non-core procedural language extensions ought to make the same two-line
change to their own validators.

Andres Freund, reviewed by Tom Lane and Noah Misch.

Security: CVE-2014-0061
2014-02-17 09:33:31 -05:00
Robert Haas 858ec11858 Introduce replication slots.
Replication slots are a crash-safe data structure which can be created
on either a master or a standby to prevent premature removal of
write-ahead log segments needed by a standby, as well as (with
hot_standby_feedback=on) pruning of tuples whose removal would cause
replication conflicts.  Slots have some advantages over existing
techniques, as explained in the documentation.

In a few places, we refer to the type of replication slots introduced
by this patch as "physical" slots, because forthcoming patches for
logical decoding will also have slots, but with somewhat different
properties.

Andres Freund and Robert Haas
2014-01-31 22:45:36 -05:00
Fujii Masao 9132b189bf Add pg_stat_archiver statistics view.
This view shows the statistics about the WAL archiver process's activity.

Gabriele Bartolini, reviewed by Michael Paquier, refactored a bit by me.
2014-01-29 02:58:22 +09:00
Heikki Linnakangas f62eba204f Fix typo in README
Amit Langote
2014-01-27 09:33:18 +02:00
Alvaro Herrera b152c6cd0d Make DROP IF EXISTS more consistently not fail
Some cases were still reporting errors and aborting, instead of a NOTICE
that the object was being skipped.  This makes it more difficult to
cleanly handle pg_dump --clean, so change that to instead skip missing
objects properly.

Per bug #7873 reported by Dave Rolsky; apparently this affects a large
number of users.

Authors: Pavel Stehule and Dean Rasheed.  Some tweaks by Álvaro Herrera
2014-01-23 14:40:29 -03:00
Robert Haas 5709b8acc6 Avoid a possible relcache leak in get_object_address_attribute.
There's no apparent way to trigger this, so I'm not going to worry
about back-patching it for now.  But it's still wrong.

Marti Raudsepp
2014-01-21 10:02:37 -05:00
Tom Lane 0d79c0a8cc Make various variables const (read-only).
These changes should generally improve correctness/maintainability.
A nice side benefit is that several kilobytes move from initialized
data to text segment, allowing them to be shared across processes and
probably reducing copy-on-write overhead while forking a new backend.
Unfortunately this doesn't seem to help libpq in the same way (at least
not when it's compiled with -fpic on x86_64), but we can hope the linker
at least collects all nominally-const data together even if it's not
actually part of the text segment.

Also, make pg_encname_tbl[] static in encnames.c, since there seems
no very good reason for any other code to use it; per a suggestion
from Wim Lewis, who independently submitted a patch that was mostly
a subset of this one.

Oskari Saarenmaa, with some editorialization by me
2014-01-18 16:04:32 -05:00
Alvaro Herrera 423e1211a8 Accept pg_upgraded tuples during multixact freezing
The new MultiXact freezing routines introduced by commit 8e9a16ab8f
neglected to consider tuples that came from a pg_upgrade'd database; a
vacuum run that tried to freeze such tuples would die with an error such
as
ERROR: MultiXactId 11415437 does no longer exist -- apparent wraparound

To fix, ensure that GetMultiXactIdMembers is allowed to return empty
multis when the infomask bits are right, as is done in other callsites.

Per trouble report from F-Secure.

In passing, fix a copy&paste bug reported by Andrey Karpov from VIVA64
from their PVS-Studio static checked, that instead of setting relminmxid
to Invalid, we were setting relfrozenxid twice.  Not an important
mistake because that code branch is about relations for which we don't
use the frozenxid/minmxid values at all in the first place, but seems to
warrants a fix nonetheless.
2014-01-10 18:03:18 -03:00
Bruce Momjian 7e04792a1c Update copyright for 2014
Update all files in head, and files COPYRIGHT and legal.sgml in all back
branches.
2014-01-07 16:05:30 -05:00
Alvaro Herrera 1a3e82a7f9 Restore some comments lost during 15732b34e8
Michael Paquier
2014-01-03 13:22:03 -03:00
Tom Lane 8d65da1f01 Support ordered-set (WITHIN GROUP) aggregates.
This patch introduces generic support for ordered-set and hypothetical-set
aggregate functions, as well as implementations of the instances defined in
SQL:2008 (percentile_cont(), percentile_disc(), rank(), dense_rank(),
percent_rank(), cume_dist()).  We also added mode() though it is not in the
spec, as well as versions of percentile_cont() and percentile_disc() that
can compute multiple percentile values in one pass over the data.

Unlike the original submission, this patch puts full control of the sorting
process in the hands of the aggregate's support functions.  To allow the
support functions to find out how they're supposed to sort, a new API
function AggGetAggref() is added to nodeAgg.c.  This allows retrieval of
the aggregate call's Aggref node, which may have other uses beyond the
immediate need.  There is also support for ordered-set aggregates to
install cleanup callback functions, so that they can be sure that
infrastructure such as tuplesort objects gets cleaned up.

In passing, make some fixes in the recently-added support for variadic
aggregates, and make some editorial adjustments in the recent FILTER
additions for aggregates.  Also, simplify use of IsBinaryCoercible() by
allowing it to succeed whenever the target type is ANY or ANYELEMENT.
It was inconsistent that it dealt with other polymorphic target types
but not these.

Atri Sharma and Andrew Gierth; reviewed by Pavel Stehule and Vik Fearing,
and rather heavily editorialized upon by Tom Lane
2013-12-23 16:11:35 -05:00
Alvaro Herrera 6130208e75 Avoid useless palloc during transaction commit
We can allocate the initial relations-to-drop array when first needed,
instead of at function entry; this avoids allocating it when the
function is not going to do anything, which is most of the time.

Backpatch to 9.3, where this behavior was introduced by commit
279628a0a7.

There's more that could be done here, such as possible reworking of the
code to avoid having to palloc anything, but that doesn't sound as
backpatchable as this relatively minor change.

Per complaint from Noah Misch in
20131031145234.GA621493@tornado.leadboat.com
2013-12-20 12:37:30 -03:00
Bruce Momjian 527fdd9df1 Move pg_upgrade_support global variables to their own include file
Previously their declarations were spread around to avoid accidental
access.
2013-12-19 16:10:07 -05:00
Robert Haas 001a573a20 Allow on-detach callbacks for dynamic shared memory segments.
Just as backends must clean up their shared memory state (releasing
lwlocks, buffer pins, etc.) before exiting, they must also perform
any similar cleanups related to dynamic shared memory segments they
have mapped before unmapping those segments.  So add a mechanism to
ensure that.

Existing on_shmem_exit hooks include both "user level" cleanup such
as transaction abort and removal of leftover temporary relations and
also "low level" cleanup that forcibly released leftover shared
memory resources.  On-detach callbacks should run after the first
group but before the second group, so create a new before_shmem_exit
function for registering the early callbacks and keep on_shmem_exit
for the regular callbacks.  (An earlier draft of this patch added an
additional argument to on_shmem_exit, but that had a much larger
footprint and probably a substantially higher risk of breaking third
party code for no real gain.)

Patch by me, reviewed by KaiGai Kohei and Andres Freund.
2013-12-18 13:09:09 -05:00
Robert Haas e55704d8b2 Add new wal_level, logical, sufficient for logical decoding.
When wal_level=logical, we'll log columns from the old tuple as
configured by the REPLICA IDENTITY facility added in commit
07cacba983.  This makes it possible
a properly-configured logical replication solution to correctly
follow table updates even if they change the chosen key columns,
or, with REPLICA IDENTITY FULL, even if the table has no key at
all.  Note that updates which do not modify the replica identity
column won't log anything extra, making the choice of a good key
(i.e. one that will rarely be changed) important to performance
when wal_level=logical is configured.

Each insert, update, or delete to a catalog table will also log
the CMIN and/or CMAX values of stamped by the current transaction.
This is necessary because logical decoding will require access to
historical snapshots of the catalog in order to decode some data
types, and the CMIN/CMAX values that we may need in order to judge
row visibility may have been overwritten by the time we need them.

Andres Freund, reviewed in various versions by myself, Heikki
Linnakangas, KONDO Mitsumasa, and many others.
2013-12-10 19:01:40 -05:00
Robert Haas 8e18d04d4d Refine our definition of what constitutes a system relation.
Although user-defined relations can't be directly created in
pg_catalog, it's possible for them to end up there, because you can
create them in some other schema and then use ALTER TABLE .. SET SCHEMA
to move them there.  Previously, such relations couldn't afterwards
be manipulated, because IsSystemRelation()/IsSystemClass() rejected
all attempts to modify objects in the pg_catalog schema, regardless
of their origin.  With this patch, they now reject only those
objects in pg_catalog which were created at initdb-time, allowing
most operations on user-created tables in pg_catalog to proceed
normally.

This patch also adds new functions IsCatalogRelation() and
IsCatalogClass(), which is similar to IsSystemRelation() and
IsSystemClass() but with a slightly narrower definition: only TOAST
tables of system catalogs are included, rather than *all* TOAST tables.
This is currently used only for making decisions about when
invalidation messages need to be sent, but upcoming logical decoding
patches will find other uses for this information.

Andres Freund, with some modifications by me.
2013-11-28 20:57:20 -05:00
Peter Eisentraut 85ed91ee7d Implement information_schema.parameters.parameter_default column
Reviewed-by: Ali Dar <ali.munir.dar@gmail.com>
Reviewed-by: Amit Khandekar <amit.khandekar@enterprisedb.com>
Reviewed-by: Rodolfo Campero <rodolfo.campero@anachronics.com>
2013-11-26 23:21:35 -05:00
Tom Lane 784e762e88 Support multi-argument UNNEST(), and TABLE() syntax for multiple functions.
This patch adds the ability to write TABLE( function1(), function2(), ...)
as a single FROM-clause entry.  The result is the concatenation of the
first row from each function, followed by the second row from each
function, etc; with NULLs inserted if any function produces fewer rows than
others.  This is believed to be a much more useful behavior than what
Postgres currently does with multiple SRFs in a SELECT list.

This syntax also provides a reasonable way to combine use of column
definition lists with WITH ORDINALITY: put the column definition list
inside TABLE(), where it's clear that it doesn't control the ordinality
column as well.

Also implement SQL-compliant multiple-argument UNNEST(), by turning
UNNEST(a,b,c) into TABLE(unnest(a), unnest(b), unnest(c)).

The SQL standard specifies TABLE() with only a single function, not
multiple functions, and it seems to require an implicit UNNEST() which is
not what this patch does.  There may be something wrong with that reading
of the spec, though, because if it's right then the spec's TABLE() is just
a pointless alternative spelling of UNNEST().  After further review of
that, we might choose to adopt a different syntax for what this patch does,
but in any case this functionality seems clearly worthwhile.

Andrew Gierth, reviewed by Zoltán Böszörményi and Heikki Linnakangas, and
significantly revised by me
2013-11-21 19:37:20 -05:00
Tom Lane 6cb86143e8 Allow aggregates to provide estimates of their transition state data size.
Formerly the planner had a hard-wired rule of thumb for guessing the amount
of space consumed by an aggregate function's transition state data.  This
estimate is critical to deciding whether it's OK to use hash aggregation,
and in many situations the built-in estimate isn't very good.  This patch
adds a column to pg_aggregate wherein a per-aggregate estimate can be
provided, overriding the planner's default, and infrastructure for setting
the column via CREATE AGGREGATE.

It may be that additional smarts will be required in future, perhaps even
a per-aggregate estimation function.  But this is already a step forward.

This is extracted from a larger patch to improve the performance of numeric
and int8 aggregates.  I (tgl) thought it was worth reviewing and committing
this infrastructure separately.  In this commit, all built-in aggregates
are given aggtransspace = 0, so no behavior should change.

Hadi Moshayedi, reviewed by Pavel Stehule and Tomas Vondra
2013-11-16 16:03:40 -05:00
Peter Eisentraut 001e114b8d Fix whitespace issues found by git diff --check, add gitattributes
Set per file type attributes in .gitattributes to fine-tune whitespace
checks.  With the associated cleanups, the tree is now clean for git
2013-11-10 14:48:29 -05:00
Robert Haas 07cacba983 Add the notion of REPLICA IDENTITY for a table.
Pending patches for logical replication will use this to determine
which columns of a tuple ought to be considered as its candidate key.

Andres Freund, with minor, mostly cosmetic adjustments by me
2013-11-08 12:30:43 -05:00
Robert Haas cacbdd7810 Use appendStringInfoString instead of appendStringInfo where possible.
This shaves a few cycles, and generally seems like good programming
practice.

David Rowley
2013-10-31 10:55:59 -04:00
Peter Eisentraut 5b6d08cd29 Add use of asprintf()
Add asprintf(), pg_asprintf(), and psprintf() to simplify string
allocation and composition.  Replacement implementations taken from
NetBSD.

Reviewed-by: Álvaro Herrera <alvherre@2ndquadrant.com>
Reviewed-by: Asif Naeem <anaeem.it@gmail.com>
2013-10-13 00:09:18 -04:00
Alvaro Herrera ada01014d4 Use $(PERL) to invoke duplicate_oids
Per buildfarm failure reported by smilodon
2013-10-10 23:45:38 -03:00
Peter Eisentraut 5dd41f3574 Remove maintainer-check target, fold into normal build
make maintainer-check was obscure and rarely called in practice, and
many breakages were missed.  Fold everything that make maintainer-check
used to do into the normal build.  Specifically:

- Call duplicate_oids when genbki.pl is called.

- Check for tabs in SGML files when the documentation is built.

- Run msgfmt with the -c option during the regular build.  Add an
  additional configure check to see whether we are using the GNU
  version.  (make maintainer-check probably used to fail with non-GNU
  msgfmt.)

Keep maintainer-check as around as phony target for the time being in
case anyone is calling it.  But it won't do anything anymore.
2013-10-10 20:11:56 -04:00
Alvaro Herrera 15732b34e8 Add WaitForLockers in lmgr, refactoring index.c code
This is in support of a future REINDEX CONCURRENTLY feature.

Michael Paquier
2013-10-01 17:57:01 -03:00
Peter Eisentraut b34f8f409b Show schemas in information_schema.schemata that the current has access to
Before, it would only show schemas that the current user owns.  Per
discussion, the new behavior is more useful and consistent for PostgreSQL.
2013-09-09 22:25:37 -04:00
Tom Lane 546f7c2e38 Don't fail for bad GUCs in CREATE FUNCTION with check_function_bodies off.
The previous coding attempted to activate all the GUC settings specified
in SET clauses, so that the function validator could operate in the GUC
environment expected by the function body.  However, this is problematic
when restoring a dump, since the SET clauses might refer to database
objects that don't exist yet.  We already have the parameter
check_function_bodies that's meant to prevent forward references in
function definitions from breaking dumps, so let's change CREATE FUNCTION
to not install the SET values if check_function_bodies is off.

Authors of function validators were already advised not to make any
"context sensitive" checks when check_function_bodies is off, if indeed
they're checking anything at all in that mode.  But extend the
documentation to point out the GUC issue in particular.

(Note that we still check the SET clauses to some extent; the behavior
with !check_function_bodies is now approximately equivalent to what ALTER
DATABASE/ROLE have been doing for awhile with context-dependent GUCs.)

This problem can be demonstrated in all active branches, so back-patch
all the way.
2013-09-03 18:32:20 -04:00
Tom Lane 0d3f4406df Allow aggregate functions to be VARIADIC.
There's no inherent reason why an aggregate function can't be variadic
(even VARIADIC ANY) if its transition function can handle the case.
Indeed, this patch to add the feature touches none of the planner or
executor, and little of the parser; the main missing stuff was DDL and
pg_dump support.

It is true that variadic aggregates can create the same sort of ambiguity
about parameters versus ORDER BY keys that was complained of when we
(briefly) had both one- and two-argument forms of string_agg().  However,
the policy formed in response to that discussion only said that we'd not
create any built-in aggregates with varying numbers of arguments, not that
we shouldn't allow users to do it.  So the logical extension of that is
we can allow users to make variadic aggregates as long as we're wary about
shipping any such in core.

In passing, this patch allows aggregate function arguments to be named, to
the extent of remembering the names in pg_proc and dumping them in pg_dump.
You can't yet call an aggregate using named-parameter notation.  That seems
like a likely future extension, but it'll take some work, and it's not what
this patch is really about.  Likewise, there's still some work needed to
make window functions handle VARIADIC fully, but I left that for another
day.

initdb forced because of new aggvariadic field in Aggref parse nodes.
2013-09-03 17:08:46 -04:00
Robert Haas c9e2e2db5c Partially restore comments discussing enum renumbering hazards.
As noted by Tom Lane, commit 813fb03155
was overly optimistic about how safe it is to concurrently change
enumsortorder values under MVCC catalog scan semantics.  Restore
some of the previous text, with hopefully-correct adjustments for
the new state of play.
2013-08-28 13:21:08 -04:00
Robert Haas 813fb03155 Remove SnapshotNow and HeapTupleSatisfiesNow.
We now use MVCC catalog scans, and, per discussion, have eliminated
all other remaining uses of SnapshotNow, so that we can now get rid of
it.  This will break third-party code which is still using it, which
is intentional, as we want such code to be updated to do things the
new way.
2013-08-01 10:46:19 -04:00
Robert Haas 0518eceec3 Adjust HeapTupleSatisfies* routines to take a HeapTuple.
Previously, these functions took a HeapTupleHeader, but upcoming
patches for logical replication will introduce new a new snapshot
type under which the tuple's TID will be used to lookup (CMIN, CMAX)
for visibility determination purposes.  This makes that information
available.  Code churn is minimal since HeapTupleSatisfiesVisibility
took the HeapTuple anyway, and deferenced it before calling the
satisfies function.

Independently of logical replication, this allows t_tableOid and
t_self to be cross-checked via assertions in tqual.c.  This seems
like a useful way to make sure that all callers are setting these
values properly, which has been previously put forward as
desirable.

Andres Freund, reviewed by Álvaro Herrera
2013-07-22 13:38:44 -04:00
Stephen Frost 4cbe3ac3e8 WITH CHECK OPTION support for auto-updatable VIEWs
For simple views which are automatically updatable, this patch allows
the user to specify what level of checking should be done on records
being inserted or updated.  For 'LOCAL CHECK', new tuples are validated
against the conditionals of the view they are being inserted into, while
for 'CASCADED CHECK' the new tuples are validated against the
conditionals for all views involved (from the top down).

This option is part of the SQL specification.

Dean Rasheed, reviewed by Pavel Stehule
2013-07-18 17:10:16 -04:00
Andrew Dunstan d26888bc4d Move checking an explicit VARIADIC "any" argument into the parser.
This is more efficient and simpler . It does mean that an untyped NULL
can no longer be used in such cases, which should be mentioned in
Release Notes, but doesn't seem a terrible loss. The workaround is to
cast the NULL to some array type.

Pavel Stehule, reviewed by Jeevan Chalke.
2013-07-18 11:52:12 -04:00
Noah Misch 02d2b694ee Update messages, comments and documentation for materialized views.
All instances of the verbiage lagging the code.  Back-patch to 9.3,
where materialized views were introduced.
2013-07-05 15:37:51 -04:00