Commit Graph

Robert Haas f0661c4e8c Make sequential scans parallel-aware.
In addition, this patch fills in a number of missing bits and pieces in
the parallel infrastructure.  Paths and plans now have a parallel_aware
flag indicating whether whatever parallel-aware logic they have should
be engaged.  It is believed that we will need this flag for a number of
path/plan types, not just sequential scans, which is why the flag is
generic rather than part of the SeqScan structures specifically.
Also, execParallel.c now gives parallel nodes a chance to initialize
their PlanState nodes from the DSM during parallel worker startup.
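
As a rough sketch of the user-visible effect (the GUC name shown is the one
from this development cycle, later renamed max_parallel_workers_per_gather;
the table and function are hypothetical):

    SET max_parallel_degree = 2;   -- allow up to two parallel workers
    EXPLAIN SELECT * FROM big_table WHERE expensive_check(payload);
    -- with planner support, this can now produce a Gather over a
    -- parallel-aware Seq Scan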

Amit Kapila, with a fair amount of adjustment by me.  Review of previous
patch versions by Haribabu Kommi and others.
2015-11-11 08:57:52 -05:00
Robert Haas 6e71dd7ce9 Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend.  Transferring such tuples without modification between two
cooperating backends does not work.  This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves.  The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender.  That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.

Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues.  This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was.  typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
Robert Haas 8538a63070 Make Gather node projection-capable.
The original Gather code failed to mark a Gather node as not able to
do projection, but in fact it couldn't, even though it did initialize
its projection info via ExecAssignProjectionInfo.  There doesn't seem to
be any good reason for this node not to have projection capability,
so clean things up so that it does.  Without this, plans using Gather
nodes might need to carry extra Result nodes to do projection.
2015-10-28 00:27:58 +01:00
Robert Haas bfc78d7196 Rewrite interaction of parallel mode with parallel executor support.
In the previous coding, before returning from ExecutorRun, we'd shut
down all parallel workers.  This was dead wrong if ExecutorRun was
called with a non-zero tuple count; it had the effect of truncating
the query output.  To fix, give ExecutePlan control over whether to
enter parallel mode, and have it refuse to do so if the tuple count
is non-zero.  Rewrite the Gather logic so that it can cope with being
called outside parallel mode.

Commit 7aea8e4f2d is largely to blame
for this problem, though this patch modifies some subsequently-committed
code which relied on the guarantees it purported to make.
2015-10-16 11:56:02 -04:00
Robert Haas 5fc4c26db5 Allow FDWs to push down quals without breaking EvalPlanQual rechecks.
This fixes a long-standing bug which was discovered while investigating
the interaction between the new join pushdown code and the EvalPlanQual
machinery: if a ForeignScan appears on the inner side of a parameterized
nestloop, an EPQ recheck would re-return the original tuple even if
it no longer satisfied the pushed-down quals due to changed parameter
values.

This fix adds a new member to ForeignScan and ForeignScanState and a
new argument to make_foreignscan, and requires changes to FDWs which
push down quals to populate that new argument with a list of quals they
have chosen to push down.  Therefore, I'm only back-patching to 9.5,
even though the bug is not new in 9.5.

Etsuro Fujita, reviewed by me and by Kyotaro Horiguchi.
2015-10-15 13:00:40 -04:00
Robert Haas 3bd909b220 Add a Gather executor node.
A Gather executor node runs any number of copies of a plan in an equal
number of workers and merges all of the results into a single tuple
stream.  It can also run the plan itself, if the workers are
unavailable or haven't started up yet.  It is intended to work with
the Partial Seq Scan node which will be added in future commits.

It could also be used to implement parallel query of a different sort
by itself, without help from Partial Seq Scan, if the single_copy mode
is used.  In that mode, a worker executes the plan, and the parallel
leader does not, merely collecting the worker's results.  So, a Gather
node could be inserted into a plan to split the execution of that plan
across two processes.  Nested Gather nodes aren't currently supported,
but we might want to add support for that in the future.

There's nothing in the planner to actually generate Gather nodes yet,
so it's not quite time to break out the champagne.  But we're getting
close.

Amit Kapila.  Some design suggestions were provided by me, and I also
reviewed the patch.  Single-copy mode, documentation, and other minor
changes also by me.
2015-09-30 19:23:36 -04:00
Alvaro Herrera 1aba62ec63 Allow per-tablespace effective_io_concurrency
Per discussion, nowadays it is possible to have tablespaces that have
wildly different I/O characteristics from others.  Setting different
effective_io_concurrency parameters for those has been measured to
improve performance.
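
For example, a tablespace on fast SSD storage might be configured like this
(tablespace name hypothetical):

    ALTER TABLESPACE fast_ssd SET (effective_io_concurrency = 256);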

Author: Julien Rouhaud
Reviewed by: Andres Freund
2015-09-08 12:51:42 -03:00
Heikki Linnakangas 804163bc25 Share transition state between different aggregates when possible.
If there are two different aggregates in the query with the same inputs, and
the aggregates have the same initial condition and transition function,
calculate the state value only once, and call only the final functions
separately. For example, the AVG(x) and SUM(x) aggregates have the same
transition function, which accumulates the sum and number of input tuples.
For a query like "SELECT AVG(x), SUM(x) FROM x", we can therefore
compute the transition state only once, which gives a nice speedup.

David Rowley, reviewed and edited by me.
2015-08-04 17:53:10 +03:00
Tom Lane dd7a8f66ed Redesign tablesample method API, and do extensive code review.
The original implementation of TABLESAMPLE modeled the tablesample method
API on index access methods, which wasn't a good choice because, without
specialized DDL commands, there's no way to build an extension that can
implement a TSM.  (Raw inserts into system catalogs are not an acceptable
thing to do, because we can't undo them during DROP EXTENSION, nor will
pg_upgrade behave sanely.)  Instead adopt an API more like procedural
language handlers or foreign data wrappers, wherein the only SQL-level
support object needed is a single handler function identified by having
a special return type.  This lets us get rid of the supporting catalog
altogether, so that no custom DDL support is needed for the feature.

Adjust the API so that it can support non-constant tablesample arguments
(the original coding assumed we could evaluate the argument expressions at
ExecInitSampleScan time, which is undesirable even if it weren't outright
unsafe), and discourage sampling methods from looking at invisible tuples.
Make sure that the BERNOULLI and SYSTEM methods are genuinely repeatable
within and across queries, as required by the SQL standard, and deal more
honestly with methods that can't support that requirement.

Make a full code-review pass over the tablesample additions, and fix
assorted bugs, omissions, infelicities, and cosmetic issues (such as
failure to put the added code stanzas in a consistent ordering).
Improve EXPLAIN's output of tablesample plans, too.

Back-patch to 9.5 so that we don't have to support the original API
in production.
2015-07-25 14:39:00 -04:00
Robert Haas 5ca611841b Improve handling of CustomPath/CustomPlan(State) children.
Allow CustomPath to have a list of paths, CustomPlan a list of plans,
and CustomPlanState a list of planstates known to the core system, so
that custom path/plan providers can more reasonably use this
infrastructure for nodes with multiple children.

KaiGai Kohei, per a design suggestion from Tom Lane, with some
further kibitzing by me.
2015-06-26 09:40:47 -04:00
Bruce Momjian 807b9e0dff pgindent run for 9.5 2015-05-23 21:35:49 -04:00
Andres Freund f3d3118532 Support GROUPING SETS, CUBE and ROLLUP.
This SQL-standard functionality makes it possible to aggregate data by
several different GROUP BY clauses at once. In each grouping set's output,
the columns that are grouped by only in other sets are set to NULL.

This could previously be achieved by doing each grouping as a separate
query, conjoined by UNION ALLs. Besides being considerably more concise,
grouping sets will in many cases be faster, requiring only one scan over
the underlying data.
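
A minimal sketch of the new syntax (table and column names hypothetical);
the single statement replaces three UNION ALL'd GROUP BY queries:

    SELECT brand, size, sum(sales)
    FROM items_sold
    GROUP BY GROUPING SETS ((brand), (size), ());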

The current implementation of grouping sets only supports using sorting
for input. Individual sets that share a sort order are computed in one
pass. If there are sets that don't share a sort order, additional sort and
aggregation steps are performed, taking their input from the previous sort
step; this avoids repeated scans of the source data.

The code is structured in a way that adding support for purely using
hash aggregation or a mix of hashing and sorting is possible. Sorting
was chosen to be supported first, as it is the most generic method of
implementation.

Instead of representing the chain of sort and aggregation steps as
full-blown planner and executor nodes, as earlier versions of the patch
did, all but the first sort are performed inside the aggregation node
itself. This avoids the need for unusual gymnastics to handle returning
aggregated and non-aggregated tuples from underlying nodes, as well as
having to shut down underlying nodes early to limit memory usage.  The
optimizer still builds Sort/Agg nodes to describe each phase, but they're
not part of the plan tree; instead they're additional data for the
aggregation node. They're a convenient and preexisting way to describe
aggregation and sorting.  The first (and possibly only) sort step is still
performed as a separate execution step. That retains similarity with
existing GROUP BY plans, makes rescans fairly simple, avoids very deep
plans (which lead to slow EXPLAINs), and makes it easy to skip the sorting
step if the underlying data is already sorted by other means.

A somewhat ugly side of this patch is having to deal with a grammar
ambiguity between the new CUBE keyword and the cube extension/functions
named cube (and rollup). To avoid breaking existing deployments of the
cube extension, it has not been renamed, nor has cube been made a
reserved keyword. Instead, precedence hacking is used to make GROUP BY
cube(..) refer to the CUBE grouping sets feature rather than the function
cube(). To actually group by the function cube(), unlikely as that might
be, the function name has to be quoted.

Needs a catversion bump because stored rules may change.

Author: Andrew Gierth and Atri Sharma, with contributions from Andres Freund
Reviewed-By: Andres Freund, Noah Misch, Tom Lane, Svenne Krap, Tomas
    Vondra, Erik Rijkers, Marti Raudsepp, Pavel Stehule
Discussion: CAOeZVidmVRe2jU6aMk_5qkxnB7dfmPROzM7Ur8JPW5j8Y5X-Lw@mail.gmail.com
2015-05-16 03:46:31 +02:00
Simon Riggs f6d208d6e5 TABLESAMPLE, SQL Standard and extensible
Add a TABLESAMPLE clause to SELECT statements that allows the
user to specify random BERNOULLI sampling or block-level
SYSTEM sampling. The implementation allows extensible
sampling functions to be written, using a standard API.
The basic version follows the SQL standard exactly. Usable
concrete use cases for the sampling API follow in later
commits.
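
Minimal examples of the new clause (table name hypothetical):

    SELECT * FROM orders TABLESAMPLE SYSTEM (1);       -- ~1% of blocks
    SELECT * FROM orders TABLESAMPLE BERNOULLI (10)    -- ~10% of rows,
        REPEATABLE (42);                               -- fixed seed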

Petr Jelinek

Reviewed by Michael Paquier and Simon Riggs
2015-05-15 14:37:10 -04:00
Heikki Linnakangas 35fcb1b3d0 Allow GiST distance function to return merely a lower bound.
The distance function can now set *recheck = false, like index quals. The
executor will then re-check the ORDER BY expressions, and use a queue to
reorder the results on the fly.

This makes it possible to do kNN-searches on polygons and circles, which
don't store the exact value in the index, but just a bounding box.
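
Given a GiST index on such a column, a nearest-neighbor query of this shape
can now return results in true distance order (names hypothetical):

    SELECT * FROM shapes
    ORDER BY boundary <-> point '(0,0)'
    LIMIT 10;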

Alexander Korotkov and me
2015-05-15 14:26:51 +03:00
Tom Lane afb9249d06 Add support for doing late row locking in FDWs.
Previously, FDWs could only do "early row locking", that is, lock a row as
soon as it's fetched, even though local restriction/join conditions might
discard the row later.  This patch adds callbacks that allow FDWs to do
late locking in the same way that it's done for regular tables.

To make use of this feature, an FDW must support the "ctid" column as a
unique row identifier.  Currently, since ctid has to be of type TID,
the feature is of limited use, though in principle it could be used by
postgres_fdw.  We may eventually allow FDWs to specify another data type
for ctid, which would make it possible for more FDWs to use this feature.

This commit does not modify postgres_fdw to use late locking.  We've
tested some prototype code for that, but it's not in committable shape,
and besides it's quite unclear whether it actually makes sense to do late
locking against a remote server.  The extra round trips required are likely
to outweigh any benefit from improved concurrency.

Etsuro Fujita, reviewed by Ashutosh Bapat, and hacked up a lot by me
2015-05-12 14:10:17 -04:00
Andres Freund e8898e9169 Minor ON CONFLICT related comments and doc fixes.
Geoff Winkless, Stephen Frost, Peter Geoghegan and me.
2015-05-08 19:24:14 +02:00
Andres Freund 168d5805e4 Add support for INSERT ... ON CONFLICT DO NOTHING/UPDATE.
The newly added ON CONFLICT clause allows specifying an alternative to
raising a unique or exclusion constraint violation error when inserting.
ON CONFLICT refers to constraints that can be specified either using an
inference clause (by specifying the columns of a unique constraint) or
by naming a unique or exclusion constraint.  DO NOTHING avoids the
constraint violation, without touching the pre-existing row.  DO UPDATE
SET ... [WHERE ...] updates the pre-existing tuple, and has access to
both the tuple proposed for insertion and the existing tuple; the
optional WHERE clause can be used to prevent an update from being
executed.  The UPDATE SET and WHERE clauses have access to the tuple
proposed for insertion using the "magic" EXCLUDED alias, and to the
pre-existing tuple using the table name or its alias.
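
A minimal upsert sketch (table and column names hypothetical):

    INSERT INTO counters AS c (key, value)
    VALUES ('page_hits', 1)
    ON CONFLICT (key)
    DO UPDATE SET value = c.value + EXCLUDED.value;
    -- or: ON CONFLICT (key) DO NOTHING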

This feature is often referred to as upsert.

This is implemented using a new infrastructure called "speculative
insertion". It is an optimistic variant of regular insertion that first
does a pre-check for existing tuples and then attempts an insert.  If a
violating tuple was inserted concurrently, the speculatively inserted
tuple is deleted and a new attempt is made.  If the pre-check finds a
matching tuple the alternative DO NOTHING or DO UPDATE action is taken.
If the insertion succeeds without detecting a conflict, the tuple is
deemed inserted.

To handle the possible ambiguity between the excluded alias and a table
named excluded, and for convenience with long relation names, INSERT
INTO can now alias its target table.

Bumps catversion as stored rules change.

Author: Peter Geoghegan, with significant contributions from Heikki
    Linnakangas and Andres Freund. Testing infrastructure by Jeff Janes.
Reviewed-By: Heikki Linnakangas, Andres Freund, Robert Haas, Simon Riggs,
    Dean Rasheed, Stephen Frost and many others.
2015-05-08 05:43:10 +02:00
Stephen Frost e89bd02f58 Perform RLS WITH CHECK before constraints, etc
The RLS capability is built on top of the WITH CHECK OPTION
system which was added for auto-updatable views. However, unlike
WCOs on views (which are mandated by the SQL spec to not fire until
after all other constraints and checks are done), it makes much more
sense for RLS checks to happen earlier than constraint and uniqueness
checks.

This patch reworks the structure which holds the WCOs a bit to be
explicitly either VIEW or RLS checks and the RLS-related checks are
done prior to the constraint and uniqueness checks.  This also allows
better error reporting as we are now reporting when a violation is due
to a WITH CHECK OPTION and when it's due to an RLS policy violation,
which was independently noted by Craig Ringer as being confusing.
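
A sketch of a policy whose WITH CHECK expression is now applied before
constraint and uniqueness checks (names hypothetical):

    CREATE POLICY docs_insert ON documents
        FOR INSERT
        WITH CHECK (owner = current_user);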

The documentation is also updated to include a paragraph about when RLS
WITH CHECK handling is performed, as there have been a number of
questions regarding that and the documentation was previously silent on
the matter.

Author: Dean Rasheed, with some kibitzing and comment changes by me.
2015-04-24 20:34:26 -04:00
Tom Lane cb1ca4d800 Allow foreign tables to participate in inheritance.
Foreign tables can now be inheritance children or parents.  Much of the
system was already ready for this, but we had to fix a few things of
course, mostly in the area of planner and executor handling of row locks.

As side effects of this, allow foreign tables to have NOT VALID CHECK
constraints (and hence to accept ALTER ... VALIDATE CONSTRAINT), and to
accept ALTER SET STORAGE and ALTER SET WITH/WITHOUT OIDS.  Continuing to
disallow these things would've required bizarre and inconsistent special
cases in inheritance behavior.  Since foreign tables don't enforce CHECK
constraints anyway, a NOT VALID one is a complete no-op, but that doesn't
mean we shouldn't allow it.  And it's possible that some FDWs might have
use for SET STORAGE or SET WITH OIDS, though doubtless they will be no-ops
for most.

An additional change in support of this is that when a ModifyTable node
has multiple target tables, they will all now be explicitly identified
in EXPLAIN output, for example:

 Update on pt1  (cost=0.00..321.05 rows=3541 width=46)
   Update on pt1
   Foreign Update on ft1
   Foreign Update on ft2
   Update on child3
   ->  Seq Scan on pt1  (cost=0.00..0.00 rows=1 width=46)
   ->  Foreign Scan on ft1  (cost=100.00..148.03 rows=1170 width=46)
   ->  Foreign Scan on ft2  (cost=100.00..148.03 rows=1170 width=46)
   ->  Seq Scan on child3  (cost=0.00..25.00 rows=1200 width=46)

This was done mainly to provide an unambiguous place to attach "Remote SQL"
fields, but it is useful for inherited updates even when no foreign tables
are involved.

Shigeru Hanada and Etsuro Fujita, reviewed by Ashutosh Bapat and Kyotaro
Horiguchi, some additional hacking by me
2015-03-22 13:53:21 -04:00
Tom Lane 8abb3cda0d Use the typcache to cache constraints for domain types.
Previously, we cached domain constraints for the life of a query, or
really for the life of the FmgrInfo struct that was used to invoke
domain_in() or domain_check().  But plpgsql (and probably other places)
is set up to cache such FmgrInfos for the whole lifespan of a session,
which meant they could be enforcing really stale sets of constraints.
On the other hand, searching pg_constraint once per query gets kind of
expensive too: testing says that as much as half the runtime of a
trivial query such as "SELECT 0::domaintype" went into that.
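
For reference, the pattern being measured looks like this (domain name
hypothetical):

    CREATE DOMAIN nonneg_int AS int CHECK (VALUE >= 0);
    SELECT 0::nonneg_int;  -- domain_in() must look up and apply the CHECK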

To fix this, delegate the responsibility for tracking a domain's
constraints to the typcache, which has the infrastructure needed to
detect syscache invalidation events that signal possible changes.
This not only removes unnecessary repeat reads of pg_constraint,
but ensures that we never apply stale constraint data: whatever we
use is the current data according to syscache rules.

Unfortunately, the current configuration of the system catalogs means
we have to flush cached domain-constraint data whenever either pg_type
or pg_constraint changes, which happens rather a lot (eg, creation or
deletion of a temp table will do it).  It might be worth rearranging
things to split pg_constraint into two catalogs, of which the domain
constraint one would probably be very low-traffic.  That's a job for
another patch though, and in any case this patch should improve matters
materially even with that handicap.

This patch makes use of the recently-added memory context reset callback
feature to manage the lifespan of domain constraint caches, so that we
don't risk deleting a cache that might be in the midst of evaluation.

Although this is a bug fix as well as a performance improvement, no
back-patch.  There haven't been many if any field complaints about
stale domain constraint checks, so it doesn't seem worth taking the
risk of modifying data structures as basic as MemoryContexts in back
branches.
2015-03-01 14:06:55 -05:00
Bruce Momjian 4baaf863ec Update copyright for 2015
Backpatch certain files through 9.0
2015-01-06 11:43:47 -05:00
Tom Lane 447770404c Rearrange CustomScan API.
Make it work more like FDW plans do: instead of assuming that there are
expressions in a CustomScan plan node that the core code doesn't know
about, insist that all subexpressions that need planner attention be in
a "custom_exprs" list in the Plan representation.  (Of course, the
custom plugin can break the list apart again at executor initialization.)
This lets us revert the parts of the patch that exposed setrefs.c and
subselect.c processing to the outside world.

Also revert the GetSpecialCustomVar stuff in ruleutils.c; that concept
may work in future, but it's far from fully baked right now.
2014-11-21 18:21:46 -05:00
Tom Lane adbfab119b Remove dead code supporting mark/restore in SeqScan, TidScan, ValuesScan.
There seems no prospect that any of this will ever be useful, and indeed
it's questionable whether some of it would work if it ever got called;
it's certainly not been exercised in a very long time, if ever. So let's
get rid of it, and make the comments about mark/restore in execAmi.c less
wishy-washy.

The mark/restore support for Result nodes is also currently dead code,
but that's due to planner limitations, not because it's impossible that
it could be useful.  So I left it in.
2014-11-20 20:20:54 -05:00
Tom Lane a34fa8ee7c Initial code review for CustomScan patch.
Get rid of the pernicious entanglement between planner and executor headers
introduced by commit 0b03e5951b.

Also, rearrange the CustomFoo struct/typedef definitions so that all the
typedef names are seen as used by the compiler.  Without this pgindent
will mess things up a bit, which is not so important perhaps, but it also
removes a bizarre discrepancy between the declaration arrangement used for
CustomExecMethods and that used for CustomScanMethods and
CustomPathMethods.

Clean up the commentary around ExecSupportsMarkRestore to reflect the
rather large change in its API.

Const-ify register_custom_path_provider's argument.  This necessitates
casting away const in the function, but that seems better than forcing
callers of the function to do so (or else not const-ify their method
pointer structs, which was sort of the whole point).

De-export fix_expr_common.  I don't like the exporting of fix_scan_expr
or replace_nestloop_params either, but this one surely has got little
excuse.
2014-11-20 18:36:07 -05:00
Tom Lane bf7ca15875 Ensure that RowExprs and whole-row Vars produce the expected column names.
At one time it wasn't terribly important what column names were associated
with the fields of a composite Datum, but since the introduction of
operations like row_to_json(), it's important that looking up the rowtype
ID embedded in the Datum returns the column names that users would expect.
That did not work terribly well before this patch: you could get the column
names of the underlying table, or column aliases from any level of the
query, depending on minor details of the plan tree.  You could even get
totally empty field names, which is disastrous for cases like row_to_json().
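
For example, with this fix a query of the following shape reliably reports
the aliases visible at the point of use (names hypothetical):

    SELECT row_to_json(t) FROM (SELECT a AS x, b AS y FROM tbl) t;
    -- keys are "x" and "y", not the underlying column names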

To fix this for whole-row Vars, look to the RTE referenced by the Var, and
make sure its column aliases are applied to the rowtype associated with
the result Datums.  This is a tad scary because we might have to return
a transient RECORD type even though the Var is declared as having some
named rowtype.  In principle it should be all right because the record
type will still be physically compatible with the named rowtype; but
I had to weaken one Assert in ExecEvalConvertRowtype, and there might be
third-party code containing similar assumptions.

Similarly, RowExprs have to be willing to override the column names coming
from a named composite result type and produce a RECORD when the column
aliases visible at the site of the RowExpr differ from the underlying
table's column names.

In passing, revert the decision made in commit 398f70ec07 to add
an alias-list argument to ExecTypeFromExprList: better to provide that
functionality in a separate function.  This also reverts most of the code
changes in d685814835, which we don't need because we're no longer
depending on the tupdesc found in the child plan node's result slot to be
blessed.

Back-patch to 9.4, but not earlier, since this solution changes the results
in some cases that users might not have realized were buggy.  We'll apply a
more restricted form of this patch in older branches.
2014-11-10 15:21:09 -05:00
Robert Haas 0b03e5951b Introduce custom path and scan providers.
This allows extension modules to define their own methods for
scanning a relation, and get the core code to use them.  It's
unclear as yet how much use this capability will find, but we
won't find out if we never commit it.

KaiGai Kohei, reviewed at various times and in various levels
of detail by Shigeru Hanada, Tom Lane, Andres Freund, Álvaro
Herrera, and myself.
2014-11-07 17:34:36 -05:00
Alvaro Herrera df630b0dd5 Implement SKIP LOCKED for row-level locks
This clause changes the behavior of SELECT locking clauses in the
presence of locked rows: instead of causing a process to block waiting
for the locks held by other processes (or raise an error, with NOWAIT),
SKIP LOCKED makes the new reader skip over such rows.  While this is not
appropriate behavior for general purposes, there are some cases in which
it is useful, such as queue-like tables.
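
A sketch of the queue-like usage pattern (names hypothetical):

    BEGIN;
    SELECT * FROM job_queue
    WHERE status = 'pending'
    ORDER BY id
    LIMIT 1
    FOR UPDATE SKIP LOCKED;
    -- process the returned job, mark it done, then COMMIT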

Catalog version bumped because this patch changes the representation of
stored rules.

Reviewed by Craig Ringer (based on a previous attempt at an
implementation by Simon Riggs, who also provided input on the syntax
used in the current patch), David Rowley, and Álvaro Herrera.

Author: Thomas Munro
2014-10-07 17:23:34 -03:00
Tom Lane 45b0f35723 Avoid leaking memory while evaluating arguments for a table function.
ExecMakeTableFunctionResult evaluated the arguments for a function-in-FROM
in the query-lifespan memory context.  This is insignificant in simple
cases where the function relation is scanned only once; but if the function
is in a sub-SELECT or is on the inside of a nested loop, any memory
consumed during argument evaluation can add up quickly.  (The potential for
trouble here had been foreseen long ago, per existing comments; but we'd
not previously seen a complaint from the field about it.)  To fix, create
an additional temporary context just for this purpose.

Per an example from MauMau.  Back-patch to all active branches.
2014-06-19 22:14:26 -04:00
Tom Lane 8f889b1083 Implement UPDATE tab SET (col1,col2,...) = (SELECT ...), ...
This SQL-standard feature allows a sub-SELECT yielding multiple columns
(but only one row) to be used to compute the new values of several columns
to be updated.  While the same results can be had with an independent
sub-SELECT per column, such a workaround can require a great deal of
duplicated computation.
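
A minimal example of the new form (names hypothetical):

    UPDATE accounts
    SET (contact_first_name, contact_last_name) =
        (SELECT first_name, last_name
         FROM salesmen
         WHERE salesmen.id = accounts.sales_id);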

The standard actually says that the source for a multi-column assignment
could be any row-valued expression.  The implementation used here is
tightly tied to our existing sub-SELECT support and can't handle other
cases; the Bison grammar would have some issues with them too.  However,
I don't feel too bad about this since other cases can be converted into
sub-SELECTs.  For instance, "SET (a,b,c) = row_valued_function(x)" could
be written "SET (a,b,c) = (SELECT * FROM row_valued_function(x))".
2014-06-18 13:22:34 -04:00
Bruce Momjian 0a78320057 pgindent run for 9.4
This includes removing tabs after periods in C comments, which was
applied to back branches, so this change should not affect backpatching.
2014-05-06 12:12:18 -04:00
Tom Lane a9d9acbf21 Create infrastructure for moving-aggregate optimization.
Until now, when executing an aggregate function as a window function
within a window with moving frame start (that is, any frame start mode
except UNBOUNDED PRECEDING), we had to recalculate the aggregate from
scratch each time the frame head moved.  This patch allows an aggregate
definition to include an alternate "moving aggregate" implementation
that includes an inverse transition function for removing rows from
the aggregate's running state.  As long as this can be done successfully,
runtime is proportional to the total number of input rows, rather than
to the number of input rows times the average frame length.
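
The kind of query that benefits is a moving-frame window aggregate such as
(names hypothetical):

    SELECT ts,
           sum(amount) OVER (ORDER BY ts
                             ROWS BETWEEN 99 PRECEDING AND CURRENT ROW)
    FROM payments;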

This commit includes the core infrastructure, documentation, and regression
tests using user-defined aggregates.  Follow-on commits will update some
of the built-in aggregates to use this feature.

David Rowley and Florian Pflug, reviewed by Dean Rasheed; additional
hacking by me
2014-04-12 12:03:30 -04:00
Robert Haas 2bb1f14b89 Make bitmap heap scans show exact/lossy block info in EXPLAIN ANALYZE.
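
An illustrative query (names hypothetical); when the plan contains a Bitmap
Heap Scan, EXPLAIN ANALYZE now reports a "Heap Blocks: exact=N lossy=M" line:

    EXPLAIN (ANALYZE)
    SELECT * FROM events WHERE event_time >= now() - interval '1 day';
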
Etsuro Fujita
2014-01-13 14:42:16 -05:00
Bruce Momjian 7e04792a1c Update copyright for 2014
Update all files in head, and files COPYRIGHT and legal.sgml in all back
branches.
2014-01-07 16:05:30 -05:00
Tom Lane 8d65da1f01 Support ordered-set (WITHIN GROUP) aggregates.
This patch introduces generic support for ordered-set and hypothetical-set
aggregate functions, as well as implementations of the instances defined in
SQL:2008 (percentile_cont(), percentile_disc(), rank(), dense_rank(),
percent_rank(), cume_dist()).  We also added mode() though it is not in the
spec, as well as versions of percentile_cont() and percentile_disc() that
can compute multiple percentile values in one pass over the data.
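
Minimal examples of the new aggregates (names hypothetical):

    SELECT percentile_disc(0.5) WITHIN GROUP (ORDER BY income),
           percentile_cont(ARRAY[0.25, 0.75]) WITHIN GROUP (ORDER BY income)
    FROM households;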

Unlike the original submission, this patch puts full control of the sorting
process in the hands of the aggregate's support functions.  To allow the
support functions to find out how they're supposed to sort, a new API
function AggGetAggref() is added to nodeAgg.c.  This allows retrieval of
the aggregate call's Aggref node, which may have other uses beyond the
immediate need.  There is also support for ordered-set aggregates to
install cleanup callback functions, so that they can be sure that
infrastructure such as tuplesort objects gets cleaned up.

In passing, make some fixes in the recently-added support for variadic
aggregates, and make some editorial adjustments in the recent FILTER
additions for aggregates.  Also, simplify use of IsBinaryCoercible() by
allowing it to succeed whenever the target type is ANY or ANYELEMENT.
It was inconsistent that it dealt with other polymorphic target types
but not these.

Atri Sharma and Andrew Gierth; reviewed by Pavel Stehule and Vik Fearing,
and rather heavily editorialized upon by Tom Lane
2013-12-23 16:11:35 -05:00
Tom Lane 784e762e88 Support multi-argument UNNEST(), and TABLE() syntax for multiple functions.
This patch adds the ability to write TABLE( function1(), function2(), ...)
as a single FROM-clause entry.  The result is the concatenation of the
first row from each function, followed by the second row from each
function, etc., with NULLs inserted if any function produces fewer rows than
others.  This is believed to be a much more useful behavior than what
Postgres currently does with multiple SRFs in a SELECT list.

This syntax also provides a reasonable way to combine use of column
definition lists with WITH ORDINALITY: put the column definition list
inside TABLE(), where it's clear that it doesn't control the ordinality
column as well.

Also implement SQL-compliant multiple-argument UNNEST(), by turning
UNNEST(a,b,c) into TABLE(unnest(a), unnest(b), unnest(c)).
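
A minimal example of the multi-argument UNNEST() form (array values
illustrative):

    SELECT * FROM unnest(ARRAY[1, 2, 3], ARRAY['a', 'b']) AS t(n, letter);
    -- rows: (1,'a'), (2,'b'), (3,NULL)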

The SQL standard specifies TABLE() with only a single function, not
multiple functions, and it seems to require an implicit UNNEST() which is
not what this patch does.  There may be something wrong with that reading
of the spec, though, because if it's right then the spec's TABLE() is just
a pointless alternative spelling of UNNEST().  After further review of
that, we might choose to adopt a different syntax for what this patch does,
but in any case this functionality seems clearly worthwhile.

Andrew Gierth, reviewed by Zoltán Böszörményi and Heikki Linnakangas, and
significantly revised by me
2013-11-21 19:37:20 -05:00
Peter Eisentraut 001e114b8d Fix whitespace issues found by git diff --check, add gitattributes
Set per-file-type attributes in .gitattributes to fine-tune whitespace
checks.  With the associated cleanups, the tree is now clean for git
2013-11-10 14:48:29 -05:00
Greg Stark c62736cc37 Add SQL Standard WITH ORDINALITY support for UNNEST (and any other SRF)
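
A minimal example (alias names illustrative):

    SELECT * FROM unnest(ARRAY['a', 'b', 'c'])
        WITH ORDINALITY AS t(elem, ord);
    -- rows: ('a',1), ('b',2), ('c',3)
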
Author: Andrew Gierth, David Fetter
Reviewers: Dean Rasheed, Jeevan Chalke, Stephen Frost
2013-07-29 16:38:01 +01:00
Stephen Frost 4cbe3ac3e8 WITH CHECK OPTION support for auto-updatable VIEWs
For simple views which are automatically updatable, this patch allows
the user to specify what level of checking should be done on records
being inserted or updated.  For 'LOCAL CHECK', new tuples are validated
against the conditionals of the view they are being inserted into, while
for 'CASCADED CHECK' the new tuples are validated against the
conditionals for all views involved (from the top down).

This option is part of the SQL specification.
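
A minimal sketch (names hypothetical):

    CREATE VIEW active_users AS
        SELECT * FROM users WHERE active
        WITH LOCAL CHECK OPTION;
    -- INSERTs/UPDATEs through the view must satisfy "active"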

Dean Rasheed, reviewed by Pavel Stehule
2013-07-18 17:10:16 -04:00
Noah Misch b560ec1b0d Implement the FILTER clause for aggregate function calls.
This is SQL-standard with a few extensions, namely support for
subqueries and outer references in clause expressions.
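
A minimal example (names hypothetical):

    SELECT count(*) AS total,
           count(*) FILTER (WHERE success) AS succeeded
    FROM requests;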

catversion bump due to change in Aggref and WindowFunc.

David Fetter, reviewed by Dean Rasheed.
2013-07-16 20:15:36 -04:00
Tom Lane 21734d2fb8 Support writable foreign tables.
This patch adds the core-system infrastructure needed to support updates
on foreign tables, and extends contrib/postgres_fdw to allow updates
against remote Postgres servers.  There's still a great deal of room for
improvement in optimization of remote updates, but at least there's basic
functionality there now.

KaiGai Kohei, reviewed by Alexander Korotkov and Laurenz Albe, and rather
heavily revised by Tom Lane.
2013-03-10 14:16:02 -04:00
Alvaro Herrera 0ac5ad5134 Improve concurrency of foreign key locking
This patch introduces two additional lock modes for tuples: "SELECT FOR
KEY SHARE" and "SELECT FOR NO KEY UPDATE".  These don't block each
other, in contrast with already existing "SELECT FOR SHARE" and "SELECT
FOR UPDATE".  UPDATE commands that do not modify the values stored in
the columns that are part of the key of the tuple now grab a SELECT FOR
NO KEY UPDATE lock on the tuple, allowing them to proceed concurrently
with tuple locks of the FOR KEY SHARE variety.
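
A sketch of the new lock modes in use (table names hypothetical):

    SELECT * FROM orders WHERE id = 42 FOR NO KEY UPDATE;
    -- blocks FOR UPDATE, but not FOR KEY SHARE lockers

    SELECT * FROM customers WHERE id = 7 FOR KEY SHARE;
    -- the kind of lock FK triggers now take on referenced rows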

Foreign key triggers now use FOR KEY SHARE instead of FOR SHARE; this
means the concurrency improvement applies to them, which is the whole
point of this patch.

The added tuple lock semantics require some rejiggering of the multixact
module, so that the locking level that each transaction is holding can
be stored alongside its Xid.  Also, multixacts now need to persist
across server restarts and crashes, because they can now represent not
only tuple locks, but also tuple updates.  This means we need more
careful tracking of lifetime of pg_multixact SLRU files; since they now
persist longer, we require more infrastructure to figure out when they
can be removed.  pg_upgrade also needs to be careful to copy
pg_multixact files over from the old server to the new, or at least part
of multixact.c state, depending on the versions of the old and new
servers.

Tuple time qualification rules (HeapTupleSatisfies routines) need to be
careful not to consider tuples with the "is multi" infomask bit set as
being only locked; they might need to look up MultiXact values (i.e.
possibly do pg_multixact I/O) to find out the Xid that updated a tuple,
whereas they previously were assured to only use information readily
available from the tuple header.  This is considered acceptable, because
the extra I/O would involve cases that would previously cause some
commands to block waiting for concurrent transactions to finish.

Another important change is the fact that locking tuples that have
previously been updated causes the future versions to be marked as
locked, too; this is essential for correctness of foreign key checks.
This also causes additional WAL-logging: previously there was a single
WAL record for a locked tuple, but now there are as many WAL records as
there are updated copies of the tuple.

With all this in place, contention related to tuples being checked by
foreign key rules should be much reduced.

As a bonus, we have fixed the old bug whereby, if a subtransaction grabbed
a stronger tuple lock on a given tuple than the parent (sub)transaction
held and later aborted, the weaker lock was lost.

Many new spec files were added for the isolation tester framework, to ensure
overall behavior is sane.  There's probably room for several more tests.

There were several reviewers of this patch; in particular, Noah Misch
and Andres Freund spent considerable time on it.  The original idea for the
patch came from Simon Riggs, after a problem report by Joel Jacobson.
Most code is from me, with contributions from Marti Raudsepp, Alexander
Shulgin, Noah Misch and Andres Freund.

This patch was discussed in several pgsql-hackers threads; the most
important start at the following message-ids:
	AANLkTimo9XVcEzfiBR-ut3KVNDkjm2Vxh+t8kAmWjPuv@mail.gmail.com
	1290721684-sup-3951@alvh.no-ip.org
	1294953201-sup-2099@alvh.no-ip.org
	1320343602-sup-2290@alvh.no-ip.org
	1339690386-sup-8927@alvh.no-ip.org
	4FE5FF020200002500048A3D@gw.wicourts.gov
	4FEAB90A0200002500048B7D@gw.wicourts.gov
2013-01-23 12:04:59 -03:00
Bruce Momjian bd61a623ac Update copyrights for 2013
Fully update git head, and update back branches in ./COPYRIGHT and
legal.sgml files.
2013-01-01 17:15:01 -05:00
Robert Haas 7a2fe9bd03 Basic binary heap implementation.
There are probably other places where this can be used, but for now,
this just makes MergeAppend use it, so that this code will have test
coverage.  There is other work in the queue that will use this, as
well.

Abhijit Menon-Sen, reviewed by Andres Freund, Robert Haas, Álvaro
Herrera, Tom Lane, and others.
2012-11-29 11:16:59 -05:00
Tom Lane 8e617e29aa Fix whole-row Var evaluation to cope with resjunk columns (again).
When a whole-row Var is reading the result of a subquery, we need it to
ignore any "resjunk" columns that the subquery might have evaluated for
GROUP BY or ORDER BY purposes.  We've hacked this area before, in commit
68e40998d0, but that fix only covered
whole-row Vars of named composite types, not those of RECORD type; and it
was mighty klugy anyway, since it just assumed without checking that any
extra columns in the result must be resjunk.  A proper fix requires getting
hold of the subquery's targetlist so we can actually see which columns are
resjunk (whereupon we can use a JunkFilter to get rid of them).  So bite
the bullet and add some infrastructure to make that possible.

Per report from Andrew Dunstan and additional testing by Merlin Moncure.
Back-patch to all supported branches.  In 8.3, also back-patch commit
292176a118, which for some reason I had
not done at the time, but it's a prerequisite for this change.
2012-07-20 13:10:58 -04:00
Tom Lane d14241c2cf Fix memory leak in ARRAY(SELECT ...) subqueries.
Repeated execution of an uncorrelated ARRAY_SUBLINK sub-select (which
I think can only happen if the sub-select is embedded in a larger,
correlated subquery) would leak memory for the duration of the query,
due to not reclaiming the array generated in the previous execution.
Per bug #6698 from Armando Miraglia.  Diagnosis and fix idea by Heikki,
patch itself by me.

This has been like this all along, so back-patch to all supported versions.
2012-06-21 17:27:19 -04:00
Bruce Momjian 927d61eeff Run pgindent on 9.2 source tree in preparation for first 9.3
commit-fest.
2012-06-10 15:20:04 -04:00
Tom Lane 9dbf2b7d75 Restructure SELECT INTO's parsetree representation into CreateTableAsStmt.
Making this operation look like a utility statement seems generally a good
idea, and particularly so in light of the desire to provide command
triggers for utility statements.  The original choice of representing it as
SELECT with an IntoClause appendage had metastasized into rather a lot of
places, unfortunately, so that this patch is a great deal more complicated
than one might at first expect.

In particular, keeping EXPLAIN working for SELECT INTO and CREATE TABLE AS
subcommands required restructuring some EXPLAIN-related APIs.  Add-on code
that calls ExplainOnePlan or ExplainOneUtility, or uses
ExplainOneQuery_hook, will need adjustment.

Also, the cases PREPARE ... SELECT INTO and CREATE RULE ... SELECT INTO,
which formerly were accepted though undocumented, are no longer accepted.
The PREPARE case can be replaced with use of CREATE TABLE AS EXECUTE.
The CREATE RULE case doesn't seem to have much real-world use (since the
rule would work only once before failing with "table already exists"),
so we'll not bother with that one.
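
The replacement for the no-longer-accepted PREPARE ... SELECT INTO form
looks like this (names hypothetical):

    PREPARE q AS SELECT generate_series(1, 10) AS n;
    CREATE TABLE t AS EXECUTE q;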

Both SELECT INTO and CREATE TABLE AS still return a command tag of
"SELECT nnnn".  There was some discussion of returning "CREATE TABLE nnnn",
but for the moment backwards compatibility wins the day.

Andres Freund and Tom Lane
2012-03-19 21:38:12 -04:00
Robert Haas 9f9135d129 Instrument index-only scans to count heap fetches performed.
Patch by me; review by Tom Lane, Jeff Davis, and Peter Geoghegan.
2012-01-25 20:41:52 -05:00
Bruce Momjian e126958c2e Update copyright notices for year 2012. 2012-01-01 18:01:58 -05:00
Tom Lane c6e3ac11b6 Create a "sort support" interface API for faster sorting.
This patch creates an API whereby a btree index opclass can optionally
provide non-SQL-callable support functions for sorting.  In the initial
patch, we only use this to provide a directly-callable comparator function,
which can be invoked with a bit less overhead than the traditional
SQL-callable comparator.  While that should be of value in itself, the real
reason for doing this is to provide a datatype-extensible framework for
more aggressive optimizations, as in Peter Geoghegan's recent work.

Robert Haas and Tom Lane
2011-12-07 00:19:39 -05:00