Commit Graph

3017 Commits

Author SHA1 Message Date
Tom Lane b5bce6c1ec Final pgindent + perltidy run for 9.6. 2016-08-15 13:42:51 -04:00
Tom Lane ed0097e4f9 Add SQL-accessible functions for inspecting index AM properties.
Per discussion, we should provide such functions to replace the lost
ability to discover AM properties by inspecting pg_am (cf commit
65c5fcd35).  The added functionality is also meant to displace any code
that was looking directly at pg_index.indoption, since we'd rather not
believe that the bit meanings in that field are part of any client API
contract.

As future-proofing, define the SQL API to not assume that properties that
are currently AM-wide or index-wide will remain so unless they logically
must be; instead, expose them only when inquiring about a specific index
or even specific index column.  Also provide the ability for an index
AM to override the behavior.

In passing, document pg_am.amtype, overlooked in commit 473b93287.

Andrew Gierth, with kibitzing by me and others

Discussion: <87mvl5on7n.fsf@news-spur.riddles.org.uk>
2016-08-13 18:31:14 -04:00
Tom Lane 4b234fd8bf Fix inappropriate printing of never-measured times in EXPLAIN.
EXPLAIN (ANALYZE, TIMING OFF) would print an elapsed time of zero for
a trigger function, because no measurement has been taken but it printed
the field anyway.  This isn't what EXPLAIN does elsewhere, so suppress it.

In the same vein, EXPLAIN (ANALYZE, BUFFERS) with non-text output format
would print buffer I/O timing numbers even when no measurement has been
taken because track_io_timing is off.  That seems not per policy, either,
so change it.

Back-patch to 9.2 where these features were introduced.

Maksim Milyutin

Discussion: <081c0540-ecaa-bd29-3fd2-6358f3b359a9@postgrespro.ru>
2016-08-12 12:13:04 -04:00
Tom Lane 95bee941be Fix misestimation of n_distinct for a nearly-unique column with many nulls.
If ANALYZE found no repeated non-null entries in its sample, it set the
column's stadistinct value to -1.0, intending to indicate that the entries
are all distinct.  But what this value actually means is that the number
of distinct values is 100% of the table's rowcount, and thus it was
overestimating the number of distinct values by however many nulls there
are.  This could lead to very poor selectivity estimates, as for example
in a recent report from Andreas Joseph Krogh.  We should discount the
stadistinct value by whatever we've estimated the nulls fraction to be.
(That is what will happen if we choose to use a negative stadistinct for
a column that does have repeated entries, so this code path was just
inconsistent.)

In addition to fixing the stadistinct entries stored by several different
ANALYZE code paths, adjust the logic where get_variable_numdistinct()
forces an "all distinct" estimate on the basis of finding a relevant unique
index.  Unique indexes don't reject nulls, so there's no reason to assume
that the null fraction doesn't apply.

Back-patch to all supported branches.  Back-patching is a bit of a judgment
call, but this problem seems to affect only a few users (else we'd have
identified it long ago), and it's bad enough when it does happen that
destabilizing plan choices in a worse direction seems unlikely.

Patch by me, with documentation wording suggested by Dean Rasheed

Report: <VisenaEmail.26.df42f82acae38a58.156463942b8@tc7-visena>
Discussion: <16143.1470350371@sss.pgh.pa.us>
2016-08-07 18:52:02 -04:00
Tom Lane 9ee1cf04ab Fix TOAST access failure in RETURNING queries.
Discussion of commit 3e2f3c2e4 exposed a problem that is of longer
standing: since we don't detoast data while sticking it into a portal's
holdStore for PORTAL_ONE_RETURNING and PORTAL_UTIL_SELECT queries, and we
release the query's snapshot as soon as we're done loading the holdStore,
later readout of the holdStore can do TOAST fetches against data that can
no longer be seen by any of the session's live snapshots.  This means that
a concurrent VACUUM could remove the TOAST data before we can fetch it.
Commit 3e2f3c2e4 exposed the problem by showing that sometimes we had *no*
live snapshots while fetching TOAST data, but we'd be at risk anyway.

I believe this code was all right when written, because our management of a
session's exposed xmin was such that the TOAST references were safe until
end of transaction.  But that's no longer true now that we can advance or
clear our PGXACT.xmin intra-transaction.

To fix, copy the query's snapshot during FillPortalStore() and save it in
the Portal; release it only when the portal is dropped.  This essentially
implements a policy that we must hold a relevant snapshot whenever we
access potentially-toasted data.  We had already come to that conclusion
in other places, cf commits 08e261cbc9 and ec543db77b.

I'd have liked to add a regression test case for this, but I didn't see
a way to make one that's not unreasonably bloated; it seems to require
returning a toasted value to the client, and those will be big.

In passing, improve PortalRunUtility() so that it positively verifies
that its ending PopActiveSnapshot() call will pop the expected snapshot,
removing a rather shaky assumption about which utility commands might
do their own PopActiveSnapshot().  There's no known bug here, but now
that we're actively referencing the snapshot it's almost free to make
this code a bit more bulletproof.

We might want to consider back-patching something like this into older
branches, but it would be prudent to let it prove itself more in HEAD
beforehand.

Discussion: <87vazemeda.fsf@credativ.de>
2016-08-07 17:46:08 -04:00
Peter Eisentraut 40fcfec82c Message style improvements 2016-07-25 22:07:44 -04:00
Kevin Grittner 1c15aac53f Add comment & docs about no vacuum truncation with sto.
Omission noted by Andres Freund.
2016-07-19 16:25:53 -05:00
Andres Freund eca0f1db14 Clear all-frozen visibilitymap status when locking tuples.
Since a892234 & fd31cd265 the visibilitymap's freeze bit is used to
avoid vacuuming the whole relation in anti-wraparound vacuums. Doing so
correctly relies on not adding xids to the heap without also unsetting
the visibilitymap flag.  Tuple locking related code has not done so.

To allow selectively resetting all-frozen - to avoid pessimizing
heap_lock_tuple - allow to selectively reset the all-frozen with
visibilitymap_clear(). To avoid having to use
visibilitymap_get_status (e.g. via VM_ALL_FROZEN) inside a critical
section, have visibilitymap_clear() return whether any bits have been
reset.

There's a remaining issue (denoted by XXX): After the PageIsAllVisible()
check in heap_lock_tuple() and heap_lock_updated_tuple_rec() the page
status could theoretically change. Practically that currently seems
impossible, because updaters will hold a page level pin already.  Due to
the next beta coming up, it seems better to get the required WAL magic
bump done before resolving this issue.

The added flags field fields to xl_heap_lock and xl_heap_lock_updated
require bumping the WAL magic. Since there's already been a catversion
bump since the last beta, that's not an issue.

Reviewed-By: Robert Haas, Amit Kapila and Andres Freund
Author: Masahiko Sawada, heavily revised by Andres Freund
Discussion: CAEepm=3fWAbWryVW9swHyLTY4sXVf0xbLvXqOwUoDiNCx9mBjQ@mail.gmail.com
Backpatch: -
2016-07-18 02:01:13 -07:00
Tom Lane 4d042999f9 Print a given subplan only once in EXPLAIN.
We have, for a very long time, allowed the same subplan (same member of the
PlannedStmt.subplans list) to be referenced by more than one SubPlan node;
this avoids problems for cases such as subplans within an IndexScan's
indxqual and indxqualorig fields.  However, EXPLAIN had not gotten the memo
and would print each reference as though it were an independent identical
subplan.  To fix, track plan_ids of subplans we've printed and don't print
the same plan_id twice.  Per report from Pavel Stehule.

BTW: the particular case of IndexScan didn't cause visible duplication
in a plain EXPLAIN, only EXPLAIN ANALYZE, because in the former case we
short-circuit executor startup before the indxqual field is processed by
ExecInitExpr.  That seems like it could easily lead to other EXPLAIN
problems in future, but it's not clear how to avoid it without breaking
the "EXPLAIN a plan using hypothetical indexes" use-case.  For now I've
left that issue alone.

Although this is a longstanding bug, it's purely cosmetic (no great harm
is done by the repeat printout) and we haven't had field complaints before.
So I'm hesitant to back-patch it, especially since there is some small risk
of ABI problems due to the need to add a new field to ExplainState.

In passing, rearrange order of fields in ExplainState to be less random,
and update some obsolete comments about when/where to initialize them.

Report: <CAFj8pRAimq+NK-menjt+3J4-LFoodDD8Or6=Lc_stcFD+eD4DA@mail.gmail.com>
2016-07-11 18:14:29 -04:00
Robert Haas 10c0558ffe Fix several mistakes around parallel workers and client_encoding.
Previously, workers sent data to the leader using the client encoding.
That mostly worked, but the leader the converted the data back to the
server encoding.  Since not all encoding conversions are reversible,
that could provoke failures.  Fix by using the database encoding for
all communication between worker and leader.

Also, while temporary changes to GUC settings, as from the SET clause
of a function, are in general OK for parallel query, changing
client_encoding this way inside of a parallel worker is not OK.
Previously, that would have confused the leader; with these changes,
it would not confuse the leader, but it wouldn't do anything either.
So refuse such changes in parallel workers.

Also, the previous code naively assumed that when it received a
NotifyResonse from the worker, it could pass that directly back to the
user.  But now that worker-to-leader communication always uses the
database encoding, that's clearly no longer correct - though,
actually, the old way was always broken for V2 clients.  So
disassemble and reconstitute the message instead.

Issues reported by Peter Eisentraut.  Patch by me, reviewed by
Peter Eisentraut.
2016-06-30 18:35:32 -04:00
Tom Lane 8ebb69f854 Fix some infelicities in EXPLAIN output for parallel query plans.
In non-text output formats, parallelized aggregates were reporting
"Partial" or "Finalize" as a field named "Operation", which might be all
right in the absence of any context --- but other plan node types use that
field to report SQL-visible semantics, such as Select/Insert/Update/Delete.
So that naming choice didn't seem good to me.  I changed it to "Partial
Mode".

Also, the field did not appear at all for a non-parallelized Agg plan node,
which is contrary to expectation in non-text formats.  We're notionally
producing objects that conform to a schema, so the set of fields for a
given node type and EXPLAIN mode should be well-defined.  I set it up to
fill in "Simple" in such cases.

Other fields that were added for parallel query, namely "Parallel Aware"
and Gather's "Single Copy", had not gotten the word on that point either.
Make them appear always in non-text output.

Also, the latter two fields were nominally producing boolean output, but
were getting it wrong, because bool values shouldn't be quoted in JSON or
YAML.  Somehow we'd not needed an ExplainPropertyBool formatting subroutine
before 9.6; but now we do, so invent it.

Discussion: <16002.1466972724@sss.pgh.pa.us>
2016-06-29 18:51:20 -04:00
Tom Lane 874fe3aea1 Fix CREATE MATVIEW/CREATE TABLE AS ... WITH NO DATA to not plan the query.
Previously, these commands always planned the given query and went through
executor startup before deciding not to actually run the query if WITH NO
DATA is specified.  This behavior is problematic for pg_dump because it
may cause errors to be raised that we would rather not see before a
REFRESH MATERIALIZED VIEW command is issued.  See for example bug #13907
from Marian Krucina.  This change is not sufficient to fix that particular
bug, because we also need to tweak pg_dump to issue the REFRESH later,
but it's a necessary step on the way.

A user-visible side effect of doing things this way is that the returned
command tag for WITH NO DATA cases will now be "CREATE MATERIALIZED VIEW"
or "CREATE TABLE AS", not "SELECT 0".  We could preserve the old behavior
but it would take more code, and arguably that was just an implementation
artifact not intended behavior anyhow.

In 9.5 and HEAD, also get rid of the static variable CreateAsReladdr, which
was trouble waiting to happen; there is not any prohibition on nested
CREATE commands.

Back-patch to 9.3 where CREATE MATERIALIZED VIEW was introduced.

Michael Paquier and Tom Lane

Report: <20160202161407.2778.24659@wrigleys.postgresql.org>
2016-06-27 15:57:50 -04:00
Tom Lane 19e972d558 Rethink node-level representation of partial-aggregation modes.
The original coding had three separate booleans representing partial
aggregation behavior, which was confusing, unreadable, and error-prone,
not least because the booleans weren't always listed in the same order.
It was also inadequate for the allegedly-desirable future extension to
support intermediate partial aggregation, because we'd need separate
markers for serialization and deserialization in such a case.

Merge these bools into an enum "AggSplit" to provide symbolic names for
the supported operating modes (and document what those are).  By assigning
the values of the enum constants carefully, we can treat AggSplit values
as options bitmasks so that tests of what to do aren't noticeably more
expensive than before.

While at it, get rid of Aggref.aggoutputtype.  That's not needed since
commit 59a3795c2 got rid of setrefs.c's special-purpose Aggref comparison
code, and it likewise seemed more confusing than helpful.

Assorted comment cleanup as well (there's still more that I want to do
in that line).

catversion bump for change in Aggref node contents.  Should be the last
one for partial-aggregation changes.

Discussion: <29309.1466699160@sss.pgh.pa.us>
2016-06-26 14:33:38 -04:00
Tom Lane f8ace5477e Fix type-safety problem with parallel aggregate serial/deserialization.
The original specification for this called for the deserialization function
to have signature "deserialize(serialtype) returns transtype", which is a
security violation if transtype is INTERNAL (which it always would be in
practice) and serialtype is not (which ditto).  The patch blithely overrode
the opr_sanity check for that, which was sloppy-enough work in itself,
but the indisputable reason this cannot be allowed to stand is that CREATE
FUNCTION will reject such a signature and thus it'd be impossible for
extensions to create parallelizable aggregates.

The minimum fix to make the signature type-safe is to add a second, dummy
argument of type INTERNAL.  But to lock it down a bit more and make misuse
of INTERNAL-accepting functions less likely, let's get rid of the ability
to specify a "serialtype" for an aggregate and just say that the only
useful serialtype is BYTEA --- which, in practice, is the only interesting
value anyway, due to the usefulness of the send/recv infrastructure for
this purpose.  That means we only have to allow "serialize(internal)
returns bytea" and "deserialize(bytea, internal) returns internal" as
the signatures for these support functions.

In passing fix bogus signature of int4_avg_combine, which I found thanks
to adding an opr_sanity check on combinefunc signatures.

catversion bump due to removing pg_aggregate.aggserialtype and adjusting
signatures of assorted built-in functions.

David Rowley and Tom Lane

Discussion: <27247.1466185504@sss.pgh.pa.us>
2016-06-22 16:52:41 -04:00
Robert Haas ede62e56fb Add VACUUM (DISABLE_PAGE_SKIPPING) for emergencies.
If you really want to vacuum every single page in the relation,
regardless of apparent visibility status or anything else, you can use
this option.  In previous releases, this behavior could be achieved
using VACUUM (FREEZE), but because we can now recognize all-frozen
pages as not needing to be frozen again, that no longer works.  There
should be no need for routine use of this option, but maybe bugs or
disaster recovery will necessitate its use.

Patch by me, reviewed by Andres Freund.
2016-06-17 15:48:57 -04:00
Robert Haas 38e9f90a22 Fix lazy_scan_heap so that it won't mark pages all-frozen too soon.
Commit a892234f83 added a new bit per
page to the visibility map fork indicating whether the page is
all-frozen, but incorrectly assumed that if lazy_scan_heap chose to
freeze a tuple then that tuple would not need to later be frozen
again. This turns out to be false, because xmin and xmax (and
conceivably xvac, if dealing with tuples from very old releases) could
be frozen at separate times.

Thanks to Andres Freund for help in uncovering and tracking down this
issue.
2016-06-15 14:30:06 -04:00
Robert Haas 4bc424b968 pgindent run for 9.6 2016-06-09 18:02:36 -04:00
Peter Eisentraut 5c6d2a5e7c Message style and wording fixes 2016-06-07 14:18:55 -04:00
Tom Lane f64340e743 Don't reset changes_since_analyze after a selective-columns ANALYZE.
If we ANALYZE only selected columns of a table, we should not postpone
auto-analyze because of that; other columns may well still need stats
updates.  As committed, the counter is left alone if a column list is
given, whether or not it includes all analyzable columns of the table.
Per complaint from Tomasz Ostrowski.

It's been like this a long time, so back-patch to all supported branches.

Report: <ef99c1bd-ff60-5f32-2733-c7b504eb960c@ato.waw.pl>
2016-06-06 17:44:17 -04:00
Robert Haas c6dbf1fe79 Stop the executor if no more tuples can be sent from worker to leader.
If a Gather node has read as many tuples as it needs (for example, due
to Limit) it may detach the queue connecting it to the worker before
reading all of the worker's tuples.  Rather than let the worker
continue to generate and send all of the results, have it stop after
sending the next tuple.

More could be done here to stop the worker even quicker, but this is
about as well as we can hope to do for 9.6.

This is in response to a problem report from Andreas Seltenreich.
Commit 44339b892a should be actually be
sufficient to fix that example even without this change, but it seems
better to do this, too, since we might otherwise waste quite a large
amount of effort in one or more workers.

Discussion: CAA4eK1KOKGqmz9bGu+Z42qhRwMbm4R5rfnqsLCNqFs9j14jzEA@mail.gmail.com

Amit Kapila
2016-06-06 14:52:58 -04:00
Robert Haas 6436a853f1 Fix comment to be more accurate.
Now that we skip vacuuming all-frozen pages, this comment needs
updating.

Masahiko Sawada
2016-06-03 11:56:57 -04:00
Greg Stark e1623c3959 Fix various common mispellings.
Mostly these are just comments but there are a few in documentation
and a handful in code and tests. Hopefully this doesn't cause too much
unnecessary pain for backpatching. I relented from some of the most
common like "thru" for that reason. The rest don't seem numerous
enough to cause problems.

Thanks to Kevin Lyda's tool https://pypi.python.org/pypi/misspellings
2016-06-03 16:08:45 +01:00
Robert Haas fdfaccfa79 Cosmetic improvements to freeze map code.
Per post-commit review comments from Andres Freund, improve variable
names, comments, and in one place, slightly improve the code structure.

Masahiko Sawada
2016-06-03 08:43:41 -04:00
Tom Lane 83dbde94f7 Fix DROP ACCESS METHOD IF EXISTS.
The IF EXISTS option was documented, and implemented in the grammar, but
it didn't actually work for lack of support in does_not_exist_skipping().
Per bug #14160.

Report and patch by Kouhei Sutou

Report: <20160527070433.19424.81712@wrigleys.postgresql.org>
2016-05-27 11:03:18 -04:00
Tom Lane 2d2e40e3be Fetch XIDs atomically during vac_truncate_clog().
Because vac_update_datfrozenxid() updates datfrozenxid and datminmxid
in-place, it's unsafe to assume that successive reads of those values will
give consistent results.  Fetch each one just once to ensure sane behavior
in the minimum calculation.  Noted while reviewing Alexander Korotkov's
patch in the same area.

Discussion: <8564.1464116473@sss.pgh.pa.us>
2016-05-24 15:47:51 -04:00
Tom Lane 996d273978 Avoid consuming an XID during vac_truncate_clog().
vac_truncate_clog() uses its own transaction ID as the comparison point in
a sanity check that no database's datfrozenxid has already wrapped around
"into the future".  That was probably fine when written, but in a lazy
vacuum we won't have assigned an XID, so calling GetCurrentTransactionId()
causes an XID to be assigned when otherwise one would not be.  Most of the
time that's not a big problem ... but if we are hard up against the
wraparound limit, consuming XIDs during antiwraparound vacuums is a very
bad thing.

Instead, use ReadNewTransactionId(), which not only avoids this problem
but is in itself a better comparison point to test whether wraparound
has already occurred.

Report and patch by Alexander Korotkov.  Back-patch to all versions.

Report: <CAPpHfdspOkmiQsxh-UZw2chM6dRMwXAJGEmmbmqYR=yvM7-s6A@mail.gmail.com>
2016-05-24 15:20:36 -04:00
Stephen Frost a89505fd21 Remove various special checks around default roles
Default roles really should be like regular roles, for the most part.
This removes a number of checks that were trying to make default roles
extra special by not allowing them to be used as regular roles.

We still prevent users from creating roles in the "pg_" namespace or
from altering roles which exist in that namespace via ALTER ROLE, as
we can't preserve such changes, but otherwise the roles are very much
like regular roles.

Based on discussion with Robert and Tom.
2016-05-06 14:06:50 -04:00
Kevin Grittner a343e223a5 Revert no-op changes to BufferGetPage()
The reverted changes were intended to force a choice of whether any
newly-added BufferGetPage() calls needed to be accompanied by a
test of the snapshot age, to support the "snapshot too old"
feature.  Such an accompanying test is needed in about 7% of the
cases, where the page is being used as part of a scan rather than
positioning for other purposes (such as DML or vacuuming).  The
additional effort required for back-patching, and the doubt whether
the intended benefit would really be there, have indicated it is
best just to rely on developers to do the right thing based on
comments and existing usage, as we do with many other conventions.

This change should have little or no effect on generated executable
code.

Motivated by the back-patching pain of Tom Lane and Robert Haas
2016-04-20 08:31:19 -05:00
Tom Lane c34df8a003 Disallow creation of indexes on system columns (except for OID).
Although OID acts pretty much like user data, the other system columns do
not, so an index on one would likely misbehave.  And it's pretty hard to
see a use-case for one, anyway.  Let's just forbid the case rather than
worry about whether it should be supported.

David Rowley
2016-04-16 12:11:41 -04:00
Tom Lane 8f1911d5e6 Fix possible crash in ALTER TABLE ... REPLICA IDENTITY USING INDEX.
Careless coding added by commit 07cacba983 could result in a crash
or a bizarre error message if someone tried to select an index on the
OID column as the replica identity index for a table.  Back-patch to 9.4
where the feature was introduced.

Discussion: CAKJS1f8TQYgTRDyF1_u9PVCKWRWz+DkieH=U7954HeHVPJKaKg@mail.gmail.com

David Rowley
2016-04-15 12:11:40 -04:00
Robert Haas 5702277ca9 Tweak EXPLAIN for parallel query to show workers launched.
The previous display was sort of confusing, because it didn't
distinguish between the number of workers that we planned to launch
and the number that actually got launched.  This has already confused
several people, so display both numbers and label them clearly.

Julien Rouhaud, reviewed by me.
2016-04-15 11:52:18 -04:00
Tom Lane 92a30a7eb0 Fix broken dependency-mongering for index operator classes/families.
For a long time, opclasscmds.c explained that "we do not create a
dependency link to the AM [for an opclass or opfamily], because we don't
currently support DROP ACCESS METHOD".  Commit 473b932870 invented
DROP ACCESS METHOD, but it batted only 1 for 2 on adding the dependency
links, and 0 for 2 on updating the comments about the topic.

In passing, undo the same commit's entirely inappropriate decision to
blow away an existing index as a side-effect of create_am.sql.
2016-04-13 23:33:31 -04:00
Stephen Frost bfed4ab824 Disallow SET SESSION AUTHORIZATION pg_*
As part of reserving the pg_* namespace for default roles and in line
with SET ROLE and other previous efforts, disallow settings the role
to a default/reserved role using SET SESSION AUTHORIZATION.

These checks and restrictions on what is allowed regarding default /
reserved roles are under debate, but it seems prudent to ensure that
the existing checks at least cover the intended cases while the
debate rages on.  On me to clean it up if the consensus decision is
to remove these checks.
2016-04-13 21:31:24 -04:00
Alvaro Herrera bd905a0d04 Fix possible NULL dereference in ExecAlterObjectDependsStmt
I used the wrong variable here.  Doesn't make a difference today because
the only plausible caller passes a non-NULL variable, but someday it
will be wrong, and even today's correctness is subtle: the caller that
does pass a NULL is never invoked because of object type constraints.
Surely not a condition to rely on.

Noted by Coverity
2016-04-10 11:03:35 -03:00
Stephen Frost 293007898d Reserve the "pg_" namespace for roles
This will prevent users from creating roles which begin with "pg_" and
will check for those roles before allowing an upgrade using pg_upgrade.

This will allow for default roles to be provided at initdb time.

Reviews by José Luis Tallón and Robert Haas
2016-04-08 16:56:27 -04:00
Kevin Grittner 848ef42bb8 Add the "snapshot too old" feature
This feature is controlled by a new old_snapshot_threshold GUC.  A
value of -1 disables the feature, and that is the default.  The
value of 0 is just intended for testing.  Above that it is the
number of minutes a snapshot can reach before pruning and vacuum
are allowed to remove dead tuples which the snapshot would
otherwise protect.  The xmin associated with a transaction ID does
still protect dead tuples.  A connection which is using an "old"
snapshot does not get an error unless it accesses a page modified
recently enough that it might not be able to produce accurate
results.

This is similar to the Oracle feature, and we use the same SQLSTATE
and error message for compatibility.
2016-04-08 14:36:30 -05:00
Kevin Grittner 8b65cf4c5e Modify BufferGetPage() to prepare for "snapshot too old" feature
This patch is a no-op patch which is intended to reduce the chances
of failures of omission once the functional part of the "snapshot
too old" patch goes in.  It adds parameters for snapshot, relation,
and an enum to specify whether the snapshot age check needs to be
done for the page at this point.  This initial patch passes NULL
for the first two new parameters and BGP_NO_SNAPSHOT_TEST for the
third.  The follow-on patch will change the places where the test
needs to be made.
2016-04-08 14:30:10 -05:00
Teodor Sigaev 8b99edefca Revert CREATE INDEX ... INCLUDING ...
It's not ready yet, revert two commits
690c543550 - unstable test output
386e3d7609 - patch itself
2016-04-08 21:52:13 +03:00
Teodor Sigaev 386e3d7609 CREATE INDEX ... INCLUDING (column[, ...])
Now indexes (but only B-tree for now) can contain "extra" column(s) which
doesn't participate in index structure, they are just stored in leaf
tuples. It allows to use index only scan by using single index instead
of two or more indexes.

Author: Anastasia Lubennikova with minor editorializing by me
Reviewers: David Rowley, Peter Geoghegan, Jeff Janes
2016-04-08 19:45:59 +03:00
Peter Eisentraut 339025c68f Replace printf format %i by %d
see also ce8d7bb644
2016-04-08 12:42:58 -04:00
Tom Lane 93c301fc4f Fix multiple bugs in tablespace symlink removal.
Don't try to examine S_ISLNK(st.st_mode) after a failed lstat().
It's undefined.

Also, if the lstat() reported ENOENT, we do not wish that to be a hard
error, but the code might nonetheless treat it as one (giving an entirely
misleading error message, too) depending on luck-of-the-draw as to what
S_ISLNK() returned.

Don't throw error for ENOENT from rmdir(), either.  (We're not really
expecting ENOENT because we just stat'd the file successfully; but
if we're going to allow ENOENT in the symlink code path, surely the
directory code path should too.)

Generate an appropriate errcode for its-the-wrong-type-of-file complaints.
(ERRCODE_SYSTEM_ERROR doesn't seem appropriate, and failing to write
errcode() around it certainly doesn't work, and not writing an errcode
at all is not per project policy.)

Valgrind noticed the undefined S_ISLNK result; the other problems emerged
while reading the code in the area.

All of this appears to have been introduced in 8f15f74a44.
Back-patch to 9.5 where that commit appeared.
2016-04-08 12:31:53 -04:00
Tom Lane de94e2af18 Run pgindent on a batch of (mostly-planner-related) source files.
Getting annoyed at the amount of unrelated chatter I get from pgindent'ing
Rowley's unique-joins patch.  Re-indent all the files it touches.
2016-04-06 11:34:02 -04:00
Alvaro Herrera f2fcad27d5 Support ALTER THING .. DEPENDS ON EXTENSION
This introduces a new dependency type which marks an object as depending
on an extension, such that if the extension is dropped, the object
automatically goes away; and also, if the database is dumped, the object
is included in the dump output.  Currently the grammar supports this for
indexes, triggers, materialized views and functions only, although the
utility code is generic so adding support for more object types is a
matter of touching the parser rules only.

Author: Abhijit Menon-Sen
Reviewed-by: Alexander Korotkov, Álvaro Herrera
Discussion: http://www.postgresql.org/message-id/20160115062649.GA5068@toroid.org
2016-04-05 18:38:54 -03:00
Robert Haas 41ea0c2376 Fix parallel-safety code for parallel aggregation.
has_parallel_hazard() was ignoring the proparallel markings for
aggregates, which is no good.  Fix that.  There was no way to mark
an aggregate as actually being parallel-safe, either, so add a
PARALLEL option to CREATE AGGREGATE.

Patch by me, reviewed by David Rowley.
2016-04-05 16:06:15 -04:00
Tom Lane 3c69b33f45 Add a few comments about ANALYZE's strategy for collecting MCVs.
Alex Shulgin complained that the underlying strategy wasn't all that
apparent, particularly not the fact that we intentionally have two
code paths depending on whether we think the column has a limited set
of possible values or not.  Try to make it clearer.
2016-04-04 17:06:33 -04:00
Tom Lane 391159e03a Partially revert commit 3d3bf62f30.
On reflection, the pre-existing logic in ANALYZE is specifically meant to
compare the frequency of a candidate MCV against the estimated frequency of
a random distinct value across the whole table.  The change to compare it
against the average frequency of values actually seen in the sample doesn't
seem very principled, and if anything it would make us less likely not more
likely to consider a value an MCV.  So revert that, but keep the aspect of
considering only nonnull values, which definitely is correct.

In passing, rename the local variables in these stanzas to
"ndistinct_table", to avoid confusion with the "ndistinct" that appears at
an outer scope in compute_scalar_stats.
2016-04-04 16:48:13 -04:00
Tom Lane 3d3bf62f30 Omit null rows when setting the threshold for what's a most-common value.
As with the previous patch, large numbers of null rows could skew this
calculation unfavorably, causing us to discard values that have a
legitimate claim to be MCVs, since our definition of MCV is that it's
most common among the non-null population of the column.  Hence, make
the numerator of avgcount be the number of non-null sample values not
the number of sample rows; likewise for maxmincount in the
compute_scalar_stats variant.

Also, make the denominator be the number of distinct values actually
observed in the sample, rather than reversing it back out of the computed
stadistinct.  This avoids depending on the accuracy of the Haas-Stokes
approximation, and really it's what we want anyway; the threshold should
depend only on what we see in the sample, not on what we extrapolate
about the contents of the whole column.

Alex Shulgin, reviewed by Tomas Vondra and myself
2016-04-01 17:03:27 -04:00
Tom Lane be4b4dc759 Omit null rows when applying the Haas-Stokes estimator for ndistinct.
Previously, we included null rows in the values of n and N that went
into the formula, which amounts to considering null as a value in its
own right; but the d and f1 values do not include nulls.  This is
inconsistent, and it contributes to significant underestimation of
ndistinct when the column is mostly nulls.  In any case stadistinct
is defined as the number of distinct non-null values, so we should
exclude nulls when doing this computation.

This is an aboriginal bug in our application of the Haas-Stokes formula,
but we'll refrain from back-patching for fear of destabilizing plan
choices in released branches.

While at it, make the code a bit more readable by omitting unnecessary
casts and intermediate variables.

Observation and original patch by Tomas Vondra, adjusted to fix both
uses of the formula by Alex Shulgin, cosmetic improvements by me
2016-04-01 15:48:24 -04:00
Alvaro Herrera f402b99501 Type names should not be quoted
Our actual convention, contrary to what I said in 59a2111b23, is not to
quote type names, as evidenced by unquoted use of format_type_be()
result value in error messages.  Remove quotes from recently tweaked
messages accordingly.

Per note from Tom Lane
2016-04-01 13:35:48 -03:00
Robert Haas 5fe5a2cee9 Allow aggregate transition states to be serialized and deserialized.
This is necessary infrastructure for supporting parallel aggregation
for aggregates whose transition type is "internal".  Such values
can't be passed between cooperating processes, because they are
just pointers.

David Rowley, reviewed by Tomas Vondra and by me.
2016-03-29 15:04:05 -04:00