15390 Commits

Author SHA1 Message Date
Andres Freund 920218cbc0 Improve errhint() about replication slot naming restrictions.
The existing hint talked about "may only contain letters", but the
actual requirement is more strict: only lower case letters are allowed.

Reported-By: Rushabh Lathia
Author: Rushabh Lathia
Discussion: AGPqQf2x50qcwbYOBKzb4x75sO_V3g81ZsA8+Ji9iN5t_khFhQ@mail.gmail.com
Backpatch: 9.4-, where replication slots were added
2015-10-03 15:29:08 +02:00
Andres Freund ad22783792 Fix several bugs related to ON CONFLICT's EXCLUDED pseudo relation.
Four related issues:

1) attnos/varnos/resnos for EXCLUDED were out of sync when a column
   after one dropped in the underlying relation was referenced.
2) References to whole-row variables (i.e. EXCLUDED.*) led to errors.
3) It was possible to reference system columns in the EXCLUDED pseudo
   relations, even though they would not have valid contents.
4) References to EXCLUDED were rewritten by the RLS machinery, as
   EXCLUDED was treated as if it were the underlying relation.

To fix the first two issues, generate the excluded targetlist with
dropped columns in mind and add an entry for whole row
variables. Instead of unconditionally adding a wholerow entry we could
pull up the expression if needed, but doing it unconditionally seems
simpler. The wholerow entry is only really needed for ruleutils/EXPLAIN
support anyway.

The remaining two issues are addressed by changing the EXCLUDED RTE to
have relkind = composite. That fits with EXCLUDED not actually being a
real relation, and allows treating it differently in the relevant
places. scanRTEForColumn now skips looking up system columns when the
RTE has a composite relkind; fireRIRrules() already had a corresponding
check, thereby preventing RLS expansion on EXCLUDED.

Also add tests for these issues, and improve a few comments around
excluded handling in setrefs.c.

Reported-By: Peter Geoghegan, Geoff Winkless
Author: Andres Freund, Amit Langote, Peter Geoghegan
Discussion: CAEzk6fdzJ3xYQZGbcuYM2rBd2BuDkUksmK=mY9UYYDugg_GgZg@mail.gmail.com,
   CAM3SWZS+CauzbiCEcg-GdE6K6ycHE_Bz6Ksszy8AoixcMHOmsA@mail.gmail.com
Backpatch: 9.5, where ON CONFLICT was introduced
2015-10-03 15:12:10 +02:00
Tom Lane 2e8cfcf4ea Add recursion depth protection to LIKE matching.
Since MatchText() recurses, it could in principle be driven to stack
overflow, although quite a long pattern would be needed.
2015-10-02 15:00:51 -04:00
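The guard described in 2e8cfcf4ea can be sketched as follows. This is a minimal Python illustration, not PostgreSQL's MatchText(): the names like_match, PatternTooComplex, and max_depth are hypothetical, escape characters are not handled, and a fixed depth counter stands in for a check of the actually available stack.

```python
class PatternTooComplex(Exception):
    """Raised when matching would recurse deeper than the allowed limit."""


def like_match(text, pattern, depth=0, max_depth=200):
    """Match text against a LIKE-style pattern ('%' = any run, '_' = one char)."""
    # Check the guard first, so a hostile pattern fails with a clean error
    # instead of exhausting the stack. (Assumption: a simple counter.)
    if depth > max_depth:
        raise PatternTooComplex("pattern requires too many recursion levels")

    t = 0  # position in text
    p = 0  # position in pattern
    while p < len(pattern):
        ch = pattern[p]
        if ch == "%":
            rest = pattern[p + 1:]
            if not rest:
                return True  # trailing '%' matches whatever is left
            # Try every possible split point; each attempt recurses one level.
            return any(
                like_match(text[i:], rest, depth + 1, max_depth)
                for i in range(t, len(text) + 1)
            )
        if t >= len(text) or (ch != "_" and ch != text[t]):
            return False
        t += 1
        p += 1
    return t == len(text)
```

In this sketch the recursion depth grows with the number of '%' wildcards, so only an unusually long pattern reaches the limit, which matches the commit's remark that "quite a long pattern would be needed".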
Tom Lane b63fc28776 Add recursion depth protections to regular expression matching.
Some of the functions in regex compilation and execution recurse, and
therefore could in principle be driven to stack overflow.  The Tcl crew
has seen this happen in practice in duptraverse(), though their fix was
to put in a hard-wired limit on the number of recursive levels, which is
not too appetizing --- fortunately, we have enough infrastructure to check
the actually available stack.  Greg Stark has also seen it in other places
while fuzz testing on a machine with limited stack space.  Let's put guards
in to prevent crashes in all these places.

Since the regex code would leak memory if we simply threw elog(ERROR),
we have to introduce an API that checks for stack depth without throwing
such an error.  Fortunately that's not difficult.
2015-10-02 14:51:58 -04:00
Tom Lane f2c4ffc330 Fix potential infinite loop in regular expression execution.
In cfindloop(), if the initial call to shortest() reports that a
zero-length match is possible at the current search start point, but then
it is unable to construct any actual match to that, it'll just loop around
with the same start point, and thus make no progress.  We need to force the
start point to be advanced.  This is safe because the loop over "begin"
points has already tried and failed to match starting at "close", so there
is surely no need to try that again.

This bug was introduced in commit e2bd904955,
wherein we allowed continued searching after we'd run out of match
possibilities, but evidently failed to think hard enough about exactly
where we needed to search next.

Because of the way this code works, such a match failure is only possible
in the presence of backrefs --- otherwise, shortest()'s judgment that a
match is possible should always be correct.  That probably explains how
come the bug has escaped detection for several years.

The actual fix is a one-liner, but I took the trouble to add/improve some
comments related to the loop logic.

After fixing that, the submitted test case "()*\1" didn't loop anymore.
But it reported failure, though it seems like it ought to match a
zero-length string; both Tcl and Perl think it does.  That seems to be from
overenthusiastic optimization on my part when I rewrote the iteration match
logic in commit 173e29aa5deefd9e71c183583ba37805c8102a72: we can't just
"declare victory" for a zero-length match without bothering to set match
data for capturing parens inside the iterator node.

Per fuzz testing by Greg Stark.  The first part of this is a bug in all
supported branches, and the second part is a bug since 9.2 where the
iteration rewrite happened.
2015-10-02 14:26:36 -04:00
Tom Lane 9fe8fe9c9e Add some more query-cancel checks to regular expression matching.
Commit 9662143f0c added infrastructure to
allow regular-expression operations to be terminated early in the event
of SIGINT etc.  However, fuzz testing by Greg Stark disclosed that there
are still cases where regex compilation could run for a long time without
noticing a cancel request.  Specifically, the fixempties() phase never
adds new states, only new arcs, so it doesn't hit the cancel check I'd put
in newstate().  Add one to newarc() as well to cover that.

Some experimentation of my own found that regex execution could also run
for a long time despite a pending cancel.  We'd put a high-level cancel
check into cdissect(), but there was none inside the core text-matching
routines longest() and shortest().  Ordinarily those inner loops are very
very fast ... but in the presence of lookahead constraints, not so much.
As a compromise, stick a cancel check into the stateset cache-miss
function, which is enough to guarantee a cancel check at least once per
lookahead constraint test.

Making this work required more attention to error handling throughout the
regex executor.  Henry Spencer had apparently originally intended longest()
and shortest() to be incapable of incurring errors while running, so
neither they nor their subroutines had well-defined error reporting
behaviors.  However, that was already broken by the lookahead constraint
feature, since lacon() can surely suffer an out-of-memory failure ---
which, in the code as it stood, might never be reported to the user at all,
but just silently be treated as a non-match of the lookahead constraint.
Normalize all that by inserting explicit error tests as needed.  I took the
opportunity to add some more comments to the code, too.

Back-patch to all supported branches, like the previous patch.
2015-10-02 13:45:39 -04:00
Alvaro Herrera e06b2e1d2e Don't disable commit_ts in standby if enabled locally
Bug noticed by Fujii Masao
2015-10-02 12:49:01 -03:00
Peter Eisentraut 87c2b517ac Fix message punctuation according to style guide
2015-10-01 21:39:56 -04:00
Alvaro Herrera f12e814b88 Fix commit_ts for standby
Module initialization was still not completely correct after commit
6b61955135, per crash report from Takashi Ohnishi.  To fix, instead of
trying to monkey around with the value of the GUC setting directly, add
a separate boolean flag that enables the feature on a standby, but only
for the startup (recovery) process, when it sees that its master server
has the feature enabled.
Discussion: http://www.postgresql.org/message-id/ca44c6c7f9314868bdc521aea4f77cbf@MP-MSGSS-MBX004.msg.nttdata.co.jp

Also change the deactivation routine to delete all segment files rather
than leaving the last one around.  (This doesn't need separate
WAL-logging, because on recovery we execute the same deactivation
routine anyway.)

In passing, clean up the code structure somewhat, particularly so that
xlog.c doesn't know so much about when to activate/deactivate the
feature.

Thanks to Fujii Masao for testing and Petr Jelínek for off-list discussion.

Back-patch to 9.5, where commit_ts was introduced.
2015-10-01 15:06:55 -03:00
Tom Lane 21995d3f6d Fix documentation error in commit 8703059c6b.
Etsuro Fujita spotted a thinko in the README commentary.
2015-10-01 10:32:11 -04:00
Robert Haas 286a3a68dc Fix readfuncs/outfuncs problems in last night's Gather patch.
KaiGai Kohei, with one correction by me.
2015-10-01 09:19:26 -04:00
Tom Lane 5884b92a84 Fix errors in commit a04bb65f70.
Not a lot of commentary needed here really.
2015-09-30 23:37:26 -04:00
Tom Lane 07e4d03fb4 Improve LISTEN startup time when there are many unread notifications.
If some existing listener is far behind, incoming new listener sessions
would start from that session's read pointer and then need to advance over
many already-committed notification messages, which they have no interest
in.  This was expensive in itself and also thrashed the pg_notify SLRU
buffers a lot more than necessary.  We can improve matters considerably
in typical scenarios, without much added cost, by starting from the
furthest-ahead read pointer, not the furthest-behind one.  We do have to
consider only sessions in our own database when doing this, which requires
an extra field in the data structure, but that's a pretty small cost.

Back-patch to 9.0 where the current LISTEN/NOTIFY logic was introduced.

Matt Newell, slightly adjusted by me
2015-09-30 23:32:43 -04:00
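The starting-position choice described in 07e4d03fb4 can be illustrated with a small Python sketch. The Listener record, the initial_read_pos function, and the fallback to the queue tail when no listener exists in the same database are assumptions made for illustration; they are not the structures used in async.c.

```python
from dataclasses import dataclass


@dataclass
class Listener:
    database: str  # the "extra field" the commit mentions: which database the session is in
    read_pos: int  # index of the next notification this session will read


def initial_read_pos(listeners, my_database, queue_tail):
    """Pick where a newly registered listener should start reading."""
    # Only sessions in our own database are considered.
    same_db = [l.read_pos for l in listeners if l.database == my_database]
    # Start from the furthest-ahead pointer, not the furthest-behind one.
    # Assumption: with no peers in our database, fall back to the queue tail.
    return max(same_db, default=queue_tail)
```

With listeners at positions 5 and 42 in the same database, a new session would start at 42 and skip the already-committed messages in between.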
Robert Haas 3bd909b220 Add a Gather executor node.
A Gather executor node runs any number of copies of a plan in an equal
number of workers and merges all of the results into a single tuple
stream.  It can also run the plan itself, if the workers are
unavailable or haven't started up yet.  It is intended to work with
the Partial Seq Scan node which will be added in future commits.

It could also be used to implement parallel query of a different sort
by itself, without help from Partial Seq Scan, if the single_copy mode
is used.  In that mode, a worker executes the plan, and the parallel
leader does not, merely collecting the worker's results.  So, a Gather
node could be inserted into a plan to split the execution of that plan
across two processes.  Nested Gather nodes aren't currently supported,
but we might want to add support for that in the future.

There's nothing in the planner to actually generate Gather nodes yet,
so it's not quite time to break out the champagne.  But we're getting
close.

Amit Kapila.  Some design suggestions were provided by me, and I also
reviewed the patch.  Single-copy mode, documentation, and other minor
changes also by me.
2015-09-30 19:23:36 -04:00
Robert Haas 227d57f358 Don't dump core when destroying an unused ParallelContext.
If a transaction or subtransaction creates a ParallelContext but ends
without calling InitializeParallelDSM, the previous code would
seg fault.  Fix that.
2015-09-30 18:36:31 -04:00
Stephen Frost 7d8db3e8f3 Include policies based on ACLs needed
When considering which policies should be included, rather than look at
individual bits of the query (eg: if a RETURNING clause exists, or if a
WHERE clause exists which is referencing the table, or if it's a
FOR SHARE/UPDATE query), consider any case where we've determined
the user needs SELECT rights on the relation while doing an UPDATE or
DELETE to be a case where we apply SELECT policies, and any case where
we've determined that the user needs UPDATE rights on the relation while
doing a SELECT to be a case where we apply UPDATE policies.

This simplifies the logic and addresses concerns that a user could use
UPDATE or DELETE with a WHERE clause to determine if rows exist, or
they could use SELECT .. FOR UPDATE to lock rows which they are not
actually allowed to modify through UPDATE policies.

Use list_append_unique() to avoid adding the same quals multiple times,
as, on balance, the cost of checking when adding the quals will almost
always be cheaper than keeping them and doing busywork for each tuple
during execution.

Back-patch to 9.5 where RLS was added.
2015-09-30 07:39:24 -04:00
Tom Lane 6057f61b4d Small improvements in comments in async.c.
We seem to have lost a line somewhere along the way in the comment block
that discusses async.c's locks, because it suddenly refers to "both locks"
without previously having mentioned more than one.  Add a sentence to make
that read more sanely.  Also, refer to the "pos of the slowest backend"
not the "tail of the slowest backend", since we have no per-backend value
called "tail".
2015-09-29 22:07:16 -04:00
Alvaro Herrera 6b61955135 Code review for transaction commit timestamps
There are three main changes here:

1. No longer cause a start failure in a standby if the feature is
disabled in postgresql.conf but enabled in the master.  This reverts one
part of commit 4f3924d9cd43; what we keep is the ability of the standby
to activate/deactivate the module (which includes creating and removing
segments as appropriate) during replay of such actions in the master.

2. Replay WAL records affecting commitTS even if the feature is
disabled.  This means the standby will always have the same state as the
master after replay.

3. Have COMMIT PREPARE record the transaction commit time as well.  We
were previously only applying it in the normal transaction commit path.

Author: Petr Jelínek
Discussion: http://www.postgresql.org/message-id/CAHGQGwHereDzzzmfxEBYcVQu3oZv6vZcgu1TPeERWbDc+gQ06g@mail.gmail.com
Discussion: http://www.postgresql.org/message-id/CAHGQGwFuzfO4JscM9LCAmCDCxp_MfLvN4QdB+xWsS-FijbjTYQ@mail.gmail.com

Additionally, I cleaned up nearby code related to replication origins,
which I found a bit hard to follow, and fixed a couple of typos.

Backpatch to 9.5, where this code was introduced.

Per bug reports from Fujii Masao and subsequent discussion.
2015-09-29 14:40:56 -03:00
Robert Haas 758fcfdc01 Comment update for join pushdown.
Etsuro Fujita
2015-09-29 07:42:30 -04:00
Robert Haas d1b7c1ffe7 Parallel executor support.
This code provides infrastructure for a parallel leader to start up
parallel workers to execute subtrees of the plan tree being executed
in the master.  User-supplied parameters from ParamListInfo are passed
down, but PARAM_EXEC parameters are not.  Various other constructs,
such as initplans, subplans, and CTEs, are also not currently shared.
Nevertheless, there's enough here to support a basic implementation of
parallel query, and we can lift some of the current restrictions as
needed.

Amit Kapila and Robert Haas
2015-09-28 21:55:57 -04:00
Alvaro Herrera 17f5831c81 Fix "sesssion" typo
It was introduced alongside replication origins, by commit
5aa2350426, so backpatch to 9.5.

Pointed out by Fujii Masao
2015-09-28 19:13:42 -03:00
Alvaro Herrera 590e2d12f0 COPY: use pg_plan_query() instead of planner()
While at it, trim the includes list in copy.c.  The planner headers
cannot be removed, but there are a few others that are not of any use.
2015-09-28 15:14:08 -03:00
Andres Freund 617db3a2d8 Fix ON CONFLICT DO UPDATE for tables with oids.
When taking the UPDATE path in an INSERT .. ON CONFLICT .. UPDATE, tables
with oids were not supported. The tuple generated by the update target
list was projected without space for an oid - a simple oversight.

Reported-By: Peter Geoghegan
Author: Andres Freund
Backpatch: 9.5, where ON CONFLICT was introduced
2015-09-28 19:29:44 +02:00
Robert Haas f40792a93c Use LOCKBIT_ON() instead of a bit shift in a few places.
We do this mostly everywhere, so it seems just as well to do it here,
too.

Thomas Munro
2015-09-28 10:57:15 -04:00
Andres Freund aa29c1ccd9 Remove legacy multixact truncation support.
In 9.5 and master there is no need to support legacy truncation. This is
just committed separately to make it easier to backpatch the WAL logged
multixact truncation to 9.3 and 9.4 if we later decide to do so.

I bumped master's magic from 0xD086 to 0xD088 and 9.5's from 0xD085 to
0xD087 to avoid 9.5 reusing a value that has been in use on master while
keeping the numbers increasing between major versions.

Discussion: 20150621192409.GA4797@alap3.anarazel.de
Backpatch: 9.5
2015-09-26 19:04:25 +02:00
Andres Freund 4f627f8973 Rework the way multixact truncations work.
The fact that multixact truncations are not WAL logged has caused a fair
share of problems. Among other things, it requires doing computations during
recovery while the database is not in a consistent state, delaying
truncations till checkpoints, and handling members being truncated, but
offset not.

We tried to put bandaids on lots of these issues over the last years,
but it seems time to change course. Thus this patch introduces WAL
logging for multixact truncations.

This allows:
1) to perform the truncation directly during VACUUM, instead of delaying it
   to the checkpoint.
2) to avoid looking at the offsets SLRU for truncation during recovery,
   we can just use the master's values.
3) simplify a fair amount of logic to keep in memory limits straight,
   this has gotten much easier

During the course of fixing this a bunch of additional bugs had to be
fixed:
1) Data was not purged from the members SLRU's memory before deleting
   segments. This happened to be hard or impossible to hit due to the
   interlock between checkpoints and truncation.
2) find_multixact_start() relied on SimpleLruDoesPhysicalPageExist - but
   that doesn't work for offsets that haven't yet been flushed to
   disk. Add code to flush the SLRUs to fix. Not pretty, but it feels
   slightly safer to only make decisions based on actual on-disk state.
3) find_multixact_start() could be called concurrently with a truncation
   and thus fail. Via SetOffsetVacuumLimit() that could lead to a round
   of emergency vacuuming. The problem remains in
   pg_get_multixact_members(), but that's quite harmless.

For now this is going to only get applied to 9.5+, leaving the issues in
the older branches in place. It is quite possible that we need to
backpatch at a later point though.

For the case this gets backpatched we need to handle that an updated
standby may be replaying WAL from a not-yet upgraded primary. We have to
recognize that situation and use "old style" truncation (i.e. looking at
the SLRUs) during WAL replay. In contrast to before, this now happens in
the startup process, when replaying a checkpoint record, instead of the
checkpointer. Doing truncation in the restartpoint is incorrect, because
restartpoints can happen much later than the original checkpoint, thereby leading to
wraparound.  To avoid "multixact_redo: unknown op code 48" errors
standbys would have to be upgraded before primaries.

A later patch will bump the WAL page magic, and remove the legacy
truncation codepaths. Legacy truncation support is just included to make
a possible future backpatch easier.

Discussion: 20150621192409.GA4797@alap3.anarazel.de
Reviewed-By: Robert Haas, Alvaro Herrera, Thomas Munro
Backpatch: 9.5 for now
2015-09-26 19:04:25 +02:00
Tom Lane 2abfd9d5e9 Second try at fixing O(N^2) problem in foreign key references.
This replaces ill-fated commit 5ddc72887a,
which was reverted because it broke active uses of FK cache entries.  In
this patch, we still do nothing more to invalidatable cache entries than
mark them as needing revalidation, so we won't break active uses.  To keep
down the overhead of InvalidateConstraintCacheCallBack(), keep a list of
just the currently-valid cache entries.  (The entries are large enough that
some added space for list links doesn't seem like a big problem.)  This
would still be O(N^2) when there are many valid entries, though, so when
the list gets too long, just force the "sinval reset" behavior to remove
everything from the list.  I set the threshold at 1000 entries, somewhat
arbitrarily.  Possibly that could be fine-tuned later.  Another item for
future study is whether it's worth adding reference counting so that we
could safely remove invalidated entries.  As-is, problem cases are likely
to end up with large and mostly invalid FK caches.

Like the previous attempt, backpatch to 9.3.

Jan Wieck and Tom Lane
2015-09-25 13:16:30 -04:00
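The bookkeeping described in 2abfd9d5e9 can be sketched in Python. The ConstraintCache class and its method names are hypothetical and only model the idea: invalidation marks entries rather than deleting them, only currently valid entries are tracked in a list, and once that list grows past a threshold everything is marked invalid in one pass.

```python
class ConstraintCache:
    def __init__(self, reset_threshold=1000):
        self.entries = {}    # constraint id -> {"valid": bool}
        self.valid_ids = []  # ids of the currently valid entries only
        self.reset_threshold = reset_threshold

    def load(self, cid):
        """Return the entry for cid, (re)validating it if necessary."""
        entry = self.entries.setdefault(cid, {"valid": False})
        if not entry["valid"]:
            entry["valid"] = True
            self.valid_ids.append(cid)
        return entry

    def invalidate(self, cid):
        """Handle an invalidation message for one constraint."""
        # When the list is too long, behave like a full reset so that
        # later invalidations find an empty list and stay cheap.
        reset_all = len(self.valid_ids) > self.reset_threshold
        still_valid = []
        for vid in self.valid_ids:
            if reset_all or vid == cid:
                # Mark only; the entry stays in place for any active user.
                self.entries[vid]["valid"] = False
            else:
                still_valid.append(vid)
        self.valid_ids = still_valid
```

Entries are never removed here, which mirrors the commit's note that problem cases can end up with large and mostly invalid caches.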
Tom Lane 39df0f150c Allow planner to use expression-index stats for function calls in WHERE.
Previously, a function call appearing at the top level of WHERE had a
hard-wired selectivity estimate of 0.3333333, a kludge conveniently dated
in the source code itself to July 1992.  The expectation at the time was
that somebody would soon implement estimator support functions analogous
to those for operators; but no such code has appeared, nor does it seem
likely to in the near future.  We do have an alternative solution though,
at least for immutable functions on single relations: creating an
expression index on the function call will allow ANALYZE to gather stats
about the function's selectivity.  But the code in clause_selectivity()
failed to make use of such data even if it exists.

Refactor so that that will happen.  I chose to make it try this technique
for any clause type for which clause_selectivity() doesn't have a special
case, not just functions.  To avoid adding unnecessary overhead in the
common case where we don't learn anything new, make selfuncs.c provide an
API that hooks directly to examine_variable() and then var_eq_const(),
rather than the previous coding which laboriously constructed an OpExpr
only so that it could be expensively deconstructed again.

I preserved the behavior that the default estimate for a function call
is 0.3333333.  (For any other expression node type, it's 0.5, as before.)
I had originally thought to make the default be 0.5 across the board, but
changing a default estimate that's survived for twenty-three years seems
like something not to do without a lot more testing than I care to put
into it right now.

Per a complaint from Jehan-Guillaume de Rorthais.  Back-patch into 9.5,
but not further, at least for the moment.
2015-09-24 18:35:46 -04:00
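The fallback order described in 39df0f150c can be summarized in a short Python sketch. The function fallback_selectivity and the expression_stats mapping (clause text to the fraction of rows for which the clause is true) are hypothetical stand-ins; the real planner works on node trees and pg_statistic data, not strings.

```python
DEFAULT_FUNCTION_SELECTIVITY = 0.3333333  # default kept for function calls
DEFAULT_OTHER_SELECTIVITY = 0.5           # default for other expression node types


def fallback_selectivity(clause_text, node_type, expression_stats):
    """Estimate the fraction of rows for which a boolean clause is true."""
    # Prefer statistics gathered by ANALYZE for a matching expression index.
    true_fraction = expression_stats.get(clause_text)
    if true_fraction is not None:
        return true_fraction
    # Otherwise fall back to the hard-wired defaults.
    if node_type == "function_call":
        return DEFAULT_FUNCTION_SELECTIVITY
    return DEFAULT_OTHER_SELECTIVITY
```

For example, if an expression index exists on a call that is true for 2% of rows, the estimate becomes 0.02 instead of 0.3333333.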
Robert Haas 9f1255ac85 Don't zero opfuncid when reading nodes.
The comments here stated that this was just in case we ever had an
ALTER OPERATOR command that could remap an operator to a different
function.  But those comments have been here for a long time, and no
such command has come about.  In the absence of such a feature,
forcing the pg_proc OID to be looked up again each time we reread a
stored rule or similar is just a waste of cycles.  Moreover, parallel
query needs a way to reread the exact same node tree that was written
out, not one that has been slightly stomped on.  So just get rid of
this for now.

Per discussion with Tom Lane.
2015-09-24 11:36:29 -04:00
Andres Freund 020235a575 Lower *_freeze_max_age minimum values.
The old minimum values are rather large, making it time consuming to
test related behaviour. Additionally the current limits, especially for
multixacts, can be problematic in space-constrained systems. 10000000
multixacts can contain a lot of members.

Since there's no good reason for the current limits, lower them a good
bit. Setting them to 0 would be a bad idea, triggering endless vacuums,
so still retain a limit.

While at it fix autovacuum_multixact_freeze_max_age to refer to
multixact.c instead of varsup.c.

Reviewed-By: Robert Haas
Discussion: CA+TgmoYmQPHcrc3GSs7vwvrbTkbcGD9Gik=OztbDGGrovkkEzQ@mail.gmail.com
Backpatch: back to 9.0 (in parts)
2015-09-24 14:53:32 +02:00
Tom Lane 82e1ba7fd6 Make ANALYZE compute basic statistics even for types with no "=" operator.
Previously, ANALYZE simply ignored columns of datatypes that have neither
a btree nor hash opclass (which means they have no recognized equality
operator).  Without a notion of equality, we can't identify most-common
values nor estimate the number of distinct values.  But we can still
count nulls and compute the average physical column width, and those
stats might be of value.  Moreover there are some tools out there that
don't work so well if rows are missing from pg_statistic.  So let's
add suitable logic for this case.

While this is arguably a bug fix, it also has the potential to change
query plans, and the gain seems not worth taking a risk of that in
stable branches.  So back-patch into 9.5 but not further.

Oleksandr Shulgin, rewritten a bit by me.
2015-09-23 18:26:49 -04:00
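The two statistics that 82e1ba7fd6 says remain computable without an equality operator can be sketched in Python. The function basic_column_stats and its width_of parameter are hypothetical, and None stands in for SQL NULL.

```python
def basic_column_stats(values, width_of=len):
    """Compute the null fraction and average width of a sampled column."""
    total = len(values)
    non_null = [v for v in values if v is not None]
    null_frac = (total - len(non_null)) / total if total else 0.0
    # Average physical width over non-null values only.
    avg_width = (
        sum(width_of(v) for v in non_null) / len(non_null) if non_null else 0.0
    )
    return {"null_frac": null_frac, "avg_width": avg_width}
```

Neither figure requires comparing two values, which is why these statistics are available even for types with no "=" operator.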
Robert Haas a0d9f6e434 Add readfuncs.c support for plan nodes.
For parallel query, we need to be able to pass a Plan to a worker, so
that it knows what it's supposed to do.  We could invent our own way
of serializing plans for that purpose, but piggybacking on the
existing node infrastructure seems like a much better idea.

Initially, we'll probably only support a limited number of nodes
within parallel workers, but this commit adds support for everything
in plannodes.h except CustomScan, because doing it all at once seems
easier than doing it piecemeal, and it makes testing this code easier,
too.  CustomScan is excluded because making that work requires a
larger rework of that facility.

Amit Kapila, reviewed and slightly revised by me.
2015-09-23 11:51:50 -04:00
Robert Haas 4fe6f72bda Print a MergeJoin's mergeNullsFirst array as bool, not int.
It's declared as being an array of bool, but it's printed
differently from the way bool and arrays of bool are handled
elsewhere.

Patch by Amit Kapila.  Anomaly noted independently by Amit Kapila
and KaiGai Kohei.
2015-09-23 10:53:29 -04:00
Teodor Sigaev dc943ad952 Allow autoanalyze to add pages deleted from pending list to FSM
Commit e956808328 made ordinary inserts add pages
to the FSM, but autoanalyze could only clean up the
pending list without adding the freed pages to the FSM.

Also fix double call of IndexFreeSpaceMapVacuum() during ginvacuumcleanup()

Report from Fujii Masao
Patch by me
Review by Jeff Janes
2015-09-23 15:33:51 +03:00
Robert Haas 262e56bcae Teach planstate_tree_walker about custom scans.
This logic was missing from ExplainPreScanNode, from which I derived
planstate_tree_walker.  But it shouldn't be missing, especially not
from a generic walker function, so add it.

KaiGai Kohei
2015-09-22 21:42:00 -04:00
Andres Freund 98d5b084d2 Correct value of LW_SHARED_MASK.
The previous wrong value led to wrong LOCK_DEBUG output, never showing
any shared lock holders.

Reported-By: Alexander Korotkov
Discussion: CAPpHfdsPmWqz9FB0AnxJrwp1=KLF0n=-iB+QvR0Q8GSmpFVdUQ@mail.gmail.com
Backpatch: 9.5, where the bug was introduced.
2015-09-22 11:14:14 +02:00
Tom Lane 246693e5ae Fix possible internal overflow in numeric multiplication.
mul_var() postpones propagating carries until it risks overflow in its
internal digit array.  However, the logic failed to account for the
possibility of overflow in the carry propagation step, allowing wrong
results to be generated in corner cases.  We must slightly reduce the
when-to-propagate-carries threshold to avoid that.

Discovered and fixed by Dean Rasheed, with small adjustments by me.

This has been wrong since commit d72f6c7503,
so back-patch to all supported branches.
2015-09-21 12:11:32 -04:00
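The hazard described in 246693e5ae can be sketched in Python. This is an illustration under stated assumptions, not the mul_var() code: digits are base-10000 values stored most significant first, the names mul_digits, propagate_carries, and digits_to_int are hypothetical, and because Python integers do not overflow, an assertion stands in for the 32-bit accumulator limit. The threshold shown is one choice that leaves headroom for the carry added during propagation.

```python
NBASE = 10_000
INT_MAX = 2**31 - 1


def propagate_carries(acc):
    """Normalize every slot to 0..NBASE-1, checking the 32-bit limit."""
    carry = 0
    for k in range(len(acc) - 1, -1, -1):
        total = acc[k] + carry  # the step that must not overflow
        assert total <= INT_MAX, "would overflow a 32-bit accumulator"
        carry, acc[k] = divmod(total, NBASE)
    assert carry == 0


def mul_digits(a, b):
    """Multiply two base-NBASE digit lists, deferring carry propagation."""
    acc = [0] * (len(a) + len(b))
    # Reserve room for the largest possible incoming carry (INT_MAX // NBASE).
    limit = (INT_MAX - INT_MAX // NBASE) // (NBASE - 1)
    maxdig = 0  # bound on any slot value, in units of (NBASE - 1)
    for i, da in enumerate(a):
        if da == 0:
            continue
        maxdig += da
        if maxdig > limit:
            propagate_carries(acc)
            maxdig = 1 + da  # slots are now at most NBASE - 1
        for j, db in enumerate(b):
            acc[i + j + 1] += da * db
    propagate_carries(acc)
    return acc


def digits_to_int(digits):
    """Convert a digit list back to a Python integer, for checking."""
    n = 0
    for d in digits:
        n = n * NBASE + d
    return n
```

Comparing digits_to_int(mul_digits(a, b)) with ordinary integer multiplication for long runs of 9999 digits exercises the intermediate propagation path.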
Noah Misch 7f11724bd6 Remove the SECURITY_ROW_LEVEL_DISABLED security context bit.
This commit's parent made superfluous the bit's sole usage.  Referential
integrity checks have long run as the subject table's owner, and that
now implies RLS bypass.  Safe use of the bit was tricky, requiring
strict control over the SQL expressions evaluating therein.  Back-patch
to 9.5, where the bit was introduced.

Based on a patch by Stephen Frost.
2015-09-20 20:47:17 -04:00
Noah Misch 537bd178c7 Remove the row_security=force GUC value.
Every query of a single ENABLE ROW SECURITY table has two meanings, with
the row_security GUC selecting between them.  With row_security=force
available, every function author would have been advised to either set
the GUC locally or test both meanings.  Non-compliance would have
threatened reliability and, for SECURITY DEFINER functions, security.
Authors already face an obligation to account for search_path, and we
should not mimic that example.  With this change, only BYPASSRLS roles
need exercise the aforementioned care.  Back-patch to 9.5, where the
row_security GUC was introduced.

Since this narrows the domain of pg_db_role_setting.setconfig and
pg_proc.proconfig, one might bump catversion.  A row_security=force
setting in one of those columns will elicit a clear message, so don't.
2015-09-20 20:45:41 -04:00
Tom Lane ba51774d87 Be more wary about partially-valid LOCALLOCK data in RemoveLocalLock().
RemoveLocalLock() must consider the possibility that LockAcquireExtended()
failed to palloc the initial space for a locallock's lockOwners array.
I had evidently meant to cope with this hazard when the code was originally
written (commit 1785acebf2), but missed that
the pfree needed to be protected with an if-test.  Just to make sure things
are left in a clean state, reset numLockOwners as well.

Per low-memory testing by Andreas Seltenreich.  Back-patch to all supported
branches.
2015-09-20 16:48:44 -04:00
Peter Eisentraut 4a1e15e4a9 Add missing serial comma
2015-09-18 22:41:42 -04:00
Peter Eisentraut f2dd10613e Remove trailing slashes from directories in find command
BSD find is not very smart and ends up writing double slashes into the
output in those cases.  Also, xgettext is not very smart and splits the
file names incorrectly in those cases, resulting in slightly incorrect
file names being written into the POT file.
2015-09-18 22:06:54 -04:00
Robert Haas 4a4e6893aa Glue layer to connect the executor to the shm_mq mechanism.
The shm_mq mechanism was built to send error (and notice) messages and
tuples between backends.  However, shm_mq itself only deals in raw
bytes.  Since commit 2bd9e412f9, we have
had infrastructure for one backend to redirect protocol messages to a
queue and for another backend to parse them and do useful things with
them.  This commit introduces a somewhat analogous facility for tuples
by adding a new type of DestReceiver, DestTupleQueue, which writes
each tuple generated by a query into a shm_mq, and a new
TupleQueueFunnel facility which reads raw tuples out of the queue and
reconstructs the HeapTuple format expected by the executor.

The TupleQueueFunnel abstraction supports reading from multiple tuple
streams at the same time, but only in round-robin fashion.  Someone
could imaginably want other policies, but this should be good enough
to meet our short-term needs related to parallel query, and we can
always extend it later.

This also makes one minor addition to the shm_mq API that didn't
seem worth breaking out as a separate patch.

Extracted from Amit Kapila's parallel sequential scan patch.  This
code was originally written by me, and then it was revised by Amit,
and then it was revised some more by me.
2015-09-18 21:56:58 -04:00
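The round-robin reading policy that 4a4e6893aa attributes to TupleQueueFunnel can be sketched in Python. The RoundRobinFunnel class is hypothetical, and in-memory deques stand in for shm_mq queues; the sketch shows only the scheduling policy, not tuple serialization.

```python
from collections import deque


class RoundRobinFunnel:
    """Read items from several queues, visiting them in rotation."""

    def __init__(self, queues):
        self.queues = list(queues)
        self.next_index = 0  # queue to try first on the next call

    def next_item(self):
        """Return the next available item, or None if every queue is empty."""
        for _ in range(len(self.queues)):
            queue = self.queues[self.next_index]
            # Advance regardless of success so no queue is favored.
            self.next_index = (self.next_index + 1) % len(self.queues)
            if queue:
                return queue.popleft()
        return None
```

A different policy, such as always draining the fullest queue, would only need to change how next_index is chosen.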
Andrew Dunstan c00c3249e3 Cache argument type information in json(b) aggregate functions.
These functions have been looking up type info for every row they
process. Instead of doing that we only look them up the first time
through and stash the information in the aggregate state object.

Affects json_agg, json_object_agg, jsonb_agg and jsonb_object_agg.

There is plenty more work to do in making these more efficient,
especially the jsonb functions, but this is a virtually cost free
improvement that can be done right away.

Backpatch to 9.5 where the jsonb variants were introduced.
2015-09-18 14:39:39 -04:00
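The caching idea in c00c3249e3 can be illustrated with a small Python sketch. The names AggState, resolve_encoder, json_agg_step, and json_agg_final are hypothetical, and the lookups counter stands in for the per-row type lookup that the commit removes.

```python
import json


class AggState:
    def __init__(self):
        self.items = []
        self.encoder = None  # resolved on the first row, then reused


def resolve_encoder(sample, stats):
    """Stand-in for an expensive type lookup; counts how often it runs."""
    stats["lookups"] += 1
    if isinstance(sample, (bool, int, float)):
        return json.dumps
    return lambda value: json.dumps(str(value))


def json_agg_step(state, value, stats):
    """Transition function: add one input value to the aggregate state."""
    if state.encoder is None:
        state.encoder = resolve_encoder(value, stats)
    state.items.append(state.encoder(value))
    return state


def json_agg_final(state):
    return "[" + ", ".join(state.items) + "]"
```

Aggregating any number of rows in this sketch performs the lookup once, because the result is stashed in the state object.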
Tom Lane d9c0c728af Fix low-probability memory leak in regex execution.
After an internal failure in shortest() or longest() while pinning down the
exact location of a match, find() forgot to free the DFA structure before
returning.  This is pretty unlikely to occur, since we just successfully
ran the "search" variant of the DFA; but it could happen, and it would
result in a session-lifespan memory leak since this code uses malloc()
directly.  Problem seems to have been aboriginal in Spencer's library,
so back-patch all the way.

In passing, correct a thinko in a comment I added awhile back about the
meaning of the "ntree" field.

I happened across these issues while comparing our code to Tcl's version
of the library.
2015-09-18 13:55:17 -04:00
Teodor Sigaev d63a1720fa Add header forgotten in 213335c145
Report from Peter Eisentraut
2015-09-18 14:32:09 +03:00
Teodor Sigaev 9acb9007de Fix oversight in tsearch type check
Use IsBinaryCoercible() method instead of custom
is_expected_type/is_text_type functions which were introduced when tsearch2
was moved into core.

Per report by David E. Wheeler
Analysis by Tom Lane
Patch by me
2015-09-17 19:50:51 +03:00
Robert Haas 8dd401aa07 Add new function planstate_tree_walker.
ExplainPreScanNode knows how to iterate over a generic tree of plan
states; factor that logic out into a separate walker function so that
other code, such as upcoming patches for parallel query, can also use
it.

Patch by me, reviewed by Tom Lane.
2015-09-17 11:27:06 -04:00
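The walker pattern that commit 8dd401aa07 factors out can be sketched roughly as follows. This is a minimal illustration, not the actual PostgreSQL code: the StateNode type, its left/right fields, and the function names are hypothetical stand-ins for the real plan state structures.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a plan state node with two subtrees. */
typedef struct StateNode
{
    int               tag;
    struct StateNode *left;
    struct StateNode *right;
} StateNode;

typedef bool (*state_walker) (StateNode *node, void *context);

/*
 * Invoke "walker" on each child of "node".  Returns true as soon as the
 * walker returns true, which lets callers abort the traversal early.
 */
static bool
state_tree_walker(StateNode *node, state_walker walker, void *context)
{
    if (node->left != NULL && walker(node->left, context))
        return true;
    if (node->right != NULL && walker(node->right, context))
        return true;
    return false;
}

/* Example callback: count every node it visits, then recurse. */
static bool
count_nodes(StateNode *node, void *context)
{
    (*(int *) context)++;
    return state_tree_walker(node, count_nodes, context);
}
```

The generic function only knows how to reach the children; each callback decides what to do at a node and whether to recurse, so different callers can share the same traversal logic.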
Teodor Sigaev 22f519c92a Fix bug introduced by microvacuum for GiST
Commit 013ebc0a7b introduced microvacuum for
GiST. Deletion of tuples marked LP_DEAD uses IndexPageMultiDelete, while the
recovery code uses IndexPageTupleDelete in a loop. This causes a difference
in the offset numbers of the tuples to delete. This patch uses
IndexPageMultiDelete throughout GiST, except in gistplacetopage() where only
one tuple is deleted at a time. That also slightly improves performance,
because IndexPageMultiDelete is more efficient.

Patch changes WAL format, so bump wal page magic.

Bug report from Jeff Janes
Diagnostic and patch by Anastasia Lubennikova and me
2015-09-17 14:22:37 +03:00
Robert Haas 7aea8e4f2d Determine whether it's safe to attempt a parallel plan for a query.
Commit 924bcf4f16 introduced a framework
for parallel computation in PostgreSQL that makes most but not all
built-in functions safe to execute in parallel mode.  In order to have
parallel query, we'll need to be able to determine whether that query
contains functions (either built-in or user-defined) that cannot be
safely executed in parallel mode.  This requires those functions to be
labeled, so this patch introduces an infrastructure for that.  Some
functions currently labeled as safe may need to be revised depending on
how pending issues related to heavyweight locking under parallelism
are resolved.

Parallel plans can't be used except for the case where the query will
run to completion.  If portal execution were suspended, the parallel
mode restrictions would need to remain in effect during that time, but
that might make other queries fail.  Therefore, this patch introduces
a framework that enables consideration of parallel plans only when it
is known that the plan will be run to completion.  This probably needs
some refinement; for example, at bind time, we do not know whether a
query run via the extended protocol will be executed to completion or
run with a limited fetch count.  Having the client indicate its
intentions at bind time would constitute a wire protocol break.  Some
contexts in which parallel mode would be safe are not adjusted by this
patch; the default is not to try parallel plans except from call sites
that have been updated to say that such plans are OK.

This commit doesn't introduce any parallel paths or plans; it just
provides a way to determine whether they could potentially be used.
I'm committing it on the theory that the remaining parallel sequential
scan patches will also get committed to this release, hopefully in the
not-too-distant future.

Robert Haas and Amit Kapila.  Reviewed (in earlier versions) by Noah
Misch.
2015-09-16 15:38:47 -04:00
Tom Lane b44d92b67b Sync regex code with Tcl 8.6.4.
Sync our regex code with upstream changes since last time we did this,
which was Tcl 8.5.11 (see commit 08fd6ff37f).

The only functional change here is to disbelieve that an octal escape is
three digits long if it would exceed \377.  That's a bug fix, but it's
a minor one and could change the interpretation of working regexes, so
don't back-patch.

In addition to that, s/INFINITY/DUPINF/ to eliminate the risk of collisions
with <math.h>'s macro, and s/LOCAL/NOPROP/ because that also seems like
an unnecessarily collision-prone macro name.

There were some other cosmetic changes in their copy that I did not adopt,
notably a rather half-hearted attempt at renaming some of the C functions
in a more verbose style.  (I'm not necessarily against the concept, but
renaming just a few functions in the package is not an improvement.)
2015-09-16 15:25:25 -04:00
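The octal-escape rule described in commit b44d92b67b can be illustrated with a small sketch. This is not the regex library's code: the function name and interface are hypothetical, and it only shows the idea of refusing a third digit that would push the value past \377.

```c
/*
 * Parse the digits of an octal escape starting at s (just after the
 * backslash).  Consumes at most three octal digits, but refuses any digit
 * that would push the value past 0377.  The number of digits consumed is
 * stored in *ndigits; the escape's value is returned.
 */
static int
parse_octal_escape(const char *s, int *ndigits)
{
    int value = 0;
    int n = 0;

    while (n < 3 && s[n] >= '0' && s[n] <= '7')
    {
        int next = value * 8 + (s[n] - '0');

        if (next > 0377)
            break;              /* leave this digit as ordinary text */
        value = next;
        n++;
    }
    *ndigits = n;
    return value;
}
```

Under this sketch, "377" is read as a three-digit escape, while "400" is read as the two-digit escape \40 followed by a literal "0".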
Stephen Frost 4f3b2a8883 Enforce ALL/SELECT policies in RETURNING for RLS
For the UPDATE/DELETE RETURNING case, filter the records which are not
visible to the user through ALL or SELECT policies from those considered
for the UPDATE or DELETE.  This is similar to how the GRANT system
works, which prevents RETURNING unless the caller has SELECT rights on
the relation.

Per discussion with Robert, Dean, Tom, and Kevin.

Back-patch to 9.5 where RLS was introduced.
2015-09-15 15:49:31 -04:00
Stephen Frost 22eaf35c1d RLS refactoring
This refactors rewrite/rowsecurity.c to simplify the handling of the
default deny case (reducing the number of places where we check for and
add the default deny policy from three to one) by splitting up the
retrieval of the policies from the application of them.

This also allowed us to do away with the policy_id field.  A policy_name
field was added for WithCheckOption policies and is used in error
reporting, when available.

Patch by Dean Rasheed, with various mostly cosmetic changes by me.

Back-patch to 9.5 where RLS was introduced to avoid unnecessary
differences, since we're still in alpha, per discussion with Robert.
2015-09-15 15:49:31 -04:00
Tom Lane 3d9e8db9e5 Revert "Fix an O(N^2) problem in foreign key references".
Commit 5ddc72887a does not actually work
because it will happily blow away ri_constraint_cache entries that are
in active use in outer call levels.  In any case, it's a very ugly,
brute-force solution to the problem of limiting the cache size.
Revert until it can be redesigned.
2015-09-15 11:09:15 -04:00
Fujii Masao 10fbb79f1a Improve log messages related to tablespace_map file
This patch changes the log message which is logged when the server
successfully renames backup_label file to *.old but fails to rename
tablespace_map file during the shutdown. Previously the WARNING
message "online backup mode was not canceled" was logged in that case.
However this message is confusing because the backup mode is treated
as canceled whenever backup_label is successfully renamed. So this
commit makes the server log the message "online backup mode canceled"
in that case.

Also this commit changes errdetail messages so that they follow the
error message style guide.

Back-patch to 9.5 where tablespace_map file is introduced.

Original patch by Amit Kapila, heavily modified by me.
2015-09-15 23:21:51 +09:00
Andrew Dunstan e7e3ac2d51 Fix the fastpath rule for jsonb_concat with an empty operand.
To prevent perverse results, we now only return the other operand if
it's not scalar, and if both operands are of the same kind (array or
object).

Original bug complaint and patch from Oskari Saarenmaa, extended by me
to cover the cases of different kinds of jsonb.

Backpatch to 9.5 where jsonb_concat was introduced.
2015-09-13 17:06:45 -04:00
Peter Eisentraut b2ae8f1e35 Update SQL features list 2015-09-12 00:08:18 -04:00
Robert Haas 2ccc4e972e Fix build problems in commit aa65de042f.
The previous way didn't work for vpath builds, and make distprep was
busted too.

Reported off-list by Andres Freund.
2015-09-11 14:56:17 -04:00
Alvaro Herrera 5cd6538345 Add missing ReleaseBuffer call in BRIN revmap code
I think this particular branch is actually dead, but the analysis to
prove that is not trivial, so instead take the weasel way.

Reported by Jinyu Zhang
Backpatch to 9.5, where BRIN was introduced.
2015-09-11 15:29:46 -03:00
Kevin Grittner 5ddc72887a Fix an O(N^2) problem in foreign key references.
Commit 45ba424f improved foreign key lookups during bulk updates
when the FK value does not change.  When restoring a schema dump
from a database with many (say 100,000) foreign keys, this cache
would grow very big and every ALTER TABLE command was causing an
InvalidateConstraintCacheCallBack(), which uses a sequential hash
table scan.  This could cause a severe performance regression in
restoring a schema dump (including during pg_upgrade).

The patch uses a heuristic method of detecting when the hash table
should be destroyed and recreated.
InvalidateConstraintCacheCallBack() adds the current size of the
hash table to a counter.  When that sum reaches 1,000,000, the hash
table is flushed.  This fixes the regression without noticeable
harm to the bulk update use case.

Jan Wieck
Backpatch to 9.3 where the performance regression was introduced.
2015-09-11 13:06:51 -05:00
Robert Haas aa65de042f When trace_lwlocks is used, identify individual lwlocks by name.
Naming the individual lwlocks seems like something that may be useful
for other types of debugging, monitoring, or instrumentation output,
but this commit just implements it for the specific case of
trace_lwlocks.

Patch by me, reviewed by Amit Kapila and Kyotaro Horiguchi
2015-09-11 14:01:39 -04:00
Tom Lane 87efbc2be1 Fix setrefs.c comment properly.
The "typo" alleged in commit 1e460d4bd was actually a comment that was
correct when written, but I missed updating it in commit b5282aa89.
Use a slightly less specific (and hopefully more future-proof) description
of what is collected.  Back-patch to 9.2 where that commit appeared, and
revert the comment to its then-entirely-correct state before that.
2015-09-10 10:23:56 -04:00
Stephen Frost 1e460d4bd6 Fix typo in setrefs.c
We're adding OIDs, not TIDs, to invalItems.

Pointed out by Etsuro Fujita.

Back-patch to all supported branches.
2015-09-10 09:22:03 -04:00
Tom Lane 91cf3135b9 Fix minor bug in regexp makesearch() function.
The list-wrangling here was done wrong, allowing the same state to get
put into the list twice.  The following loop then would clone it twice.
The second clone would wind up with no inarcs, so that there was no
observable misbehavior AFAICT, but a useless state in the finished NFA
isn't an especially good thing.
2015-09-09 20:14:58 -04:00
Teodor Sigaev 223936e226 Fix oversight in 013ebc0a7b commit
Declaration of variable inside code
2015-09-09 19:21:16 +03:00
Teodor Sigaev 013ebc0a7b Microvacuum for GIST
Mark an index tuple as dead if it's pointed to by kill_prior_tuple during
an ordinary (search) scan, and remove it during the insert process if there
is not enough space for the new tuple. This improves select performance
because the index will not return tuples marked as dead, and improves insert
performance because it reduces the number of page splits.

Anastasia Lubennikova <a.lubennikova@postgrespro.ru> with
 minor editorialization by me
2015-09-09 18:43:37 +03:00
Fujii Masao 96f6a0cb41 Remove files signaling a standby promotion request at postmaster startup
This commit makes postmaster forcibly remove the files signaling
a standby promotion request. Otherwise, the existence of those files
can trigger a promotion too early, whether a user wants that or not.

This removal of files is usually unnecessary because they can exist
only during a few moments during a standby promotion. However
there is a race condition: if pg_ctl promote is executed and creates
the files during a promotion, the files can stay around even after
the server is brought up as the new master. Then, if a new standby starts
by using the backup taken from that master, the files can exist
at the server startup and should be removed in order to avoid
an unexpected promotion.

Back-patch to 9.1 where promote signal file was introduced.

Problem reported by Feike Steenbergen.
Original patch by Michael Paquier, modified by me.

Discussion: 20150528100705.4686.91426@wrigleys.postgresql.org
2015-09-09 22:51:44 +09:00
Stephen Frost c3e0ddd403 Lock all relations referred to in updatable views
Even views considered "simple" enough to be automatically updatable may
have multiple relations involved (eg: in a where clause).  We need to
make sure to lock those relations when rewriting the query.

Back-patch to 9.3 where updatable views were added.

Pointed out by Andres, patch thanks to Dean Rasheed.
2015-09-08 17:02:49 -04:00
Fujii Masao 043113e798 Add gin_fuzzy_search_limit to postgresql.conf.sample.
This was forgotten in 8a3631f (commit that originally added the parameter)
and 0ca9907 (commit that added the documentation later that year).

Back-patch to all supported versions.
2015-09-09 02:25:50 +09:00
Alvaro Herrera 1aba62ec63 Allow per-tablespace effective_io_concurrency
Per discussion, nowadays it is possible to have tablespaces that have
wildly different I/O characteristics from others.  Setting different
effective_io_concurrency parameters for those has been measured to
improve performance.

Author: Julien Rouhaud
Reviewed by: Andres Freund
2015-09-08 12:51:42 -03:00
Jeff Davis b1e1862a12 Coordinate log_line_prefix options 'm' and 'n' to share a timeval.
Commit f828654e introduced the 'n' option, but it invoked
gettimeofday() independently of the 'm' option. If both options were
in use (or multiple 'n' options), or if 'n' was in use along with
csvlog, then the reported times could be different for the same log
message.

To fix, initialize a global variable with gettimeofday() once per log
message, and use that for both formats.

Don't bother coordinating the time for the 't' option, which has much
lower resolution.

Per complaint by Alvaro Herrera.
2015-09-07 15:40:49 -07:00
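The fix in commit b1e1862a12 amounts to sampling the clock once per log message and reusing that sample for every format. A minimal sketch of the idea, assuming a POSIX system; the variable and function names below are illustrative and not necessarily those used in elog.c:

```c
#include <stdio.h>
#include <sys/time.h>

/* One clock sample shared by all timestamp formats of a log message. */
static struct timeval saved_timeval;
static int  saved_timeval_set = 0;

/* Call at the start of each log message to force a fresh sample. */
static void
start_log_message(void)
{
    saved_timeval_set = 0;
}

/* Return the shared sample, reading the clock only on first use. */
static const struct timeval *
log_timeval(void)
{
    if (!saved_timeval_set)
    {
        gettimeofday(&saved_timeval, NULL);
        saved_timeval_set = 1;
    }
    return &saved_timeval;
}

/* 'n'-style output: Unix epoch seconds with milliseconds. */
static void
format_epoch_ms(char *buf, size_t len)
{
    const struct timeval *tv = log_timeval();

    snprintf(buf, len, "%ld.%03d",
             (long) tv->tv_sec, (int) (tv->tv_usec / 1000));
}

/* Millisecond part as an 'm'-style formatter would use it. */
static int
log_millis(void)
{
    return (int) (log_timeval()->tv_usec / 1000);
}
```

Because both formatters go through log_timeval(), repeated escapes within one message cannot disagree about the time.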
Jeff Davis f828654e10 Add log_line_prefix option 'n' for Unix epoch.
Prints time as Unix epoch with milliseconds.

Tomas Vondra, reviewed by Fabien Coelho.
2015-09-07 13:46:31 -07:00
Teodor Sigaev e26692248a Make GIN's cleanup pending list process interruptible
The cleanup process can be called by an ordinary insert/update and can take a
long time. Add vacuum_delay_point() to make this process interruptible. Under
vacuum this call will also throttle the vacuum process to decrease system load;
called from insert/update it will not throttle, which reduces latency.

Backpatch for all supported branches.

Jeff Janes <jeff.janes@gmail.com>
2015-09-07 17:16:29 +03:00
Teodor Sigaev e956808328 Add pages deleted from pending list to FSM
Add pages deleted from GIN's pending list during cleanup to the free space map
immediately. The cleanup process can be initiated by an ordinary insert, but
pages were added to the FSM only at vacuum. On some workloads, such as
never-vacuumed insert-only tables, this could cause huge bloat.

Jeff Janes <jeff.janes@gmail.com>
2015-09-07 16:24:01 +03:00
Magnus Hagander 643beffe8f Support RADIUS passwords up to 128 characters
Previous limit was 16 characters, due to lack of support for multiple passes
of encryption.

Marko Tiikkaja
2015-09-06 14:31:53 +02:00
Andres Freund c314ead5be Add ability to reserve WAL upon slot creation via replication protocol.
Since 6fcd885 it is possible to immediately reserve WAL when creating a
slot via pg_create_physical_replication_slot(). Extend the replication
protocol to allow that as well.

Although, in contrast to the SQL interface, it is possible to update the
reserved location via the replication interface, it is still useful
being able to reserve upon creation there. Otherwise the logic in
ReplicationSlotReserveWal() has to be repeated in slot employing
clients.

Author: Michael Paquier
Discussion: CAB7nPqT0Wc1W5mdYGeJ_wbutbwNN+3qgrFR64avXaQCiJMGaYA@mail.gmail.com
2015-09-06 13:30:57 +02:00
Greg Stark 258ee1b635 Move DTK_ISODOW, DTK_DOW and DTK_DOY to be type UNITS rather than
RESERV. RESERV is meant for tokens like "now" and having them in that
category throws errors like these when used as an input date:

stark=# SELECT 'doy'::timestamptz;
ERROR:  unexpected dtype 33 while parsing timestamptz "doy"
LINE 1: SELECT 'doy'::timestamptz;
               ^
stark=# SELECT 'dow'::timestamptz;
ERROR:  unexpected dtype 32 while parsing timestamptz "dow"
LINE 1: SELECT 'dow'::timestamptz;
               ^

Found by LLVM's Libfuzzer
2015-09-06 03:35:56 +01:00
Tom Lane 9270d8db9a Fix CreateTableSpace() so it will compile without HAVE_SYMLINK.
This has been broken since 9.3 (commit 82b1b213ca to be exact),
which suggests that nobody is any longer using a Windows build system that
doesn't provide a symlink emulation.  Still, it's wrong on its own terms,
so repair.

YUriy Zhuravlev
2015-09-05 16:15:38 -04:00
Heikki Linnakangas c80b5f66c6 Fix misc typos.
Oskari Saarenmaa. Backpatch to stable branches where applicable.
2015-09-05 11:35:49 +03:00
Tatsuo Ishii c39f5674df Fix brin index summarizing while vacuuming.
If the number of heap blocks is not a multiple of pages per range,
summarizing produces wrong summary information for the last brin index
tuple while vacuuming.

Problem reported by Tatsuo Ishii and fixed by Amit Langote.

Discussion at "[HACKERS] BRIN INDEX value" (message id: 20150903.174935.1946402199422994347.t-ishii@sraoss.co.jp)
Backpatched to 9.5 in which brin index was added.
2015-09-05 09:19:25 +09:00
Tom Lane c5454f99c4 Fix subtransaction cleanup after an outer-subtransaction portal fails.
Formerly, we treated only portals created in the current subtransaction as
having failed during subtransaction abort.  However, if the error occurred
while running a portal created in an outer subtransaction (ie, a cursor
declared before the last savepoint), that has to be considered broken too.

To allow reliable detection of which ones those are, add a bookkeeping
field to struct Portal that tracks the innermost subtransaction in which
each portal has actually been executed.  (Without this, we'd end up
failing portals containing functions that had called the subtransaction,
thereby breaking plpgsql exception blocks completely.)

In addition, when we fail an outer-subtransaction Portal, transfer its
resources into the subtransaction's resource owner, so that they're
released early in cleanup of the subxact.  This fixes a problem reported by
Jim Nasby in which a function executed in an outer-subtransaction cursor
could cause an Assert failure or crash by referencing a relation created
within the inner subtransaction.

The proximate cause of the Assert failure is that AtEOSubXact_RelationCache
assumed it could blow away a relcache entry without first checking that the
entry had zero refcount.  That was a bad idea on its own terms, so add such
a check there, and to the similar coding in AtEOXact_RelationCache.  This
provides an independent safety measure in case there are still ways to
provoke the situation despite the Portal-level changes.

This has been broken since subtransactions were invented, so back-patch
to all supported branches.

Tom Lane and Michael Paquier
2015-09-04 13:37:14 -04:00
Robert Haas 4aec49899e Assorted code review for recent ProcArrayLock patch.
Post-commit review by Andres Freund discovered a couple of concurrency
bugs in the original patch: specifically, if the leader cleared a
follower's XID before it reached PGSemaphoreLock, the semaphore would be
left in the wrong state; and if another process did PGSemaphoreUnlock
for some unrelated reason, we might resume execution before the fact
that our XID was cleared was globally visible.

Also, improve the wording of some comments, rename nextClearXidElem
to firstClearXidElem in PROC_HDR for clarity, and drop some volatile
qualifiers that aren't necessary.

Amit Kapila, reviewed and slightly revised by me.
2015-09-03 13:19:15 -04:00
Fujii Masao 1ea5ce5c5f Document that max_worker_processes must be high enough in standby.
The setting values of some parameters including max_worker_processes
must be equal to or higher than the values on the master. However,
previously max_worker_processes was not listed as such parameter
in the document. So this commit adds it to that list.

Back-patch to 9.4 where max_worker_processes was added.
2015-09-03 22:30:16 +09:00
Teodor Sigaev 30bb26b5e0 Allow usage of huge maintenance_work_mem for GIN build.
Currently, the in-memory posting list during the GIN build process is limited
to 1GB because it uses repalloc. The patch replaces the call to repalloc with
repalloc_huge. This raises the posting list limit from 180 million
(1GB / sizeof(ItemPointerData)) to 4 billion, limited by the maxcount/count
fields in GinEntryAccumulator and subsequent calls. A check was added.

Also, fix accounting of allocatedMemory during build to prevent integer
overflow with maintenance_work_mem > 4GB.

Robert Abraham <robert.abraham86@googlemail.com> with additions by me
2015-09-02 20:08:58 +03:00
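The "180 million" figure in commit 30bb26b5e0 follows from dividing 1GB by the size of one item pointer. A small sketch of that arithmetic, assuming the 6-byte layout of ItemPointerData (a 4-byte block number stored as two 16-bit halves plus a 2-byte offset); the struct below is a hypothetical mirror of that layout, not the real definition:

```c
#include <stdint.h>

/* Hypothetical mirror of a 6-byte item pointer layout. */
typedef struct ItemPointerSketch
{
    uint16_t    block_hi;       /* high half of the block number */
    uint16_t    block_lo;       /* low half of the block number */
    uint16_t    offset;         /* line pointer offset within the page */
} ItemPointerSketch;

/* How many item pointers fit in an allocation of "bytes" bytes. */
static uint64_t
max_items(uint64_t bytes)
{
    return bytes / sizeof(ItemPointerSketch);
}
```

With a 1GB cap this gives roughly 179 million entries, which the commit message rounds to 180 million.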
Robert Haas 8a02b3d732 Allow notifications to bgworkers without database connections.
Previously, if one background worker registered another background
worker and set bgw_notify_pid for the second background worker,
it would not receive notifications from the postmaster unless, at the
time the "parent" was registered, BGWORKER_BACKEND_DATABASE_CONNECTION
was set.

To fix, instead of including only those background workers that
requested database connections in the postmaster's BackendList, include
them all.  There doesn't seem to be any reason not to do this, and indeed
it removes a significant amount of duplicated code.  The other option
is to make PostmasterMarkPIDForWorkerNotify look at BackgroundWorkerList
in addition to BackendList, but that adds more code duplication instead
of getting rid of it.

Patch by me.  Review and testing by Ashutosh Bapat.
2015-09-01 15:30:19 -04:00
Tom Lane 123c9d2fc1 Clean up icc + ia64 situation.
Some googling turned up multiple sources saying that older versions of icc
do not accept gcc-compatible asm blocks on IA64, though asm does work on
x86[_64].  This is apparently fixed as of icc version 12.0 or so, but that
doesn't help us much; if we have to carry the extra implementation anyway,
we may as well just use it for icc rather than add a compiler version test.

Hence, revert commit 2c713d6ea2 (though I
separated the icc code from the gcc code completely, producing what seems
cleaner code).  Document the state of affairs more explicitly, both in
s_lock.h and postgres.c, and make some cosmetic adjustments around the
IA64 code in s_lock.h.
2015-08-31 18:10:04 -04:00
Tom Lane f333204bbc Actually, it's not that hard to merge the Windows pqsignal code ...
... just need to typedef sigset_t and provide sigemptyset/sigfillset,
which are easy enough.
2015-08-31 15:52:56 -04:00
Tom Lane 2c713d6ea2 Remove theoretically-unnecessary special case for icc.
Intel's icc is generally able to swallow asm blocks written for gcc.
We have a few places that don't seem to know that, though.  Experiment
with removing the special case for icc in ia64_get_bsp(); if the buildfarm
likes this, I'll try more cleanup.  This is a good test case because it
involves a "stop" notation that seems like it might not be very portable.
2015-08-31 14:43:10 -04:00
Tom Lane a65e086453 Remove support for Unix systems without the POSIX signal APIs.
Remove configure's checks for HAVE_POSIX_SIGNALS, HAVE_SIGPROCMASK, and
HAVE_SIGSETJMP.  These APIs are required by the Single Unix Spec v2
(POSIX 1997), which we generally consider to define our minimum required
set of Unix APIs.  Moreover, no buildfarm member has reported not having
them since 2012 or before, which means that even if the code is still live
somewhere, it's untested --- and we've made plenty of signal-handling
changes of late.  So just take these APIs as given and save the cycles for
configure probes for them.

However, we can't remove as much C code as I'd hoped, because the Windows
port evidently still uses the non-POSIX code paths for signal masking.
Since we're largely emulating these BSD-style APIs for Windows anyway, it
might be a good thing to switch over to POSIX-like notation and thereby
remove a few more #ifdefs.  But I'm not in a position to code or test that.
In the meantime, we can at least make things a bit more transparent by
testing for WIN32 explicitly in these places.
2015-08-31 12:56:10 -04:00
Stephen Frost 2ba9e2b778 Ensure locks are acquired on RLS-added relations
During fireRIRrules(), get_row_security_policies can add to
securityQuals and withCheckOptions.  Make sure to lock any relations
added at that point and before firing RIR rules on those expressions.

Back-patch to 9.5 where RLS was added.
2015-08-28 11:39:37 -04:00
Andres Freund c0f0d8097b Clarify what some historic terms in rewriteHandler.c mean.
Discussion: 20150827131352.GF2435@awork2.anarazel.de
2015-08-28 16:27:58 +02:00
Tom Lane 8a7d070181 Speed up HeapTupleSatisfiesMVCC() by replacing the XID-in-progress test.
Rather than consulting TransactionIdIsInProgress to see if an in-doubt
transaction is still running, consult XidInMVCCSnapshot.  That requires
the same or fewer cycles as TransactionIdIsInProgress, and what's far
more important, it does not access shared data structures (at least in the
no-subxip-overflow case) so it incurs no contention.  Furthermore, we would
have had to check XidInMVCCSnapshot anyway before deciding that we were
allowed to see the tuple.

There should never be a case where XidInMVCCSnapshot says a transaction is
done while TransactionIdIsInProgress says it's still running.  The other
way around is quite possible though.  The result of that difference is that
HeapTupleSatisfiesMVCC will no longer set hint bits on tuples whose source
transactions recently finished but are still running according to our
snapshot.  The main cost of delaying the hint-bit setting is that repeated
visits to a just-committed tuple, by transactions none of which have
snapshots new enough to see the source transaction as done, will each
execute TransactionIdIsCurrentTransactionId, which they need not have done
before.  However, that's normally just a small overhead, and no contention
costs are involved; so it seems well worth the benefit of removing
TransactionIdIsInProgress calls during the life of the source transaction.

The core idea for this patch is due to Jeff Janes, who also did the legwork
proving its performance benefits.  His original proposal was to swap the
order of TransactionIdIsInProgress and XidInMVCCSnapshot calls in some
cases within HeapTupleSatisfiesMVCC.  That was a bit messy though.
The idea that we could dispense with calling TransactionIdIsInProgress
altogether was mine, as is the final patch.
2015-08-26 18:19:07 -04:00
Tom Lane 7b5ef8f2d0 Limit the verbosity of memory context statistics dumps.
We had a report from Stefan Kaltenbrunner of a case in which postmaster
log files overran available disk space because multiple backends spewed
enormous context stats dumps upon hitting an out-of-memory condition.
Given the lack of similar reports, this isn't a common problem, but it
still seems worth doing something about.  However, we don't want to just
blindly truncate the output, because that might prevent diagnosis of OOM
problems.  What seems like a workable compromise is to limit the dump to
100 child contexts per parent, and summarize the space used within any
additional child contexts.  That should help because practical cases where
the dump gets long will typically be huge numbers of siblings under the
same parent context; while the additional debugging value from seeing
details about individual siblings beyond 100 will not be large, we hope.
Anyway it doesn't take much code or memory space to do this, so let's try
it like this and see how things go.

Since the summarization mechanism requires passing totals back up anyway,
I took the opportunity to add a "grand total" line to the end of the
printout.
2015-08-25 13:09:48 -04:00
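The compromise described in commit 7b5ef8f2d0 (print details for a bounded number of children, summarize the rest, and pass totals back up) can be sketched as below. The Ctx type, its fields, and the output format are hypothetical; the real memory context code differs in detail.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical memory context node: children form a linked list. */
typedef struct Ctx
{
    const char *name;
    size_t      used;           /* bytes used by this context itself */
    struct Ctx *firstchild;
    struct Ctx *nextchild;      /* next sibling */
} Ctx;

/* Total bytes used by a context and all of its descendants. */
static size_t
ctx_total(const Ctx *c)
{
    size_t      total = c->used;

    for (const Ctx *k = c->firstchild; k != NULL; k = k->nextchild)
        total += ctx_total(k);
    return total;
}

/*
 * Print stats for c and up to max_children of its children; any further
 * children are only summarized.  Returns the grand total for the subtree.
 */
static size_t
ctx_stats(const Ctx *c, int level, int max_children, FILE *out)
{
    size_t      total = c->used;
    size_t      skipped_bytes = 0;
    int         nskipped = 0;
    int         ichild = 0;

    fprintf(out, "%*s%s: %zu used\n", level * 2, "", c->name, c->used);
    for (const Ctx *k = c->firstchild; k != NULL; k = k->nextchild, ichild++)
    {
        if (ichild < max_children)
            total += ctx_stats(k, level + 1, max_children, out);
        else
        {
            skipped_bytes += ctx_total(k);
            nskipped++;
        }
    }
    if (nskipped > 0)
    {
        fprintf(out, "%*s%d more child contexts containing %zu used\n",
                (level + 1) * 2, "", nskipped, skipped_bytes);
        total += skipped_bytes;
    }
    return total;
}
```

The value returned by the top-level call is what a "grand total" line would report, and it is the same whether or not any children were summarized.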
Tom Lane aad663a0b4 Reduce number of bytes examined by convert_one_string_to_scalar().
Previously, convert_one_string_to_scalar() would examine up to 20 bytes of
the input string, producing a scalar conversion with theoretical precision
far greater than is of any possible use considering the other limitations
on the accuracy of the resulting selectivity estimate.  (I think this
choice might pre-date the caller-level logic that strips any common prefix
of the strings; before that, there could have been value in scanning the
strings far enough to use all the precision available in a double.)

Aside from wasting cycles to little purpose, this choice meant that the
"denom" variable could grow to as much as 256^21 = 3.74e50, which could
overflow in some non-IEEE float arithmetics.  While we don't really support
any machines with non-IEEE arithmetic anymore, this still seems like quite
an unnecessary platform dependency.  Limit the scan to 12 bytes instead,
thus limiting "denom" to 256^13 = 2.03e31, a value more likely to be
computable everywhere.

Per testing by Greg Stark, which showed overflow failures in our standard
regression tests on VAX.
2015-08-23 15:15:47 -04:00
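The effect of capping the scan length in commit aad663a0b4 can be seen in a simplified sketch. It assumes a fixed base of 256 per byte (the real function works with a caller-supplied character range), and the function name is hypothetical:

```c
#include <stddef.h>

/*
 * Map a byte string to a fraction in [0, 1) by treating it as base-256
 * digits after the radix point.  Only the first max_bytes bytes are
 * examined, so denom never exceeds 256^(max_bytes + 1).
 */
static double
string_to_fraction(const unsigned char *s, size_t len, size_t max_bytes)
{
    double      num = 0.0;
    double      denom = 256.0;

    if (len > max_bytes)
        len = max_bytes;

    for (size_t i = 0; i < len; i++)
    {
        num += (double) s[i] / denom;
        denom *= 256.0;
    }
    return num;
}
```

With max_bytes = 12 the loop leaves denom at 256^13 at most, matching the bound quoted in the commit message; bytes beyond the twelfth would contribute less than a double can represent alongside the leading digits anyway.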
Tom Lane 44ed65a545 Avoid use of float arithmetic in bipartite_match.c.
Since the distances used in this algorithm are small integers (not more
than the size of the U set, in fact), there is no good reason to use float
arithmetic for them.  Use short ints instead: they're smaller, faster, and
require no special portability assumptions.

Per testing by Greg Stark, which disclosed that the code got into an
infinite loop on VAX for lack of IEEE-style float infinities.  We don't
really care all that much whether Postgres can run on a VAX anymore,
but there seems sufficient reason to change this code anyway.

In passing, make a few other small adjustments to make the code match
usual Postgres coding style a bit better.
2015-08-23 13:02:18 -04:00
Kevin Grittner 5956b7f9e8 Fix typo in C comment.
Merlin Moncure
Backpatch to 9.5, where the misspelling was introduced
2015-08-23 10:38:57 -05:00
Peter Eisentraut b386271594 Improve whitespace 2015-08-22 21:54:35 -04:00
Tom Lane 6e5d9f278c Avoid O(N^2) behavior when enlarging SPI tuple table in spi_printtup().
For no obvious reason, spi_printtup() was coded to enlarge the tuple
pointer table by just 256 slots at a time, rather than doubling the size at
each reallocation, as is our usual habit.  For very large SPI results, this
makes for O(N^2) time spent in repalloc(), which of course soon comes to
dominate the runtime.  Use the standard doubling approach instead.

This is a longstanding performance bug, so back-patch to all active
branches.

Neil Conway
2015-08-21 20:32:11 -04:00
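The doubling approach adopted in commit 6e5d9f278c can be sketched as follows. The TupleTable type and the initial size of 128 are assumptions for illustration, and plain realloc() stands in for repalloc():

```c
#include <stdlib.h>

/* Hypothetical stand-in for the SPI tuple pointer table. */
typedef struct TupleTable
{
    void      **vals;           /* array of tuple pointers */
    size_t      used;           /* slots in use */
    size_t      alloced;        /* slots allocated */
} TupleTable;

/*
 * Append one pointer, doubling the capacity whenever the table is full.
 * Returns 0 on success, -1 if the allocation fails.
 */
static int
tuple_table_append(TupleTable *tt, void *tuple)
{
    if (tt->used == tt->alloced)
    {
        size_t      newsize = tt->alloced ? tt->alloced * 2 : 128;
        void      **newvals = realloc(tt->vals, newsize * sizeof(void *));

        if (newvals == NULL)
            return -1;
        tt->vals = newvals;
        tt->alloced = newsize;
    }
    tt->vals[tt->used++] = tuple;
    return 0;
}
```

Growing by a fixed 256 slots means about N/256 reallocations, each of which may copy the whole table, hence O(N^2) work overall. Doubling needs only about log2(N) reallocations, so the total copying stays proportional to N.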
Alvaro Herrera e68be16b0d Do not allow *timestamp to be passed as NULL
The code had bugs that would cause crashes if NULL was passed as that
argument (originally intended to mean not to bother returning its
value), and after inspection it turns out that nothing seems interested
in the case that *ts is NULL anyway.  Therefore, remove the partial
checks intended to support that case.

Author: Michael Paquier
though I didn't include a proposed Assert.

Backpatch to 9.5.
2015-08-21 14:36:54 -03:00
Alvaro Herrera 8c3d63c521 Remove ExecGetScanType function
This became unused in a191a169d6.
2015-08-21 14:11:58 -03:00
Tom Lane 09b3d27256 Allow record_in() and record_recv() to work for transient record types.
If we have the typmod that identifies a registered record type, there's no
reason that record_in() should refuse to perform input conversion for it.
Now, in direct SQL usage, record_in() will always be passed typmod = -1
with type OID RECORDOID, because no typmodin exists for type RECORD, so the
case can't arise.  However, some InputFunctionCall users such as PLs may be
able to supply the right typmod, so we should allow this to support them.

Note: the previous coding and comment here predate commit 59c016aa9f.
There has been no case since 8.1 in which the passed type OID wouldn't be
valid; and if it weren't, this error message wouldn't be apropos anyway.
Better to let lookup_rowtype_tupdesc complain about it.

Back-patch to 9.1, as this is necessary for my upcoming plpython fix.
I'm committing it separately just to make it a bit more visible in the
commit history.
2015-08-21 11:19:33 -04:00
Stephen Frost 3c99788797 Rename 'cmd' to 'cmd_name' in CreatePolicyStmt
To avoid confusion, rename CreatePolicyStmt's 'cmd' to 'cmd_name',
parse_policy_command's 'cmd' to 'polcmd', and AlterPolicy's 'cmd_datum'
to 'polcmd_datum', per discussion with Noah and as a follow-up to his
correction of copynodes/equalnodes handling of the CreatePolicyStmt
'cmd' field.

Back-patch to 9.5 where the CreatePolicyStmt was introduced, as we
are still only in alpha.
2015-08-21 08:22:22 -04:00
Stephen Frost 7ec8296e70 In AlterRole, make bypassrls an int
When reworking bypassrls in AlterRole to operate the same way the other
attribute handling is done, I missed that the variable was incorrectly a
bool rather than an int.  This meant that on platforms with an unsigned
char, we could end up with incorrect behavior during ALTER ROLE.
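
A small sketch of why the variable's type matters, assuming (as an illustration, not a quote of the AlterRole code) that -1 is used as a "not specified" marker; the names below are hypothetical:

```c
#include <assert.h>

/* Assumption for this sketch: -1 marks "attribute not mentioned". */
#define ATTR_UNSPECIFIED (-1)

/* An int keeps the -1 marker intact. */
static int
sentinel_survives_in_int(void)
{
    int v = ATTR_UNSPECIFIED;

    return v == ATTR_UNSPECIFIED;
}

/*
 * An unsigned char turns -1 into 255, so the comparison with the marker
 * fails after integer promotion.  This models a bool that is based on an
 * unsigned char type.
 */
static int
sentinel_survives_in_uchar(void)
{
    unsigned char v = (unsigned char) ATTR_UNSPECIFIED;

    return v == ATTR_UNSPECIFIED;
}
```

The first function returns 1 and the second returns 0, which is the kind of platform-dependent misbehavior the commit message refers to.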

Pointed out by Andres thanks to tests he did changing our bool to be the
one from stdbool.h which showed this and a number of other issues.

Add regression tests to test CREATE/ALTER role for the various role
attributes.  Arrange to leave roles behind for testing pg_dumpall, but
none which have the LOGIN attribute.

Back-patch to 9.5 where the AlterRole bug exists.
2015-08-21 08:22:22 -04:00
Kevin Grittner 1cac8c9820 Fix bug in calculations of hash join buckets.
Commit 8cce08f168 used a left-shift
on a literal of 1 that could (in large allocations) be shifted by
31 or more bits.  This was assigned to a local variable that was
already declared to be a long to protect against overruns of int,
but the literal in this shift needs to be declared long to allow it
to work correctly in some compilers.
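
A hedged illustration of the literal-width issue. The helper name is hypothetical, and the sketch uses long long (1LL) rather than the long mentioned above, because long is only 32 bits wide on some platforms:

```c
#include <assert.h>
#include <limits.h>

/*
 * Hypothetical helper: return 2^log2_nbuckets.  The literal is 1LL, so the
 * shift is carried out in long long.  Writing "1 << log2_nbuckets" would
 * shift a plain int, which is undefined once the shift count reaches 31
 * on platforms with a 32-bit int, no matter how wide the variable that
 * receives the result is.
 */
static long long
buckets_for_log2(int log2_nbuckets)
{
    assert(log2_nbuckets >= 0 &&
           log2_nbuckets < (int) (sizeof(long long) * CHAR_BIT) - 1);
    return 1LL << log2_nbuckets;
}
```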

Backpatch to 9.5, where the bug was introduced.

Report and patch by KaiGai Kohei, slightly modified based on
discussion.
2015-08-19 08:20:55 -05:00
Andres Freund e95126cf04 Don't use function definitions looking like old-style ones.
This fixes a bunch of somewhat pedantic warnings with new
compilers. Since by far the majority of other function definitions use
the (void) style it just seems to be consistent to do so as well in the
remaining few places.
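
For illustration only, the (void) style looks like this; the function and variable names are made up for the sketch:

```c
#include <assert.h>

static int counter = 0;

/*
 * Before C23, an empty parameter list "()" in a definition does not
 * provide a prototype, which is what newer compilers warn about.
 * Spelling it "(void)" states explicitly that the function takes no
 * arguments.
 */
static int
next_value(void)
{
    return ++counter;
}
```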
2015-08-15 17:25:00 +02:00
Andres Freund f9dec81a54 Correct type of waitMode variable in ExecInsertIndexTuples().
It was a bool, even though it should be CEOUC_WAIT_MODE. That's unlikely
to have a negative effect with the current definition of bool (char),
but it's definitely wrong.

Discussion: 20150812084351.GD8470@awork2.anarazel.de
Backpatch: 9.5, where ON CONFLICT was merged
2015-08-15 17:11:42 +02:00
Andres Freund 6c772c7453 Don't use 'bool' as a struct member name in help_config.c.
Doing so doesn't work if bool is a macro rather than a typedef.

Although c.h spends some effort to support configurations where bool is
a preexisting macro, help_config.c has existed this way since
2003 (b700a6), and there have not been any reports of
problems. Backpatch anyway since this is as riskless as it gets.

Discussion: 20150812084351.GD8470@awork2.anarazel.de
Backpatch: 9.0-master
2015-08-15 16:32:38 +02:00
Noah Misch ec79978dd0 Encoding PG_UHC is code page 949.
This fixes presentation of non-ASCII messages to the Windows event log
and console in rare cases involving Korean locale.  Processes like the
postmaster and checkpointer, but not processes attached to databases,
were affected.  Back-patch to 9.4, where MessageEncoding was introduced.
The problem exists in all supported versions, but this change has no
effect in the absence of the code recognizing PG_UHC MessageEncoding.

Noticed while investigating bug #13427 from Dmitri Bourlatchkov.
2015-08-14 20:23:13 -04:00
Noah Misch 43adc7a714 Restore old pgwin32_message_to_UTF16() behavior outside transactions.
Commit 49c817eab7 replaced with a hard
error the dubious pg_do_encoding_conversion() behavior when outside a
transaction.  Reintroduce the historic soft failure locally within
pgwin32_message_to_UTF16().  This fixes errors when writing messages in
less-common encodings to the Windows event log or console.  Back-patch
to 9.4, where the aforementioned commit first appeared.

Per bug #13427 from Dmitri Bourlatchkov.
2015-08-14 20:23:09 -04:00
Simon Riggs 47167b7907 Reduce lock levels for ALTER TABLE SET autovacuum storage options
Reduce lock levels down to ShareUpdateExclusiveLock for all autovacuum-related
relation options when setting them using ALTER TABLE.

Add infrastructure to allow varying lock levels for relation options in later
patches. Setting multiple options together uses the highest lock level required
for any option. Works for both main and toast tables.

Fabrízio Mello, reviewed by Michael Paquier, mild edit and additional regression
tests from myself
2015-08-14 14:19:28 +01:00
Alvaro Herrera fcbf455842 Fix uninitialized variables
As complained by clang, reported by Andres Freund.  Brown paper bag bug
in ccc4c07499.

Add some comments, too.

Backpatch to 9.5, like that one.
2015-08-13 00:12:07 -03:00
Tom Lane cfe30a72fa Undo mistaken tightening in join_is_legal().
One of the changes I made in commit 8703059c6b turns out not to have
been such a good idea: we still need the exception in join_is_legal() that
allows a join if both inputs already overlap the RHS of the special join
we're checking.  Otherwise we can miss valid plans, and might indeed fail
to find a plan at all, as in a recent report from Andreas Seltenreich.

That code was added way back in commit c17117649b, but I failed to
include a regression test case then; my bad.  Put it back with a better
explanation, and a test this time.  The logic does end up a bit different
than before though: I now believe it's appropriate to make this check
first, thereby allowing such a case whether or not we'd consider the
previous SJ(s) to commute with this one.  (Presumably, we already decided
they did; but it was confusing to have this consideration in the middle
of the code that was handling the other case.)

Back-patch to all active branches, like the previous patch.
2015-08-12 21:19:03 -04:00
Alvaro Herrera ccc4c07499 Close some holes in BRIN page assignment
In some corner cases, it is possible for the BRIN index relation to be
extended by brin_getinsertbuffer but the new page not be used
immediately for anything by its callers; when this happens, the page is
initialized and the FSM is updated (by brin_getinsertbuffer) with the
info about that page, but these actions are not WAL-logged.  A later
index insert/update can use the page, but since the page is already
initialized, the initialization itself is not WAL-logged then either.
Replay of this sequence of events causes recovery to fail altogether.

There is a related corner case within brin_getinsertbuffer itself, in
which we extend the relation to put a new index tuple there, but later
find out that we cannot do so, and do not return the buffer; the page
obtained from extension is not even initialized.  The resulting page is
lost forever.

To fix, shuffle the code so that initialization is not the
responsibility of brin_getinsertbuffer anymore, in normal cases;
instead, the initialization is done by its callers (brin_doinsert and
brin_doupdate) once they're certain that the page is going to be used.
When either of those functions determines that the new page cannot be used,
before bailing out it initializes the page as an empty regular page,
enters it in the FSM and WAL-logs all this.  This way, the page is usable for
future index insertions, and WAL replay never ends up trying to insert
tuples in pages whose initialization didn't make it to the WAL.  The
same strategy is used in brin_getinsertbuffer when it cannot return the
new page.

Additionally, add a new step to vacuuming so that all pages of the index
are scanned; whenever an uninitialized page is found, it is initialized
as empty and WAL-logged.  This closes the hole that the relation is
extended but the system crashes before anything is WAL-logged about it.
We also take this opportunity to update the FSM, in case it has gotten
out of date.

Thanks to Heikki Linnakangas for finding the problem that kicked off some
additional analysis of BRIN page assignment code.

Backpatch to 9.5, where BRIN was introduced.

Discussion: https://www.postgresql.org/message-id/20150723204810.GY5596@postgresql.org
2015-08-12 14:20:38 -03:00
Andres Freund a4b059fdde Remove duplicated assignment in pg_create_physical_replication_slot.
Reported-By: Gurjeet Singh
2015-08-12 17:35:50 +02:00
Andres Freund d25fbf9f3e Fix two off-by-one errors in bufmgr.c.
In 4b4b680c I passed a buffer index number (starting from 0) instead of
a proper Buffer id (which starts from 1 for shared buffers) in two
places.

This wasn't noticed so far as one of those locations isn't compiled at
all (PrintPinnedBufs) and the other one (InvalidBuffer) requires an
unlikely, but possible, set of circumstances to trigger a symptom.

To reduce the likelihood of such incidents a bit also convert existing
open coded mappings from buffer descriptors to buffer ids with
BufferDescriptorGetBuffer().
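
A simplified sketch of the index-to-id mapping, using stand-in types (SketchBufferDesc and sketch_descriptor_get_buffer are hypothetical names, not the bufmgr definitions):

```c
#include <assert.h>

typedef int Buffer;         /* buffer ids: shared buffers start at 1 */
#define InvalidBuffer 0     /* 0 is reserved to mean "no buffer" */

/* Stand-in for a buffer descriptor; only the field needed here. */
typedef struct
{
    int buf_id;             /* zero-based index into the descriptor array */
} SketchBufferDesc;

/*
 * Map a descriptor to its Buffer id in one place, so callers cannot
 * forget the "+ 1" and pass a raw zero-based index by mistake.
 */
static Buffer
sketch_descriptor_get_buffer(const SketchBufferDesc *desc)
{
    return desc->buf_id + 1;
}
```

Passing the raw index instead would make the first descriptor (index 0) look like InvalidBuffer and every other one point at its neighbor.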

Author: Qingqing Zhou
Reported-By: Qingqing Zhou
Discussion: CAJjS0u2ai9ooUisKtkV8cuVUtEkMTsbK8c7juNAjv8K11zeCQg@mail.gmail.com
Backpatch: 9.5 where the private ref count infrastructure was introduced
2015-08-12 17:35:50 +02:00
Tom Lane 8a0258c318 Fix some possible low-memory failures in regexp compilation.
newnfa() failed to set the regex error state when malloc() fails.
Several places in regcomp.c failed to check for an error after calling
subre().  Each of these mistakes could lead to null-pointer-dereference
crashes in memory-starved backends.

Report and patch by Andreas Seltenreich.  Back-patch to all branches.
2015-08-12 00:48:11 -04:00
Tom Lane 68fa28f771 Postpone extParam/allParam calculations until the very end of planning.
Until now we computed these Param ID sets at the end of subquery_planner,
but that approach depends on subquery_planner returning a concrete Plan
tree.  We would like to switch over to returning one or more Paths for a
subquery, and in that representation the necessary details aren't fully
fleshed out (not to mention that we don't really want to do this work for
Paths that end up getting discarded).  Hence, refactor so that we can
compute the param ID sets at the end of planning, just before
set_plan_references is run.

The main change necessary to make this work is that we need to capture
the set of outer-level Param IDs available to the current query level
before exiting subquery_planner, since the outer levels' plan_params lists
are transient.  (That's not going to pose a problem for returning Paths,
since all the work involved in producing that data is part of expression
preprocessing, which will continue to happen before Paths are produced.)
On the plus side, this change gets rid of several existing kluges.

Eventually I'd like to get rid of SS_finalize_plan altogether in favor of
doing this work during set_plan_references, but that will require some
complex rejiggering because SS_finalize_plan needs to visit subplans and
initplans before the main plan.  So leave that idea for another day.
2015-08-11 23:48:37 -04:00
Alvaro Herrera 4901b2f495 Don't include rel.h when relcache.h is sufficient
Trivial change to reduce exposure of rel.h.
2015-08-11 13:03:14 -03:00
Andres Freund 6fcd88511f Allow pg_create_physical_replication_slot() to reserve WAL.
When creating a physical slot it's often useful to immediately reserve
the current WAL position instead of only doing so after the first feedback
message arrives. That e.g. allows slots to guarantee that all the WAL
for a base backup will be available afterwards.

Logical slots already have to reserve WAL during creation, so generalize
that logic into being usable for both physical and logical slots.

Catversion bump because of the new parameter.

Author: Gurjeet Singh
Reviewed-By: Andres Freund
Discussion: CABwTF4Wh_dBCzTU=49pFXR6coR4NW1ynb+vBqT+Po=7fuq5iCw@mail.gmail.com
2015-08-11 12:34:31 +02:00
Andres Freund 093d0c83c1 Introduce macros determining if a replication slot is physical or logical.
These make the code a bit easier to read, and make it easier to add a
more explicit notion of a slot's type at some point in the future.

Author: Gurjeet Singh
Discussion: CABwTF4Wh_dBCzTU=49pFXR6coR4NW1ynb+vBqT+Po=7fuq5iCw@mail.gmail.com
2015-08-11 12:32:48 +02:00
Andres Freund 3b425b7c02 Minor cleanups in slot related code.
Fix a bunch of typos, and remove two superfluous includes.

Author: Gurjeet Singh
Discussion: CABwTF4Wh_dBCzTU=49pFXR6coR4NW1ynb+vBqT+Po=7fuq5iCw@mail.gmail.com
Backpatch: 9.4
2015-08-11 12:32:48 +02:00
Tom Lane 4200a92862 Further mucking with PlaceHolderVar-related restrictions on join order.
Commit 85e5e222b1 turns out not to have taken
care of all cases of the partially-evaluatable-PlaceHolderVar problem found
by Andreas Seltenreich's fuzz testing.  I had set it up to check for risky
PHVs only in the event that we were making a star-schema-based exception to
the param_source_rels join ordering heuristic.  However, it turns out that
the problem can occur even in joins that satisfy the param_source_rels
heuristic, in which case allow_star_schema_join() isn't consulted.
Refactor so that we check for risky PHVs whenever the proposed join has
any remaining parameterization.

Back-patch to 9.2, like the previous patch (except for the regression test
case, which only works back to 9.3 because it uses LATERAL).

Note that this discovery implies that problems of this sort could've
occurred in 9.2 and up even before the star-schema patch; though I've not
tried to prove that experimentally.
2015-08-10 17:18:17 -04:00
Andres Freund 3f811c2d6f Add confirmed_flush column to pg_replication_slots.
There's no reason not to expose both restart_lsn and confirmed_flush
since they have rather distinct meanings. The former is the oldest WAL
still required and valid for both physical and logical slots, whereas
the latter is the location up to which a logical slot's consumer has
confirmed receiving data. Most of the time a slot will require older
WAL (i.e. restart_lsn) than the confirmed
position (i.e. confirmed_flush_lsn).

Author: Marko Tiikkaja, editorialized by me
Discussion: 559D110B.1020109@joh.to
2015-08-10 13:28:18 +02:00
Andres Freund 5c4b25acce Fix copy & paste mistake in pg_get_replication_slots().
XLogRecPtr was compared with InvalidTransactionId instead of
InvalidXLogRecPtr. As both are defined to the same value this doesn't
cause any actual problems, but it's still wrong.

Backpatch: 9.4-master, bug was introduced in 9.4
2015-08-10 13:28:18 +02:00
Tom Lane 1e3e1ae266 Remove gram.y's precedence declaration for OVERLAPS.
The allowed syntax for OVERLAPS, viz "row OVERLAPS row", is sufficiently
constrained that we don't actually need a precedence declaration for
OVERLAPS; indeed removing this declaration does not change the generated
gram.c file at all.  Let's remove it to avoid confusion about whether
OVERLAPS has precedence or not.  If we ever generalize what we allow for
OVERLAPS, we might need to put back a precedence declaration for it,
but we might want some other level than what it has today --- and leaving
the declaration there would just risk confusion about whether that would
be an incompatible change.

Likewise, remove OVERLAPS from the documentation's precedence table.

Per discussion with Noah Misch.  Back-patch to 9.5 where we hacked up some
nearby precedence decisions.
2015-08-09 19:01:04 -04:00
Tom Lane 89db83922a Further adjustments to PlaceHolderVar removal.
A new test case from Andreas Seltenreich showed that we were still a bit
confused about removing PlaceHolderVars during join removal.  Specifically,
remove_rel_from_query would remove a PHV that was used only underneath
the removable join, even if the place where it's used was the join partner
relation and not the join clause being deleted.  This would lead to a
"too late to create a new PlaceHolderInfo" error later on.  We can defend
against that by checking ph_eval_at to see if the PHV could possibly be
getting used at some partner rel.

Also improve some nearby LATERAL-related logic.  I decided that the check
on ph_lateral needed to take precedence over the check on ph_needed, in
case there's a lateral reference underneath the join being considered.
(That may be impossible, but I'm not convinced of it, and it's easy enough
to defend against the case.)  Also, I realized that remove_rel_from_query's
logic for updating LateralJoinInfos is dead code, because we don't build
those at all until after join removal.

Back-patch to 9.3.  Previous versions didn't have the LATERAL issues, of
course, and they also didn't attempt to remove PlaceHolderInfos during join
removal.  (I'm starting to wonder if changing that was really such a great
idea.)
2015-08-07 14:13:50 -04:00
Robert Haas 846f8c9483 Fix attach-related race condition in shm_mq_send_bytes.
Spotted by Antonin Houska.
2015-08-07 10:04:07 -04:00
Andres Freund 4eda0a6470 Don't include low level locking code from frontend code.
Some frontend code like e.g. pg_xlogdump or pg_resetxlog, has to use
backend headers. Unfortunately until now that code includes most of the
locking code. It's generally not nice to expose such low level details,
but de6fd1c898 made that a hard problem. We fall back to defining
'inline' away if the compiler doesn't support it - that can cause linker
errors like on buildfarm animal pademelon if an inline function
references backend only code.

To fix that problem separate definitions from lock.h that are required
from frontend code into lockdefs.h and use it in the relevant
places. I've only removed the minimal amount of necessary definitions
for now - it might turn out that we want more for other reasons.

To avoid such details being exposed again put some checks against being
included from frontend code into atomics.h, lock.h, lwlock.h and
s_lock.h. It's otherwise fairly easy to indirectly include these
headers.

Discussion: 20150806070902.GE12214@awork2.anarazel.de
2015-08-07 15:10:56 +02:00
Andres Freund 18e8613564 Address points made in post-commit review of replication origins.
Amit reviewed the replication origins patch and made some good
points. Address them. This fixes typos in error messages, docs and
comments and adds a missing error check (although in a
should-never-happen scenario).

Discussion: CAA4eK1JqUBVeWWKwUmBPryFaje4190ug0y-OAUHWQ6tD83V4xg@mail.gmail.com
Backpatch: 9.5, where replication origins were introduced.
2015-08-07 15:09:05 +02:00
Tom Lane bab163e121 Fix old oversight in join removal logic.
Commit 9e7e29c75a introduced an Assert that
join removal didn't reduce the eval_at set of any PlaceHolderVar to empty.
At first glance it looks like join_is_removable ensures that's true --- but
actually, the loop in join_is_removable skips PlaceHolderVars that are not
referenced above the join due to be removed.  So, if we don't want any
empty eval_at sets, the right thing to do is to delete any now-unreferenced
PlaceHolderVars from the data structure entirely.

Per fuzz testing by Andreas Seltenreich.  Back-patch to 9.3 where the
aforesaid Assert was added.
2015-08-06 22:14:27 -04:00
Tom Lane cde35cf4ae Fix eclass_useful_for_merging to give valid results for appendrel children.
Formerly, this function would always return "true" for an appendrel child
relation, because it would think that the appendrel parent was a potential
join target for the child.  In principle that should only lead to some
inefficiency in planning, but fuzz testing by Andreas Seltenreich disclosed
that it could lead to "could not find pathkey item to sort" planner errors
in odd corner cases.  Specifically, we would think that all columns of a
child table's multicolumn index were interesting pathkeys, causing us to
generate a MergeAppend path that sorts by all the columns.  However, if any
of those columns weren't actually used above the level of the appendrel,
they would not get added to that rel's targetlist, which would result in
being unable to resolve the MergeAppend's sort keys against its targetlist
during createplan.c.

Backpatch to 9.3.  In older versions, columns of an appendrel get added
to its targetlist even if they're not mentioned above the scan level,
so that the failure doesn't occur.  It might be worth back-patching this
fix to older versions anyway, but I'll refrain for the moment.
2015-08-06 20:14:53 -04:00
Tom Lane 8703059c6b Further fixes for degenerate outer join clauses.
Further testing revealed that commit f69b4b9495 was still a few
bricks shy of a load: minor tweaking of the previous test cases resulted
in the same wrong-outer-join-order problem coming back.  After study
I concluded that my previous changes in make_outerjoininfo() were just
accidentally masking the problem, and should be reverted in favor of
forcing syntactic join order whenever an upper outer join's predicate
doesn't mention a lower outer join's LHS.  This still allows the
chained-outer-joins style that is the normally optimizable case.

I also tightened things up some more in join_is_legal().  It seems to me
on review that what's really happening in the exception case where we
ignore a mismatched special join is that we're allowing the proposed join
to associate into the RHS of the outer join we're comparing it to.  As
such, we should *always* insist that the proposed join be a left join,
which eliminates a bunch of rather dubious argumentation.  The case where
we weren't enforcing that was the one that was already known buggy anyway
(it had a violatable Assert before the aforesaid commit) so it hardly
deserves a lot of deference.

Back-patch to all active branches, like the previous patch.  The added
regression test case failed in all branches back to 9.1, and I think it's
only an unrelated change in costing calculations that kept 9.0 from
choosing a broken plan.
2015-08-06 15:35:46 -04:00
Robert Haas df0a67f754 Fix incorrect calculation in shm_mq_receive.
If some, but not all, of the length word has already been read, and the
next attempt to read sees exactly the number of bytes needed to complete
the length word, or fewer, then we'll incorrectly read less than all of
the available data.
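
The intended arithmetic can be sketched as follows. This is a hypothetical helper written for illustration, not the shm_mq_receive code, and it assumes the length word is a size_t:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: given that "partial" bytes of the length word have
 * already been read and "available" bytes are visible now, return how many
 * of the visible bytes belong to the length word.  When the visible data
 * is exactly enough to complete the word, or less than that, all of it
 * should be consumed.
 */
static size_t
length_word_bytes_to_copy(size_t partial, size_t available)
{
    size_t needed;

    assert(partial < sizeof(size_t));
    needed = sizeof(size_t) - partial;
    return available < needed ? available : needed;
}
```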

Antonin Houska
2015-08-06 13:25:45 -04:00
Robert Haas 0e141c0fbb Reduce ProcArrayLock contention by removing backends in batches.
When a write transaction commits, it must clear its XID advertised via
the ProcArray, which requires that we hold ProcArrayLock in exclusive
mode in order to prevent concurrent processes running GetSnapshotData
from seeing inconsistent results.  When many processes try to commit
at once, ProcArrayLock must change hands repeatedly, with each
concurrent process trying to commit waking up to acquire the lock in
turn.  To make things more efficient, when more than one backend is
trying to commit a write transaction at the same time, have just one
of them acquire ProcArrayLock in exclusive mode and clear the XIDs of
all processes in the group.  Benchmarking reveals that this is much
more efficient at very high client counts.

Amit Kapila, heavily revised by me, with some review also from Pavan
Deolasee.
2015-08-06 12:02:12 -04:00
Noah Misch b8fe12a836 Reconcile nodes/*funcs.c with recent work.
A few of the discrepancies had semantic significance, but I did not
track down the resulting user-visible bugs, if any.  Back-patch to 9.5,
where all but one discrepancy appeared.  The _equalCreateEventTrigStmt()
situation dates to 9.3 but does not affect semantics.

catversion bump due to readfuncs.c field order changes.
2015-08-05 20:44:27 -04:00
Alvaro Herrera 2834855cb9 Fix BRIN to use SnapshotAny during summarization
For correctness of summarization results, it is critical that the
snapshot used during the summarization scan is able to see all tuples
that are live to all transactions -- including tuples inserted or
deleted by in-progress transactions.  Otherwise, it would be possible
for a transaction to insert a tuple, then idle for a long time while a
concurrent transaction executes summarization of the range: this would
result in the inserted value not being considered in the summary.
Previously we were trying to use a MVCC snapshot in conjunction with
adding a "placeholder" tuple in the index: the snapshot would see all
committed tuples, and the placeholder tuple would catch insertions by
any new inserters.  The hole is that prior insertions by transactions
that are still in progress by the time the MVCC snapshot was taken were
ignored.

Kevin Grittner reported this as a bogus error message during vacuum with
default transaction isolation mode set to repeatable read (because the
error report mentioned a function name not being invoked during), but
the problem is larger than that.

To fix, tweak IndexBuildHeapRangeScan to have a new mode that behaves
the way we need using SnapshotAny visibility rules.  This change
simplifies the BRIN code a bit, mainly by removing large comments that
were mistaken.  Instead, rely on the SnapshotAny semantics to provide
what it needs.  (The business about a placeholder tuple needs to remain:
that covers the case that a transaction inserts a tuple in a page that
summarization already scanned.)

Discussion: https://www.postgresql.org/message-id/20150731175700.GX2441@postgresql.org

In passing, remove a couple of unused declarations from brin.h and
reword a comment to be proper English.  This part submitted by Kevin
Grittner.

Backpatch to 9.5, where BRIN was introduced.
2015-08-05 16:20:50 -03:00
Tom Lane 6af9ee4c8c Make real sure we don't reassociate joins into or out of SEMI/ANTI joins.
Per the discussion in optimizer/README, it's unsafe to reassociate anything
into or out of the RHS of a SEMI or ANTI join.  An example from Piotr
Stefaniak showed that join_is_legal() wasn't sufficiently enforcing this
rule, so lock it down a little harder.

I couldn't find a reasonably simple example of the optimizer trying to
do this, so no new regression test.  (Piotr's example involved the random
search in GEQO accidentally trying an invalid case and triggering a sanity
check way downstream in clause selectivity estimation, which did not seem
like a sequence of events that would be useful to memorialize in a
regression test as-is.)

Back-patch to all active branches.
2015-08-05 14:39:29 -04:00
Andres Freund de6fd1c898 Rely on inline functions even if that causes warnings in older compilers.
So far we have worked around the fact that some very old compilers do
not support 'inline' functions by only using inline functions
conditionally (or not at all). Since such compilers are very rare by
now, we have decided to rely on inline functions from 9.6 onwards.

To avoid breaking these old compilers inline is defined away when not
supported. That'll cause "function x defined but not used" type of
warnings, but since nobody develops on such compilers anymore that's
ok.

This change in policy will allow us to more easily employ inline
functions.

I chose to remove code previously conditional on PG_USE_INLINE as it
seemed confusing to have code dependent on a define that's always
defined.

Blacklisting of compilers, like in c53f73879f, now has to be done
differently. A platform template can define PG_FORCE_DISABLE_INLINE to
force inline to be defined empty.

Discussion: 20150701161447.GB30708@awork2.anarazel.de
2015-08-05 18:19:52 +02:00
Andres Freund a855118be3 Fix debug message output when connecting to a logical slot.
Previously the message erroneously printed the same LSN twice as the
assignment to the start_lsn variable was before the message. Correct
that.

Reported-By: Marko Tiikkaja
Author: Marko Tiikkaja
Backpatch: 9.5, where logical decoding was introduced
2015-08-05 13:26:01 +02:00
Tom Lane 8ea3e7a75c Fix bogus "out of memory" reports in tuplestore.c.
The tuplesort/tuplestore memory management logic assumed that the chunk
allocation overhead for its memtuples array could not increase when
increasing the array size.  This is and always was true for tuplesort,
but we (I, I think) blindly copied that logic into tuplestore.c without
noticing that the assumption failed to hold for the much smaller array
elements used by tuplestore.  Given rather small work_mem, this could
result in an improper complaint about "unexpected out-of-memory situation",
as reported by Brent DeSpain in bug #13530.

The easiest way to fix this is just to increase tuplestore's initial
array size so that the assumption holds.  Rather than relying on magic
constants, though, let's export a #define from aset.c that represents
the safe allocation threshold, and make tuplestore's calculation depend
on that.

Do the same in tuplesort.c to keep the logic looking parallel, even though
tuplesort.c isn't actually at risk at present.  This will keep us from
breaking it if we ever muck with the allocation parameters in aset.c.

Back-patch to all supported versions.  The error message doesn't occur
pre-9.3, not so much because the problem can't happen as because the
pre-9.3 tuplestore code neglected to check for it.  (The chance of
trouble is a great deal larger as of 9.3, though, due to changes in the
array-size-increasing strategy.)  However, allowing LACKMEM() to become
true unexpectedly could still result in less-than-desirable behavior,
so let's patch it all the way back.
2015-08-04 18:18:46 -04:00
Tom Lane 85e5e222b1 Fix a PlaceHolderVar-related oversight in star-schema planning patch.
In commit b514a7460d, I changed the planner
so that it would allow nestloop paths to remain partially parameterized,
ie the inner relation might need parameters from both the current outer
relation and some upper-level outer relation.  That's fine so long as we're
talking about distinct parameters; but the patch also allowed creation of
nestloop paths for cases where the inner relation's parameter was a
PlaceHolderVar whose eval_at set included the current outer relation and
some upper-level one.  That does *not* work.

In principle we could allow such a PlaceHolderVar to be evaluated at the
lower join node using values passed down from the upper relation along with
values from the join's own outer relation.  However, nodeNestloop.c only
supports simple Vars not arbitrary expressions as nestloop parameters.
createplan.c is also a few bricks shy of being able to handle such cases;
it misplaces the PlaceHolderVar parameters in the plan tree, which is why
the visible symptoms of this bug are "plan should not reference subplan's
variable" and "failed to assign all NestLoopParams to plan nodes" planner
errors.

Adding the necessary complexity to make this work doesn't seem like it
would be repaid in significantly better plans, because in cases where such
a PHV exists, there is probably a corresponding join order constraint that
would allow a good plan to be found without using the star-schema exception.
Furthermore, adding complexity to nodeNestloop.c would create a run-time
penalty even for plans where this whole consideration is irrelevant.
So let's just reject such paths instead.

Per fuzz testing by Andreas Seltenreich; the added regression test is based
on his example query.  Back-patch to 9.2, like the previous patch.
2015-08-04 14:55:50 -04:00
Robert Haas 369342cf70 Cap wal_buffers to avoid a server crash when it's set very large.
It must be possible to multiply wal_buffers by XLOG_BLCKSZ without
overflowing int, or calculations in StartupXLOG will go badly wrong
and crash the server.  Avoid that by imposing a maximum value on
wal_buffers.  This will be just under 2GB, assuming the usual value
for XLOG_BLCKSZ.
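
A rough sketch of how such a cap can be derived, assuming a 32-bit int and taking 8192 as the usual XLOG_BLCKSZ; the function name is hypothetical:

```c
#include <assert.h>
#include <limits.h>

/*
 * Hypothetical helper: the largest buffer count whose product with the
 * block size still fits in an int.
 */
static int
max_wal_buffers(int xlog_blcksz)
{
    assert(xlog_blcksz > 0);
    return INT_MAX / xlog_blcksz;
}
```

With a block size of 8192 and a 32-bit int this gives 262143 buffers, or 2147475456 bytes, which is just under 2GB.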

Josh Berkus, per an analysis by Andrew Gierth.
2015-08-04 12:58:54 -04:00
Robert Haas a6a2357820 Update comment to match behavior of latest code.
Peter Geoghegan
2015-08-04 11:45:29 -04:00
Heikki Linnakangas 804163bc25 Share transition state between different aggregates when possible.
If there are two different aggregates in the query with the same inputs,
and the aggregates have the same initial condition and transition function,
only calculate the state value once, and only call the final functions
separately. For example, AVG(x) and SUM(x) aggregates have the same
transition function, which accumulates the sum and number of input tuples.
For a query like "SELECT AVG(x), SUM(x) FROM x", we can therefore
accumulate the state value only once, which gives a nice speedup.
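
The idea can be sketched in a few lines of C.  This is a simplified model,
not the executor code: AvgSumState and the function names are hypothetical,
the input is assumed to be double precision, and an empty input yields 0.0
here where SQL AVG would return NULL.

```c
/* Transition state shared by both aggregates: running sum and row count. */
typedef struct AvgSumState
{
    double sum;
    long   count;
} AvgSumState;

/* Transition step: run once per input row, whichever aggregate reads it. */
void
avg_sum_transition(AvgSumState *state, double value)
{
    state->sum += value;
    state->count++;
}

/* Final function for SUM(x): reads the shared state. */
double
sum_final(const AvgSumState *state)
{
    return state->sum;
}

/* Final function for AVG(x): reads the same shared state. */
double
avg_final(const AvgSumState *state)
{
    return state->count > 0 ? state->sum / state->count : 0.0;
}
```

Each row is fed through avg_sum_transition() once, and only the two cheap
final functions run separately.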

David Rowley, reviewed and edited by me.
2015-08-04 17:53:10 +03:00
Stephen Frost dee0200f02 RLS: Keep deny policy when only restrictive exist
Only remove the default deny policy when a permissive policy exists
(either from the hook or defined by the user).  If only restrictive
policies exist then no rows will be visible, as restrictive policies
shouldn't make rows visible.  To address this requirement, a single
"USING (true)" permissive policy can be created.

Update the test_rls_hooks regression tests to create the necessary
"USING (true)" permissive policy.

Back-patch to 9.5 where RLS was added.

Per discussion with Dean.
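
The resulting visibility rule can be modelled roughly as below.  This is a
sketch, not the rewriter code: rls_row_visible() is a hypothetical name, and
each array element stands for the result of one policy's USING expression
for a given row.

```c
#include <stdbool.h>

/*
 * A row is visible only if at least one permissive policy passes and
 * every restrictive policy passes.  With no permissive policy at all,
 * the default deny stays in force.
 */
bool
rls_row_visible(const bool *permissive, int n_permissive,
                const bool *restrictive, int n_restrictive)
{
    bool any_permissive = false;    /* default deny */

    for (int i = 0; i < n_permissive; i++)
        any_permissive = any_permissive || permissive[i];
    if (!any_permissive)
        return false;

    for (int i = 0; i < n_restrictive; i++)
    {
        if (!restrictive[i])
            return false;
    }
    return true;
}
```

With only restrictive policies the function returns false for every row;
adding one permissive entry that is always true, the analogue of
"USING (true)", leaves the restrictive policies to decide.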
2015-08-03 15:32:49 -04:00
Fujii Masao dd85acf0c4 Make recovery rename tablespace_map to *.old if backup_label is not present.
If a tablespace_map file is present without a backup_label file, the map
file serves no purpose.  Retaining it does no harm, but it is better to
get it out of the way so that the data directory holds no redundant file
that could cause confusion.  It seems prudent to rename the file rather
than delete it outright.  Any error from the rename operation is ignored,
because a map file without a backup_label file is harmless anyway.

Back-patch to 9.5 where tablespace_map file was introduced.

Amit Kapila, reviewed by Robert Haas, Alvaro Herrera and me.
2015-08-03 23:04:41 +09:00
Tom Lane 09cecdf285 Fix a number of places that produced XX000 errors in the regression tests.
It's against project policy to use elog() for user-facing errors, or to
omit an errcode() selection for errors that aren't supposed to be "can't
happen" cases.  Fix all the violations of this policy that result in
ERRCODE_INTERNAL_ERROR log entries during the standard regression tests,
as errors that can reliably be triggered from SQL surely should be
considered user-facing.

I also looked through all the files touched by this commit and fixed
other nearby problems of the same ilk.  I do not claim to have fixed
all violations of the policy, just the ones in these files.

In a few places I also changed existing ERRCODE choices that didn't
seem particularly appropriate; mainly replacing ERRCODE_SYNTAX_ERROR
by something more specific.

Back-patch to 9.5, but no further; changing ERRCODE assignments in
stable branches doesn't seem like a good idea.
2015-08-02 23:49:19 -04:00
Tom Lane 13bba02271 Avoid calling memcpy() with a NULL source pointer and count == 0.
As in commit 0a52d378b0, avoid doing something that has undefined
results according to the C standard, even though in practice there does
not seem to be any problem with it.

This fixes two places in numeric.c that demonstrably could call memcpy()
with such arguments.  I looked through that file and didn't see any other
places with similar hazards; this is not to claim that there are not such
places in other files.
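
The general shape of such a fix is a guard on the count.  The wrapper below
is a sketch with a hypothetical name, copy_bytes_if_any(); it is not the
code changed in numeric.c.

```c
#include <stddef.h>
#include <string.h>

/*
 * Skip memcpy() entirely when there is nothing to copy, so that a NULL
 * source pointer combined with count == 0 never reaches it.
 */
void *
copy_bytes_if_any(void *dest, const void *src, size_t count)
{
    if (count > 0)
        memcpy(dest, src, count);
    return dest;
}
```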

Per report from Piotr Stefaniak.  Back-patch to 9.5 which is where the
previous commit was added.  We're more or less setting a precedent that
we will not worry about this type of issue in pre-9.5 branches unless
someone demonstrates a problem in the field.
2015-08-02 15:48:31 -04:00
Tom Lane d73d14c271 Fix incorrect order of lock file removal and failure to close() sockets.
Commit c9b0cbe98b accidentally broke the
order of operations during postmaster shutdown: it resulted in removing
the per-socket lockfiles after, not before, postmaster.pid.  This creates
a race-condition hazard for a new postmaster that's started immediately
after observing that postmaster.pid has disappeared; if it sees the
socket lockfile still present, it will quite properly refuse to start.
This error appears to be the explanation for at least some of the
intermittent buildfarm failures we've seen in the pg_upgrade test.

Another problem, which has been there all along, is that the postmaster
has never bothered to close() its listen sockets, but has just allowed them
to close at process death.  This creates a different race condition for an
incoming postmaster: it might be unable to bind to the desired listen
address because the old postmaster is still incumbent.  This might explain
some odd failures we've seen in the past, too.  (Note: this is not related
to the fact that individual backends don't close their client communication
sockets.  That behavior is intentional and is not changed by this patch.)

Fix by adding an on_proc_exit function that closes the postmaster's ports
explicitly, and (in 9.3 and up) reshuffling the responsibility for where
to unlink the Unix socket files.  Lock file unlinking can stay where it
is, but teach it to unlink the lock files in reverse order of creation.
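
Reverse-order removal can be sketched as below.  This is an illustration
under assumptions, not the postmaster code: the registry, its size limit
and the function names are hypothetical, and a real implementation would
call unlink() on each path instead of returning the order.

```c
#define MAX_LOCK_FILES 8

/* Lock file paths, kept in order of creation. */
static const char *lock_files[MAX_LOCK_FILES];
static int n_lock_files = 0;

/* Remember a newly created lock file; returns 0 on success, -1 if full. */
int
register_lock_file(const char *path)
{
    if (n_lock_files >= MAX_LOCK_FILES)
        return -1;
    lock_files[n_lock_files++] = path;
    return 0;
}

/*
 * Fill out[] with the paths in the order they should be removed, which
 * is the reverse of creation order.  Returns the number of entries.
 */
int
removal_order(const char **out, int max)
{
    int n = 0;

    for (int i = n_lock_files - 1; i >= 0 && n < max; i--)
        out[n++] = lock_files[i];
    return n;
}
```

If postmaster.pid is registered first and a socket lock file after it, the
socket lock file comes out first, so postmaster.pid disappears last.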
2015-08-02 14:55:03 -04:00
Heikki Linnakangas 358cde320b Fix race condition that led to WALInsertLock deadlock with commit_delay.
If a call to WaitXLogInsertionsToFinish() returned a value in the middle
of a page, and another backend then started to insert a record to the same
page, and then you called WaitXLogInsertionsToFinish() again, the second
call might return a smaller value than the first call. The problem was in
GetXLogBuffer(), which always updated the insertingAt value to the
beginning of the requested page, not the actual requested location. Because
of that, the second call might return an xlog pointer to the beginning of
the page, while the first one returned a later position on the same page.
XLogFlush() performs two calls to WaitXLogInsertionsToFinish() in
succession, and holds WALWriteLock on the second call, which can deadlock
if the second call to WaitXLogInsertionsToFinish() blocks.
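
A much simplified model of the backward step is sketched below.  None of
this is the xlog.c code: WalPtr, the 8192-byte page size and all function
names are assumptions made for the illustration.

```c
#include <stdint.h>

#define WAL_PAGE_SIZE 8192u     /* assumed page size for this sketch */

typedef uint64_t WalPtr;        /* hypothetical stand-in for an xlog pointer */

/* Old behavior: advertise the beginning of the requested page. */
WalPtr
advertise_page_start(WalPtr requested)
{
    return requested - (requested % WAL_PAGE_SIZE);
}

/* Fixed behavior: advertise the actual requested location. */
WalPtr
advertise_exact(WalPtr requested)
{
    return requested;
}

/*
 * What a waiter may report as finished: never beyond the oldest position
 * that an in-progress inserter has advertised.
 */
WalPtr
finished_up_to(WalPtr upto, WalPtr oldest_inserting_at)
{
    return oldest_inserting_at < upto ? oldest_inserting_at : upto;
}
```

If the first wait returned 12192 (offset 4000 into the second page) and a
new inserter on that page advertises the page start, the next wait reports
8192, which is behind the first result.  Advertising the exact location
keeps the result from moving backwards.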

Reported by Spiros Ioannou. Backpatch to 9.4, where the more scalable
WALInsertLock mechanism, and this bug, were introduced.
2015-08-02 20:08:10 +03:00