Commit Graph

28050 Commits

Author SHA1 Message Date
Robert Haas
0218e8b3fa Fix typos.
Jim Nasby
2016-03-17 07:26:20 -04:00
Peter Eisentraut
fc201dfd95 Add syslog_split_messages parameter
Reviewed-by: Andreas Karlsson <andreas@proxel.se>
2016-03-16 23:21:44 -04:00
Peter Eisentraut
f4c454e9ba Add syslog_sequence_numbers parameter
Reviewed-by: Andreas Karlsson <andreas@proxel.se>
2016-03-16 23:21:44 -04:00
Tom Lane
47211af17a Fix "pg_bench -C -M prepared".
This didn't work because when we dropped and re-established a database
connection, we did not bother to reset session-specific state such as
the statements-are-prepared flags.

The st->prepared[] array certainly needs to be flushed, and I cleared a
couple of other fields as well that couldn't possibly retain meaningful
state for a new connection.

In passing, fix some bogus comments and strange field order choices.

Per report from Robins Tharakan.
2016-03-16 23:18:07 -04:00
Tom Lane
5db5146431 Fix j2day() to behave sanely for negative Julian dates.
Somebody had apparently once figured that casting to unsigned int would
produce the right output for negative inputs, but that would only be
true if 2^32 were a multiple of 7, which of course it ain't.  We need
to use a signed division and then correct the sign of the remainder.

AFAICT, the only case where this would arise currently is when doing
ISO-week calculations for dates in 4714BC, where we'd compute a
negative Julian date representing 4714-01-04BC and then do some
arithmetic with it.  Since we don't even really document support for
such dates, this is not of much consequence.  But we may as well
get it right.

Per report from Vitaly Burovoy.
2016-03-16 20:57:45 -04:00
Tom Lane
a70e13a39e Be more careful about out-of-range dates and timestamps.
Tighten the semantics of boundary-case timestamptz so that we allow
timestamps >= '4714-11-24 00:00+00 BC' and < 'ENDYEAR-01-01 00:00+00 AD'
exactly, no more and no less, but it is allowed to enter timestamps
within that range using non-GMT timezone offsets (which could make the
nominal date 4714-11-23 BC or ENDYEAR-01-01 AD).  This eliminates
dump/reload failure conditions for timestamps near the endpoints.
To do this, separate checking of the inputs for date2j() from the
final range check, and allow the Julian date code to handle a range
slightly wider than the nominal range of the datatypes.

Also add a bunch of checks to detect out-of-range dates and timestamps
that formerly could be returned by operations such as date-plus-integer.
All C-level functions that return date, timestamp, or timestamptz should
now be proof against returning a value that doesn't pass IS_VALID_DATE()
or IS_VALID_TIMESTAMP().

Vitaly Burovoy, reviewed by Anastasia Lubennikova, and substantially
whacked around by me
2016-03-16 19:09:28 -04:00
Robert Haas
f2b74b01d4 Another comment update.
I thought this was in my last commit, but I goofed.
2016-03-16 14:28:25 -04:00
Robert Haas
bc55cc0b6a Fix problems in commit c16dc1aca5.
Vinayak Pokale provided a patch for a copy-and-paste error in a
comment.  I noticed that I'd use the word "automatically" nearby where
I meant to talk about things being "atomic".  Rahila Syed spotted a
misplaced counter update.  Fix all that stuff.
2016-03-16 13:54:04 -04:00
Robert Haas
c6dda1f48e Add idle_in_transaction_session_timeout.
Vik Fearing, reviewed by Stéphane Schildknecht and me, and revised
slightly by me.
2016-03-16 11:30:45 -04:00
Peter Eisentraut
f9e5ed61ed UCS_to_EUC_JIS_2004.pl: Turn off "test" mode by default
It produces debugging output files that are of no further use, so we
don't need that by default.
2016-03-16 10:43:05 -04:00
Peter Eisentraut
9dbcb500ca Make spacing and punctuation consistent 2016-03-16 10:43:05 -04:00
Robert Haas
3aff33aa68 Fix typos.
Oskari Saarenmaa
2016-03-15 18:06:11 -04:00
Stephen Frost
fd658dbb30 Avoid incorrectly indicating exclusion constraint wait
INSERT ... ON CONFLICT's precheck may have to wait on the outcome of
another insertion, which may or may not itself be a speculative
insertion.  This wait is not necessarily associated with an exclusion
constraint, but was always reported that way in log messages if the wait
happened to involve a tuple that had no speculative token.

Initially discovered through use of ON CONFLICT DO NOTHING, where
spurious references to exclusion constraints in log messages were more
likely.

Patch by Peter Geoghegan.
Reviewed by Julien Rouhaud.

Back-patch to 9.5 where INSERT ... ON CONFLICT was added.
2016-03-15 18:04:39 -04:00
Alvaro Herrera
5bcc413f80 Fix typos in comments 2016-03-15 17:57:17 -03:00
Robert Haas
c16dc1aca5 Add simple VACUUM progress reporting.
There's a lot more that could be done here yet - in particular, this
reports only very coarse-grained information about the index vacuuming
phase - but even as it stands, the new pg_stat_progress_vacuum can
tell you quite a bit about what a long-running vacuum is actually
doing.

Amit Langote and Robert Haas, based on earlier work by Vinayak Pokale
and Rahila Syed.
2016-03-15 13:32:56 -04:00
Tom Lane
0e9b89986b Cope if platform declares mbstowcs_l(), but not locale_t, in <xlocale.h>.
Previously, we included <xlocale.h> only if necessary to get the definition
of type locale_t.  According to notes in PGAC_TYPE_LOCALE_T, this is
important because on some versions of glibc that file supplies an
incompatible declaration of locale_t.  (This info may be obsolete, because
on my RHEL6 box that seems to be the *only* definition of locale_t; but
there may still be glibc's in the wild for which it's a live concern.)

It turns out though that on FreeBSD and maybe other BSDen, you can get
locale_t from stdlib.h or locale.h but mbstowcs_l() and friends only from
<xlocale.h>.  This was leaving us compiling calls to mbstowcs_l() and
friends with no visible prototype, which causes a warning and could
possibly cause actual trouble, since it's not declared to return int.

Hence, adjust the configure checks so that we'll include <xlocale.h>
either if it's necessary to get type locale_t or if it's necessary to
get a declaration of mbstowcs_l().

Report and patch by Aleksander Alekseev, somewhat whacked around by me.
Back-patch to all supported branches, since we have been using
mbstowcs_l() since 9.1.
2016-03-15 13:19:57 -04:00
Tom Lane
101fd9349e Add a GetForeignUpperPaths callback function for FDWs.
This is basically like the just-added create_upper_paths_hook, but
control is funneled only to the FDW responsible for all the baserels
of the current query; so providing such a callback is much less likely
to add useless overhead than using the hook function is.

The documentation is a bit sketchy.  We'll likely want to improve it,
and/or adjust the call conventions, when we get some experience with
actually using this callback.  Hopefully somebody will find time to
experiment with it before 9.6 feature freeze.
2016-03-14 20:04:48 -04:00
Peter Eisentraut
be6de4c121 Add missing include for self-containment 2016-03-14 19:56:33 -04:00
Robert Haas
270b7daf5c Fix EXPLAIN ANALYZE SELECT INTO not to choose a parallel plan.
We don't support any parallel write operations at present, so choosing
a parallel plan causes us to error out.  Also, add a new regression
test that uses EXPLAIN ANALYZE SELECT INTO; if we'd had this previously,
force_parallel_mode testing would have caught this issue.

Mithun Cy and Robert Haas
2016-03-14 19:48:46 -04:00
Tom Lane
5864d6a4b6 Provide a planner hook at a suitable place for creating upper-rel Paths.
In the initial revision of the upper-planner pathification work, the only
available way for an FDW or custom-scan provider to inject Paths
representing post-scan-join processing was to insert them during scan-level
GetForeignPaths or similar processing.  While that's not impossible, it'd
require quite a lot of duplicative processing to look forward and see if
the extension would be capable of implementing the whole query.  To improve
matters for custom-scan providers, provide a hook function at the point
where the core code is about to start filling in upperrel Paths.  At this
point Paths are available for the whole scan/join tree, which should reduce
the amount of redundant effort considerably.

(An alternative design that was suggested was to provide a separate hook
for each post-scan-join processing step, but that seems messy and not
clearly more useful.)

Following our time-honored tradition, there's no documentation for this
hook outside the source code.

As-is, this hook is only meant for custom scan providers, which we can't
assume very much about.  A followon patch will implement an FDW callback
to let FDWs do the same thing in a somewhat more structured fashion.
2016-03-14 19:23:29 -04:00
Tom Lane
28048cbaa2 Allow callers of create_foreignscan_path to specify nondefault PathTarget.
Although the default choice of rel->reltarget should typically be
sufficient for scan or join paths, it's not at all sufficient for the
purposes PathTargets were invented for; in particular not for
upper-relation Paths.  So break API compatibility by adding a PathTarget
argument to create_foreignscan_path().  To ease updating of existing
code, accept a NULL value of the argument as selecting rel->reltarget.
2016-03-14 17:31:28 -04:00
Tom Lane
307c78852f Rethink representation of PathTargets.
In commit 19a541143a I did not make PathTarget a subtype of Node,
and embedded a RelOptInfo's reltarget directly into it rather than having
a separately-allocated Node.  In hindsight that was misguided
micro-optimization, enabled by the fact that at that point we didn't have
any Paths with custom PathTargets.  Now that PathTarget processing has
been fleshed out some more, it's easier to see that it's better to have
PathTarget as an indepedent Node type, even if it does cost us one more
palloc to create a RelOptInfo.  So change it while we still can.

This commit just changes the representation, without doing anything more
interesting than that.
2016-03-14 16:59:59 -04:00
Tom Lane
07341a2980 Update PL/Perl's comment about hv_store().
Negative klen is documented since Perl 5.16, and 5.6 is no longer
supported so no need to comment about it.

Dagfinn Ilmari Mannsåker
2016-03-14 14:45:45 -04:00
Tom Lane
f3f3aae4b7 Improve conversions from uint64 to Perl types.
Perl's integers are pointer-sized, so can hold more than INT_MAX on LP64
platforms, and come in both signed (IV) and unsigned (UV).  Floating
point values (NV) may also be larger than double.

Since Perl 5.19.4 array indices are SSize_t instead of I32, so allow up
to SSize_t_max on those versions.  The limit is not imposed just by
av_extend's argument type, but all the array handling code, so remove
the speculative comment.

Dagfinn Ilmari Mannsåker
2016-03-14 14:38:44 -04:00
Robert Haas
6be84eeb8d Update more comments for 96198d94cb.
Etsuro Fujita, reviewed (though not completely endorsed) by Ashutosh
Bapat, and slightly expanded by me.
2016-03-14 14:29:12 -04:00
Tom Lane
74a379b984 Use repalloc_huge() to enlarge a SPITupleTable's tuple pointer array.
Commit 23a27b039d widened the rows-stored counters to uint64, but
that's academic unless we allow the tuple pointer array to exceed 1GB.

(It might be a good idea to provide some other limit on how much storage
a SPITupleTable can eat.  On the other hand, there are plenty of other
ways to drive a backend into swap hell.)

Dagfinn Ilmari Mannsåker
2016-03-14 14:22:34 -04:00
Robert Haas
3adf9ced17 Improve check for overly-long extensible node name.
The old code is bad for two reasons.  First, it has an off-by-one
error.  Second, it won't help if you aren't running with assertions
enabled.  Per discussion, we want a check here in that case too.

Author: KaiGai Kohei, adjusted by me.
Reviewed-by: Petr Jelinek
Discussion: 56E0D547.1030101@2ndquadrant.com
2016-03-14 13:52:52 -04:00
Tom Lane
2da7549987 pg_stat_get_progress_info() should be marked STRICT.
I didn't bother with a catversion bump.

Report and patch by Thomas Munro
2016-03-14 12:51:55 -04:00
Tom Lane
ab4ff2889d Fix memory leak in repeated GIN index searches.
Commit d88976cfa1 removed this code from ginFreeScanKeys():
-		if (entry->list)
-			pfree(entry->list);
evidently in the belief that that ItemPointer array is allocated in the
keyCtx and so would be reclaimed by the following MemoryContextReset.
Unfortunately, it isn't and it won't.  It'd likely be a good idea for
that to become so, but as a simple and back-patchable fix in the
meantime, restore this code to ginFreeScanKeys().

Also, add a similar pfree to where startScanEntry() is about to zero out
entry->list.  I am not sure if there are any code paths where this
change prevents a leak today, but it seems like cheap future-proofing.

In passing, make the initial allocation of so->entries[] use palloc
not palloc0.  The code doesn't depend on unused entries being zero;
if it did, the array-enlargement code in ginFillScanEntry() would be
wrong.  So using palloc0 initially can only serve to confuse readers
about what the invariant is.

Per report from Felipe de Jesús Molina Bravo, via Jaime Casanova in
<CAJGNTeMR1ndMU2Thpr8GPDUfiHTV7idELJRFusA5UXUGY1y-eA@mail.gmail.com>
2016-03-13 16:44:31 -04:00
Peter Eisentraut
96adb14d93 Fix whitespace and remove obsolete gitattributes entry 2016-03-13 16:03:13 -04:00
Magnus Hagander
a1aa8b7ea0 Fix order of MemSet arguments
Noted by Tomas Vondra
2016-03-13 13:11:06 +01:00
Tom Lane
4b980167cb Report memory context stats upon out-of-memory in repalloc[_huge].
This longstanding functionality evidently got lost in commit
3d6d1b5855.  Noted while studying an OOM report from Jaime
Casanova.  Backpatch to 9.5 where the bug was introduced.
2016-03-13 00:21:07 -05:00
Tom Lane
ab737f6ba9 Fix Windows portability issue in 23a27b039d.
_strtoui64() is available in MSVC builds, but apparently not with
other Windows toolchains.  Thanks to Petr Jelinek for the diagnosis.
2016-03-12 22:34:47 -05:00
Tom Lane
fc7a9dfddb Get rid of scribbling on a const variable in psql's print.c.
Commit a2dabf0e1d had the bright idea that it could modify a "const"
global variable if it merely casted away const from a pointer.  This does
not work on platforms where the compiler puts "const" variables into
read-only storage.  Depressingly, we evidently have no such platforms in
our buildfarm ... an oversight I have now remedied.  (The one platform
that is known to catch this is recent OS X with -fno-common.)

Per report from Chris Ruprecht.  Back-patch to 9.5 where the bogus
code was introduced.
2016-03-12 18:16:24 -05:00
Tom Lane
23a27b039d Widen query numbers-of-tuples-processed counters to uint64.
This patch widens SPI_processed, EState's es_processed field, PortalData's
portalPos field, FuncCallContext's call_cntr and max_calls fields,
ExecutorRun's count argument, PortalRunFetch's result, and the max number
of rows in a SPITupleTable to uint64, and deals with (I hope) all the
ensuing fallout.  Some of these values were declared uint32 before, and
others "long".

I also removed PortalData's posOverflow field, since that logic seems
pretty useless given that portalPos is now always 64 bits.

The user-visible results are that command tags for SELECT etc will
correctly report tuple counts larger than 4G, as will plpgsql's GET
GET DIAGNOSTICS ... ROW_COUNT command.  Queries processing more tuples
than that are still not exactly the norm, but they're becoming more
common.

Most values associated with FETCH/MOVE distances, such as PortalRun's count
argument and the count argument of most SPI functions that have one, remain
declared as "long".  It's not clear whether it would be worth promoting
those to int64; but it would definitely be a large dollop of additional
API churn on top of this, and it would only help 32-bit platforms which
seem relatively less likely to see any benefit.

Andreas Scherbaum, reviewed by Christian Ullrich, additional hacking by me
2016-03-12 16:05:29 -05:00
Andres Freund
e01157500f Include portability/mem.h into fd.c for MAP_FAILED.
Buildfarm members gaur and pademelon are old enough not to know about
MAP_FAILED; which is used in 428b1d6. Include portability/mem.h to fix;
as already done in a bunch of other places.
2016-03-12 12:16:48 -08:00
Tom Lane
570be1f73f Re-export a few of createplan.c's make_xxx() functions.
CitusDB is using these and don't wish to redesign their code right now.
I am not on board with this being a good idea, or a good precedent,
but I lack the energy to fight about it.
2016-03-12 12:12:59 -05:00
Robert Haas
7087166a88 pg_upgrade: Convert old visibility map format to new format.
Commit a892234f83 added a second bit per
page to the visibility map, but pg_upgrade has been unaware of it up
until now.  Therefore, a pg_upgrade from an earlier major release of
PostgreSQL to any commit preceding this one and following the one
mentioned above would result in invalid visibility map contents on the
new cluster, very possibly leading to data corruption.  This plugs
that hole.

Masahiko Sawada, reviewed by Jeff Janes, Bruce Momjian, Simon Riggs,
Michael Paquier, Andres Freund, me, and others.
2016-03-11 12:34:20 -05:00
Tom Lane
9118d03a8c When appropriate, postpone SELECT output expressions till after ORDER BY.
It is frequently useful for volatile, set-returning, or expensive functions
in a SELECT's targetlist to be postponed till after ORDER BY and LIMIT are
done.  Otherwise, the functions might be executed for every row of the
table despite the presence of LIMIT, and/or be executed in an unexpected
order.  For example, in
	SELECT x, nextval('seq') FROM tab ORDER BY x LIMIT 10;
it's probably desirable that the nextval() values are ordered the same
as x, and that nextval() is not run more than 10 times.

In the past, Postgres was inconsistent in this area: you would get the
desirable behavior if the ordering were performed via an indexscan, but
not if it had to be done by an explicit sort step.  Getting the desired
behavior reliably required contortions like
	SELECT x, nextval('seq')
	  FROM (SELECT x FROM tab ORDER BY x) ss LIMIT 10;

This patch conditionally postpones evaluation of pure-output target
expressions (that is, those that are not used as DISTINCT, ORDER BY, or
GROUP BY columns) so that they effectively occur after sorting, even if an
explicit sort step is necessary.  Volatile expressions and set-returning
expressions are always postponed, so as to provide consistent semantics.
Expensive expressions (costing more than 10 times typical operator cost,
which by default would include any user-defined function) are postponed
if there is a LIMIT or if there are expressions that must be postponed.

We could be more aggressive and postpone any nontrivial expression, but
there are costs associated with doing so: it requires an extra Result plan
node which adds some overhead, and postponement changes the volume of data
going through the sort step, perhaps for the worse.  Since we tend not to
have very good estimates of the output width of nontrivial expressions,
it's hard to have much confidence in our ability to predict whether
postponement would increase or decrease the cost of the sort; therefore
this patch doesn't attempt to make decisions conditionally on that.
Between these factors and a general desire not to change query behavior
when there's not a demonstrable benefit, it seems best to be conservative
about applying postponement.  We might tweak the decision rules in the
future, though.

Konstantin Knizhnik, heavily rewritten by me
2016-03-11 12:27:50 -05:00
Teodor Sigaev
b1fdc727c3 Fix Windows build broken in 6943a946c7
Also it fixes dynamic array allocation disallowed by ANSI-C.

Author: Stas Kelvich
2016-03-11 20:10:20 +03:00
Teodor Sigaev
8829af47ef Fix merge affixes for numeric ones
Some dictionaries have duplicated base words with different affix set, we
just merge that sets into one set. But previously merging of sets of affixes
was actually a concatenation of strings but it's wrong for numeric
representation of affixes because such representation uses comma to
separate affixes.

Author: Artur Zakirov
2016-03-11 19:47:50 +03:00
Teodor Sigaev
a9eb6c83ef Bump catalog version missed in 6943a946c7 2016-03-11 19:31:04 +03:00
Teodor Sigaev
6943a946c7 Tsvector editing functions
Adds several tsvector editting function: convert tsvector to/from text array,
set weight for given lexemes, delete lexeme(s), unnest, filter lexemes
with given weights

Author: Stas Kelvich with some editorization by me
Reviewers: Tomas Vondram, Teodor Sigaev
2016-03-11 19:22:36 +03:00
Tom Lane
49635d7b3e Minor additional refactoring of planner.c's PathTarget handling.
Teach make_group_input_target() and make_window_input_target() to work
entirely with the PathTarget representation of tlists, rather than
constructing a tlist and immediately deconstructing it into PathTarget
format.  In itself this only saves a few palloc's; the bigger picture is
that it opens the door for sharing cost_qual_eval work across all of
planner.c's constructions of PathTargets.  I'll come back to that later.

In support of this, flesh out tlist.c's infrastructure for PathTargets
a bit more.
2016-03-11 10:24:55 -05:00
Robert Haas
69ab7b9d6c psql: Don't automatically use expanded format when there's 1 column.
Andreas Karlsson and Robert Haas
2016-03-11 08:04:01 -05:00
Robert Haas
481c76abf4 Fix a typo, and remove unnecessary pgstat_report_wait_end().
Per Amit Kapila.
2016-03-11 07:34:00 -05:00
Magnus Hagander
38c83c9b75 Refactor receivelog.c parameters
Much cruft had accumulated over time with a large number of parameters
passed down between functions very deep. With this refactoring, instead
introduce a StreamCtl structure that holds the parameters, and pass around
a pointer to this structure instead. This makes it much easier to add or
remove fields that are needed deeper down in the implementation without
having to modify every function header in the file.

Patch by me after much nagging from Andres
Reviewed by Craig Ringer and Daniel Gustafsson
2016-03-11 11:15:12 +01:00
Simon Riggs
73e7e49da3 Allow emit_log_hook to see original message text
emit_log_hook could only see the translated text, making it harder to identify
which message was being sent. Pass original text to allow the exact message to
be identified, whichever language is used for logging.

Discussion: 20160216.184755.59721141.horiguchi.kyotaro@lab.ntt.co.jp
Author: Kyotaro Horiguchi
2016-03-11 09:53:06 +00:00
Robert Haas
a414d96ad2 Simplify GetLockNameFromTagType.
The old code is wrong, because it returns a pointer to an automatic
variable.  And it's also more clever than we really need to be
considering that the case it's worrying about should never happen.
2016-03-10 21:37:22 -05:00
Andres Freund
c94f0c29ce Blindly try to fix dtrace enabled builds, broken in 9cd00c45.
Reported-By: Peter Eisentraut
Discussion: 56E2239E.1050607@gmx.net
2016-03-10 17:51:03 -08:00
Andres Freund
9cd00c457e Checkpoint sorting and balancing.
Up to now checkpoints were written in the order they're in the
BufferDescriptors. That's nearly random in a lot of cases, which
performs badly on rotating media, but even on SSDs it causes slowdowns.

To avoid that, sort checkpoints before writing them out. We currently
sort by tablespace, relfilenode, fork and block number.

One of the major reasons that previously wasn't done, was fear of
imbalance between tablespaces. To address that balance writes between
tablespaces.

The other prime concern was that the relatively large allocation to sort
the buffers in might fail, preventing checkpoints from happening. Thus
pre-allocate the required memory in shared memory, at server startup.

This particularly makes it more efficient to have checkpoint flushing
enabled, because that'll often result in a lot of writes that can be
coalesced into one flush.

Discussion: alpine.DEB.2.10.1506011320000.28433@sto
Author: Fabien Coelho and Andres Freund
2016-03-10 17:05:09 -08:00
Andres Freund
428b1d6b29 Allow to trigger kernel writeback after a configurable number of writes.
Currently writes to the main data files of postgres all go through the
OS page cache. This means that some operating systems can end up
collecting a large number of dirty buffers in their respective page
caches.  When these dirty buffers are flushed to storage rapidly, be it
because of fsync(), timeouts, or dirty ratios, latency for other reads
and writes can increase massively.  This is the primary reason for
regular massive stalls observed in real world scenarios and artificial
benchmarks; on rotating disks stalls on the order of hundreds of seconds
have been observed.

On linux it is possible to control this by reducing the global dirty
limits significantly, reducing the above problem. But global
configuration is rather problematic because it'll affect other
applications; also PostgreSQL itself doesn't always generally want this
behavior, e.g. for temporary files it's undesirable.

Several operating systems allow some control over the kernel page
cache. Linux has sync_file_range(2), several posix systems have msync(2)
and posix_fadvise(2). sync_file_range(2) is preferable because it
requires no special setup, whereas msync() requires the to-be-flushed
range to be mmap'ed. For the purpose of flushing dirty data
posix_fadvise(2) is the worst alternative, as flushing dirty data is
just a side-effect of POSIX_FADV_DONTNEED, which also removes the pages
from the page cache.  Thus the feature is enabled by default only on
linux, but can be enabled on all systems that have any of the above
APIs.

While desirable and likely possible this patch does not contain an
implementation for windows.

With the infrastructure added, writes made via checkpointer, bgwriter
and normal user backends can be flushed after a configurable number of
writes. Each of these sources of writes controlled by a separate GUC,
checkpointer_flush_after, bgwriter_flush_after and backend_flush_after
respectively; they're separate because the number of flushes that are
good are separate, and because the performance considerations of
controlled flushing for each of these are different.

A later patch will add checkpoint sorting - after that flushes from the
ckeckpoint will almost always be desirable. Bgwriter flushes are most of
the time going to be random, which are slow on lots of storage hardware.
Flushing in backends works well if the storage and bgwriter can keep up,
but if not it can have negative consequences.  This patch is likely to
have negative performance consequences without checkpoint sorting, but
unfortunately so has sorting without flush control.

Discussion: alpine.DEB.2.10.1506011320000.28433@sto
Author: Fabien Coelho and Andres Freund
2016-03-10 17:04:34 -08:00
Tom Lane
c82c92b111 Give pull_var_clause() reject/recurse/return behavior for WindowFuncs too.
All along, this function should have treated WindowFuncs in a manner
similar to Aggrefs, ie with an option whether or not to recurse into them.
By not considering the case, it was always recursing, which is OK for most
callers (although I suspect that the case in prepare_sort_from_pathkeys
might represent a bug).  But now we need return-without-recursing behavior
as well.  There are also more than a few callers that should never see a
WindowFunc, and now we'll get some error checking on that.
2016-03-10 16:23:52 -05:00
Robert Haas
fd31cd2651 Don't vacuum all-frozen pages.
Commit a892234f83 gave us enough
infrastructure to avoid vacuuming pages where every tuple on the
page is already frozen.  So, replace the notion of a scan_all or
whole-table vacuum with the less onerous notion of an "aggressive"
vacuum, which will pages that are all-visible, but still skip those
that are all-frozen.

This should greatly reduce the cost of anti-wraparound vacuuming
on large clusters where the majority of data is never touched
between one cycle and the next, because we'll no longer have to
read all of those pages only to find out that we don't need to
do anything with them.

Patch by me, reviewed by Masahiko Sawada.
2016-03-10 16:14:42 -05:00
Tom Lane
364a9f47ab Refactor pull_var_clause's API to make it less tedious to extend.
In commit 1d97c19a0f and later c1d9579dd8, we extended
pull_var_clause's API by adding enum-type arguments.  That's sort of a pain
to maintain, though, because it means every time we add a new behavior we
must touch every last one of the call sites, even if there's a reasonable
default behavior that most of them could use.  Let's switch over to using a
bitmask of flags, instead; that seems more maintainable and might save a
nanosecond or two as well.  This commit changes no behavior in itself,
though I'm going to follow it up with one that does add a new behavior.

In passing, remove flatten_tlist(), which has not been used since 9.1
and would otherwise need the same API changes.

Removing these enums means that optimizer/tlist.h no longer needs to
depend on optimizer/var.h.  Changing that caused a number of C files to
need addition of #include "optimizer/var.h" (probably we can thank old
runs of pgrminclude for that); but on balance it seems like a good change
anyway.
2016-03-10 15:53:07 -05:00
Simon Riggs
37c54863cf Rework wait for AccessExclusiveLocks on Hot Standby
Earlier version committed in 9.0 caused spurious waits in some cases.
New infrastructure for lock waits in 9.3 used to correct and improve this.

Jeff Janes based upon a proposal by Simon Riggs, who also reviewed
Additional review comments from Amit Kapila
2016-03-10 19:26:24 +00:00
Robert Haas
53be0b1add Provide much better wait information in pg_stat_activity.
When a process is waiting for a heavyweight lock, we will now indicate
the type of heavyweight lock for which it is waiting.  Also, you can
now see when a process is waiting for a lightweight lock - in which
case we will indicate the individual lock name or the tranche, as
appropriate - or for a buffer pin.

Amit Kapila, Ildus Kurbangaliev, reviewed by me.  Lots of helpful
discussion and suggestions by many others, including Alexander
Korotkov, Vladimir Borodin, and many others.
2016-03-10 12:44:09 -05:00
Magnus Hagander
9d90388247 Avoid crash on old Windows with AVX2-capable CPU for VS2013 builds
The Visual Studio 2013 CRT generates invalid code when it makes a 64-bit
build that is later used on a CPU that supports AVX2 instructions using a
version of Windows before 7SP1/2008R2SP1.

Detect this combination, and in those cases turn off the generation of
FMA3, per recommendation from the Visual Studio team.

The bug is actually in the CRT shipping with Visual Studio 2013, but
Microsoft have stated they're only fixing it in newer major versions.
The fix is therefor conditioned specifically on being built with this
version of Visual Studio, and not previous or later versions.

Author: Christian Ullrich
2016-03-10 14:10:18 +01:00
Simon Riggs
e0694cf9c7 Reduce size of two phase file header
Previously 2PC header was fixed at 200 bytes, which in most cases wasted
WAL space for a workload using 2PC heavily.

Pavan Deolasee, reviewed by Petr Jelinek
2016-03-10 12:51:46 +00:00
Simon Riggs
fcb4bfddb6 Reduce lock level for altering fillfactor
Fabrízio de Royes Mello and Simon Riggs
2016-03-10 12:07:33 +00:00
Robert Haas
090b287fc5 Code review for b6fb6471f6.
Reports by Tomas Vondra, Vinayak Pokale, and Aleksander Alekseev.
Patch by Amit Langote.
2016-03-10 06:07:57 -05:00
Tom Lane
cc402116ca Remove a couple of useless pstrdup() calls.
There's no point in pstrdup'ing the result of TextDatumGetCString,
since that's necessarily already a freshly-palloc'd C string.

These particular calls are unlikely to be of any consequence
performance-wise, but still they're a bad precedent that can confuse
future patch authors.

Noted by Chapman Flack.
2016-03-09 23:29:05 -05:00
Andres Freund
1d4a0ab19a Avoid unlikely data-loss scenarios due to rename() without fsync.
Renaming a file using rename(2) is not guaranteed to be durable in face
of crashes. Use the previously added durable_rename()/durable_link_or_rename()
in various places where we previously just renamed files.

Most of the changed call sites are arguably not critical, but it seems
better to err on the side of too much durability.  The most prominent
known case where the previously missing fsyncs could cause data loss is
crashes at the end of a checkpoint. After the actual checkpoint has been
performed, old WAL files are recycled. When they're filled, their
contents are fdatasynced, but we did not fsync the containing
directory. An OS/hardware crash in an unfortunate moment could then end
up leaving that file with its old name, but new content; WAL replay
would thus not replay it.

Reported-By: Tomas Vondra
Author: Michael Paquier, Tomas Vondra, Andres Freund
Discussion: 56583BDD.9060302@2ndquadrant.com
Backpatch: All supported branches
2016-03-09 18:53:53 -08:00
Andres Freund
606e0f9841 Introduce durable_rename() and durable_link_or_rename().
Renaming a file using rename(2) is not guaranteed to be durable in face
of crashes; especially on filesystems like xfs and ext4 when mounted
with data=writeback. To be certain that a rename() atomically replaces
the previous file contents in the face of crashes and different
filesystems, one has to fsync the old filename, rename the file, fsync
the new filename, fsync the containing directory.  This sequence is not
generally adhered to currently; which exposes us to data loss risks. To
avoid having to repeat this arduous sequence, introduce
durable_rename(), which wraps all that.

Also add durable_link_or_rename(). Several places use link() (with a
fallback to rename()) to rename a file, trying to avoid replacing the
target file out of paranoia. Some of those rename sequences need to be
durable as well. There seems little reason extend several copies of the
same logic, so centralize the link() callers.

This commit does not yet make use of the new functions; they're used in
a followup commit.

Author: Michael Paquier, Andres Freund
Discussion: 56583BDD.9060302@2ndquadrant.com
Backpatch: All supported branches
2016-03-09 18:53:53 -08:00
Alvaro Herrera
28f6df3c36 PostgresNode: add backup_fs_hot and backup_fs_cold
These simple methods rely on RecursiveCopy to create a filesystem-level
backup of a server.  They aren't currently used anywhere yet,but will be
useful for future tests.

Author: Craig Ringer
Reviewed-By: Michael Paquier, Salvador Fandino, Álvaro Herrera
Commitfest-URL: https://commitfest.postgresql.org/9/569/
2016-03-09 19:54:03 -03:00
Alvaro Herrera
a31aaec406 Add filter capability to RecursiveCopy::copypath
This allows skipping copying certain files and subdirectories in tests.
This is useful in some circumstances such as copying a data directory;
future tests want this feature.

Also POD-ify the module.

Authors: Craig Ringer, Pallavi Sontakke
Reviewed-By: Álvaro Herrera
2016-03-09 18:00:31 -03:00
Tom Lane
a298a1e06f Fix incorrect handling of NULL index entries in indexed ROW() comparisons.
An index search using a row comparison such as ROW(a, b) > ROW('x', 'y')
would stop upon reaching a NULL entry in the "b" column, ignoring the
fact that there might be non-NULL "b" values associated with later values
of "a".  This happens because _bt_mark_scankey_required() marks the
subsidiary scankey for "b" as required, which is just wrong: it's for
a column after the one with the first inequality key (namely "a"), and
thus can't be considered a required match.

This bit of brain fade dates back to the very beginnings of our support
for indexed ROW() comparisons, in 2006.  Kind of astonishing that no one
came across it before Glen Takahashi, in bug #14010.

Back-patch to all supported versions.

Note: the given test case doesn't actually fail in unpatched 9.1, evidently
because the fix for bug #6278 (i.e., stopping at nulls in either scan
direction) is required to make it fail.  I'm sure I could devise a case
that fails in 9.1 as well, perhaps with something involving making a cursor
back up; but it doesn't seem worth the trouble.
2016-03-09 14:51:22 -05:00
Robert Haas
be060cbcd4 Re-pgindent vacuumlazy.c. 2016-03-09 13:51:11 -05:00
Robert Haas
accf7616ff pgbench: When -T is used, don't wait for transactions beyond end of run.
At low rates, this can lead to pgbench taking significantly longer to
terminate than the user might expect.  Repair.

Fabien Coelho, reviewed by Aleksander Alekseev, Álvaro Herrera, and me.
2016-03-09 13:11:05 -05:00
Robert Haas
b6fb6471f6 Add a generic command progress reporting facility.
Using this facility, any utility command can report the target relation
upon which it is operating, if there is one, and up to 10 64-bit
counters; the intent of this is that users should be able to figure out
what a utility command is doing without having to resort to ugly hacks
like attaching strace to a backend.

As a demonstration, this adds very crude reporting to lazy vacuum; we
just report the target relation and nothing else.  A forthcoming patch
will make VACUUM report a bunch of additional data that will make this
much more interesting.  But this gets the basic framework in place.

Vinayak Pokale, Rahila Syed, Amit Langote, Robert Haas, reviewed by
Kyotaro Horiguchi, Jim Nasby, Thom Brown, Masahiko Sawada, Fujii Masao,
and Masanori Oyama.
2016-03-09 12:08:58 -05:00
Tom Lane
8776c15c85 Fix incorrect tlist generation in create_gather_plan().
This function is written as though Gather doesn't project; but it does.
Even if it did not project, though, we must use build_path_tlist to ensure
that the output columns receive correct sortgroupref labeling.

Per report from Amit Kapila.
2016-03-09 10:56:46 -05:00
Tom Lane
d31f20e2b5 Fix copy-and-pasteo in comment.
Wensheng Zhang
2016-03-09 10:29:14 -05:00
Tom Lane
51c0f63e4d Improve handling of pathtargets in planner.c.
Refactor so that the internal APIs in planner.c deal in PathTargets not
targetlists, and establish a more regular structure for deriving the
targets needed for successive steps.

There is more that could be done here; calculating the eval costs of each
successive target independently is both inefficient and wrong in detail,
since we won't actually recompute values available from the input node's
tlist.  But it's no worse than what happened before the pathification
rewrite.  In any case this seems like a good starting point for considering
how to handle Konstantin Knizhnik's function-evaluation-postponement patch.
2016-03-09 01:12:16 -05:00
Andres Freund
2f1f443930 Add valgrind suppressions for python code.
Python's allocator does some low-level tricks for efficiency;
unfortunately they trigger valgrind errors. Those tricks can be disabled
making instrumentation easier; but few people testing postgres will have
such a build of python. So add broad suppressions of the resulting
errors.

See also https://svn.python.org/projects/python/trunk/Misc/README.valgrind

This possibly will suppress valid errors, but without it it's basically
impossible to use valgrind with plpython code.

Author: Andres Freund
Backpatch: 9.4, where we started to maintain valgrind suppressions
2016-03-08 19:40:58 -08:00
Andres Freund
5e43bee830 Add valgrind suppressions for bootstrap related code.
Author: Andres Freund
Backpatch: 9.4, where we started to maintain valgrind suppressions
2016-03-08 19:40:58 -08:00
Tom Lane
9e8b99420f Improve handling of group-column indexes in GroupingSetsPath.
Instead of having planner.c compute a groupColIdx array and store it in
GroupingSetsPaths, make create_groupingsets_plan() find the grouping
columns by searching in the child plan node's tlist.  Although that's
probably a bit slower for create_groupingsets_plan(), it's more like
the way every other plan node type does this, and it provides positive
confirmation that we know which child output columns we're supposed to be
grouping on.  (Indeed, looking at this now, I'm not at all sure that it
wasn't broken before, because create_groupingsets_plan() isn't demanding
an exact tlist match from its child node.)  Also, this allows substantial
simplification in planner.c, because it no longer needs to compute the
groupColIdx array at all; no other cases were using it.

I'd intended to put off this refactoring until later (like 9.7), but
in view of the likely bug fix and the need to rationalize planner.c's
tlist handling so we can do something sane with Konstantin Knizhnik's
function-evaluation-postponement patch, I think it can't wait.
2016-03-08 22:32:11 -05:00
Peter Eisentraut
a40814d7aa Handle invalid libpq sockets in more places
Also, make error messages consistent.

From: Michael Paquier <michael.paquier@gmail.com>
2016-03-08 21:10:33 -05:00
Peter Eisentraut
a2fd62dd53 Suppress GCC 6 warning about self-comparison
Reviewed-by: Thomas Munro <thomas.munro@enterprisedb.com>
2016-03-08 19:41:51 -05:00
Peter Eisentraut
92d4294d4b psql: Fix some strange code in SQL help creation
Struct QL_HELP used to be defined as static in the sql_help.h header
file, which is included in sql_help.c and help.c, thus creating two
separate instances of the struct.  This causes a warning from GCC 6,
because the struct is not used in sql_help.c.

Instead, declare the struct as extern in the header file and define it
in sql_help.c.  This also allows making a bunch of functions static
because they are no longer needed outside of sql_help.c.

Reviewed-by: Thomas Munro <thomas.munro@enterprisedb.com>
2016-03-08 19:41:51 -05:00
Peter Eisentraut
0d0644dce8 ecpg: Fix typo
GCC 6 points out the redundant conditions, which were apparently typos.

Reviewed-by: Thomas Munro <thomas.munro@enterprisedb.com>
2016-03-08 19:41:51 -05:00
Tom Lane
61fd218930 Fix minor thinko in pathification code.
I passed the wrong "root" struct to create_pathtarget in build_minmax_path.
Since the subroot is a clone of the outer root, this would not cause any
serious problems, but it would waste some cycles because
set_pathtarget_cost_width would not have access to Var width estimates
set up while running query_planner on the subroot.
2016-03-08 16:50:44 -05:00
Andres Freund
e66197fa2e plperl: Correctly handle empty arrays in plperl_ref_from_pg_array.
plperl_ref_from_pg_array() didn't consider the case that postgrs arrays
can have 0 dimensions (when they're empty) and accessed the first
dimension without a check. Fix that by special casing the empty array
case.

Author: Alex Hunsaker
Reported-By: Andres Freund / valgrind / buildfarm animal skink
Discussion: 20160308063240.usnzg6bsbjrne667@alap3.anarazel.de
Backpatch: 9.1-
2016-03-08 13:42:57 -08:00
Tom Lane
8c314b9853 Finish refactoring make_foo() functions in createplan.c.
This patch removes some redundant cost calculations that I left for later
cleanup in commit 3fc6e2d7f5.  There's now a uniform policy that the
make_foo() convenience functions don't do any cost calculations.  Most of
their callers copy costs from the source Path node, and for those that
don't, the calculation in the make_foo() function wasn't necessarily right
anyhow.  (make_result() was particularly a mess, as it was serving multiple
callers using cost calcs designed for only the first one or two that had
ever existed.)  Aside from saving a few cycles, this ensures that what
EXPLAIN prints matches the costs we used for planning purposes.  It does
not change any planner decisions, since the decisions are already made.
2016-03-08 16:28:34 -05:00
Robert Haas
7400559a3f Comment update for fdw_recheck_quals.
Commit 5fc4c26db5 could've done a better
job updating these comments.

Etsuro Fujita
2016-03-08 14:40:55 -05:00
Robert Haas
734f86d50d Add new flags argument for xl_heap_visible to heap2_desc.
Masahiko Sawada
2016-03-08 13:28:22 -05:00
Robert Haas
dcfecaae9e Fix parallel query on standby servers.
Without this fix, it inevitably bombs out with "ERROR:  failed to
initialize transaction_read_only to 0".  Repair.

Ashutosh Sharma; comments adjusted by me.
2016-03-08 10:27:03 -05:00
Robert Haas
070140ee48 Add some functions to fd.c for the convenience of extensions.
For example, if you want to perform an ioctl() on a file descriptor
opened through the fd.c routines, there's no way to do that without
being able to get at the underlying fd.

KaiGai Kohei
2016-03-08 10:09:50 -05:00
Robert Haas
77a1d1e798 Department of second thoughts: remove PD_ALL_FROZEN.
Commit a892234f83 added a second bit per
page to the visibility map, which still seems like a good idea, but it
also added a second page-level bit alongside PD_ALL_VISIBLE to track
whether the visibility map bit was set.  That no longer seems like a
clever plan, because we don't really need that bit for anything.  We
always clear both bits when the page is modified anyway.

Patch by me, reviewed by Kyotaro Horiguchi and Masahiko Sawada.
2016-03-08 08:46:48 -05:00
Robert Haas
6f56b41ac0 pg_upgrade: Remove converter plugin facility.
We've not found a use for this so far, and the current need, which
is to convert the visibility map to a new format, does not suit the
existing design anyway.  So just rip it out.

Author: Masahiko Sawada, slightly revised by me.
Discussion: 20160215211313.GB31273@momjian.us
2016-03-08 08:13:02 -05:00
Tom Lane
cf8e7b16a5 Spell "parallel" correctly.
Per David Rowley.
2016-03-07 21:48:17 -05:00
Peter Eisentraut
1c2db8c305 Fix uninstall target in tsearch Makefile
Artur Zakirov
2016-03-07 20:36:59 -05:00
Joe Conway
7b077af500 Make get_controlfile() error logging consistent with src/common
As originally committed, get_controlfile() used a non-standard approach
to error logging. Make it consistent with the majority of error logging
done in src/common.

Applies to master only.
2016-03-07 15:14:20 -08:00
Andres Freund
b63bea5fd3 Further improvements to c8f621c43.
Coverity and inspection for the issue addressed in fd45d16f found some
questionable code.

Specifically coverity noticed that the wrong length was added in
ReorderBufferSerializeChange() - without immediate negative consequences
as the variable isn't used afterwards.  During code-review and testing I
noticed that a bit of space was wasted when allocating tuple bufs in
several places.  Thirdly, the debug memset()s in
ReorderBufferGetTupleBuf() reduce the error checking valgrind can do.

Backpatch: 9.4, like c8f621c43.
2016-03-07 14:24:03 -08:00
Tom Lane
3fc6e2d7f5 Make the upper part of the planner work by generating and comparing Paths.
I've been saying we needed to do this for more than five years, and here it
finally is.  This patch removes the ever-growing tangle of spaghetti logic
that grouping_planner() used to use to try to identify the best plan for
post-scan/join query steps.  Now, there is (nearly) independent
consideration of each execution step, and entirely separate construction of
Paths to represent each of the possible ways to do that step.  We choose
the best Path or set of Paths using the same add_path() logic that's been
used inside query_planner() for years.

In addition, this patch removes the old restriction that subquery_planner()
could return only a single Plan.  It now returns a RelOptInfo containing a
set of Paths, just as query_planner() does, and the parent query level can
use each of those Paths as the basis of a SubqueryScanPath at its level.
This allows finding some optimizations that we missed before, wherein a
subquery was capable of returning presorted data and thereby avoiding a
sort in the parent level, making the overall cost cheaper even though
delivering sorted output was not the cheapest plan for the subquery in
isolation.  (A couple of regression test outputs change in consequence of
that.  However, there is very little change in visible planner behavior
overall, because the point of this patch is not to get immediate planning
benefits but to create the infrastructure for future improvements.)

There is a great deal left to do here.  This patch unblocks a lot of
planner work that was basically impractical in the old code structure,
such as allowing FDWs to implement remote aggregation, or rewriting
plan_set_operations() to allow consideration of multiple implementation
orders for set operations.  (The latter will likely require a full
rewrite of plan_set_operations(); what I've done here is only to fix it
to return Paths not Plans.)  I have also left unfinished some localized
refactoring in createplan.c and planner.c, because it was not necessary
to get this patch to a working state.

Thanks to Robert Haas, David Rowley, and Amit Kapila for review.
2016-03-07 15:58:22 -05:00
Tom Lane
b642e50aea Fix backwards test for Windows service-ness in pg_ctl.
A thinko in a96761391 caused pg_ctl to get it exactly backwards when
deciding whether to report problems to the Windows eventlog or to stderr.
Per bug #14001 from Manuel Mathar, who also identified the fix.
Like the previous patch, back-patch to all supported branches.
2016-03-07 10:40:44 -05:00
Tom Lane
94f1adccd3 Re-fix broken definition for function name in pgbench's exprscan.l.
Wups, my first try wasn't quite right either.  Too focused on fixing
the existing bug, not enough on not introducing new ones.
2016-03-06 21:45:34 -05:00
Tom Lane
3899caf772 Fix broken definition for function name in pgbench's exprscan.l.
As written, this would accept e.g. 123e9 as a function name.  Aside
from being mildly astonishing, that would come back to haunt us if
we ever try to add float constants to the expression syntax.  Insist
that function names start with letters (or at least non-digits).

In passing reset yyline as well as yycol when starting a new expression.
This variable is useless since it's used nowhere, but if we're going
to have it we should have it act sanely.
2016-03-06 21:04:25 -05:00
Andres Freund
fd45d16f62 Fix wrong allocation size in c8f621c43.
In c8f621c43 I forgot to account for MAXALIGN when allocating a new
tuplebuf in ReorderBufferGetTupleBuf(). That happens to currently not
cause active problems on a number of platforms because the affected
pointer is already aligned, but others, like ppc and hppa, trigger this
in the regression test, due to a debug memset clearing memory.

Fix that.

Backpatch: 9.4, like the previous commit.
2016-03-06 16:27:20 -08:00
Tom Lane
b3e05097e5 Fix not-terribly-safe coding in NIImportOOAffixes() and NIImportAffixes().
There were two places in spell.c that supposed that they could search
for a location in a string produced by lowerstr() and then transpose
the offset into the original string.  But this fails completely if
lowerstr() transforms any characters into characters of different byte
length, as can happen in Turkish UTF8 for instance.

We'd added some comments about this coding in commit 51e78ab4ff,
but failed to realize that it was not merely confusing but wrong.

Coverity complained about this code years ago, but in such an opaque
fashion that nobody understood what it was on about.  I'm not entirely
sure that this issue *is* what it's on about, actually, but perhaps
this patch will shut it up -- and in any case the problem is clear.

Back-patch to all supported branches.
2016-03-06 19:20:55 -05:00
Tom Lane
cb0ca0c995 Fix unportable usage of <ctype.h> functions.
isdigit(), isspace(), etc are likely to give surprising results if passed a
signed char.  We should always cast the argument to unsigned char to avoid
that.  Error in commit d78a7d9c7f, found by buildfarm member gaur.
2016-03-06 18:23:53 -05:00
Andres Freund
c8f621c43a logical decoding: Fix handling of large old tuples with replica identity full.
When decoding the old version of an UPDATE or DELETE change, and if that
tuple was bigger than MaxHeapTupleSize, we either Assert'ed out, or
failed in more subtle ways in non-assert builds.  Normally individual
tuples aren't bigger than MaxHeapTupleSize, with big datums toasted.
But that's not the case for the old version of a tuple for logical
decoding; the replica identity is logged as one piece. With the default
replica identity btree limits that to small tuples, but that's not the
case for FULL.

Change the tuple buffer infrastructure to separate allocate over-large
tuples, instead of always going through the slab cache.

This unfortunately requires changing the ReorderBufferTupleBuf
definition, we need to store the allocated size someplace. To avoid
requiring output plugins to recompile, don't store HeapTupleHeaderData
directly after HeapTupleData, but point to it via t_data; that leaves
rooms for the allocated size.  As there's no reason for an output plugin
to look at ReorderBufferTupleBuf->t_data.header, remove the field. It
was just a minor convenience having it directly accessible.

Reported-By: Adam Dratwiński
Discussion: CAKg6ypLd7773AOX4DiOGRwQk1TVOQKhNwjYiVjJnpq8Wo+i62Q@mail.gmail.com
2016-03-05 18:02:20 -08:00
Andres Freund
0bda14d54c logical decoding: old/newtuple in spooled UPDATE changes was switched around.
Somehow I managed to flip the order of restoring old & new tuples when
de-spooling a change in a large transaction from disk. This happens to
only take effect when a change is spooled to disk which has old/new
versions of the tuple. That only is the case for UPDATEs where he
primary key changed or where replica identity is changed to FULL.

The tests didn't catch this because either spooled updates, or updates
that changed primary keys, were tested; not both at the same time.

Found while adding tests for the following commit.

Backpatch: 9.4, where logical decoding was added
2016-03-05 18:02:20 -08:00
Andres Freund
d9e903f3cb logical decoding: Tell reorderbuffer about all xids.
Logical decoding's reorderbuffer keeps transactions in an LSN ordered
list for efficiency. To make that's efficiently possible upper-level
xids are forced to be logged before nested subtransaction xids.  That
only works though if these records are all looked at: Unfortunately we
didn't do so for e.g. row level locks, which are otherwise uninteresting
for logical decoding.

This could lead to errors like:
"ERROR: subxact logged without previous toplevel record".

It's not sufficient to just look at row locking records, the xid could
appear first due to a lot of other types of records (which will trigger
the transaction to be marked logged with MarkCurrentTransactionIdLoggedIfAny).
So invent infrastructure to tell reorderbuffer about xids seen, when
they'd otherwise not pass through reorderbuffer.c.

Reported-By: Jarred Ward
Bug: #13844
Discussion: 20160105033249.1087.66040@wrigleys.postgresql.org
Backpatch: 9.4, where logical decoding was added
2016-03-05 18:02:20 -08:00
Joe Conway
dc7d70ea05 Expose control file data via SQL accessible functions.
Add four new SQL accessible functions: pg_control_system(),
pg_control_checkpoint(), pg_control_recovery(), and pg_control_init()
which expose a subset of the control file data.

Along the way move the code to read and validate the control file to
src/common, where it can be shared by the new backend functions
and the original pg_controldata frontend program.

Patch by me, significant input, testing, and review by Michael Paquier.
2016-03-05 11:10:19 -08:00
Fujii Masao
d34794f7d5 Ignore recovery_min_apply_delay until recovery has reached consistent state
Previously recovery_min_apply_delay was applied even before recovery
had reached consistency. This could cause us to wait a long time
unexpectedly for read-only connections to be allowed. It's problematic
because the standby was useless during that wait time.

This patch changes recovery_min_apply_delay so that it's applied once
the database has reached the consistent state. That is, even if the delay
is set, the standby tries to replay WAL records as fast as possible until
it has reached consistency.

Author: Michael Paquier
Reviewed-By: Julien Rouhaud
Reported-By: Greg Clough
Backpatch: 9.4, where recovery_min_apply_delay was added
Bug: #13770
Discussion: http://www.postgresql.org/message-id/20151111155006.2644.84564@wrigleys.postgresql.org
2016-03-06 02:29:04 +09:00
Tom Lane
60690a6fe8 Make stats regression test robust in the face of parallel query.
Historically, the wait_for_stats() function in this test has simply checked
for a report of an indexscan on tenk2, corresponding to the last command
issued before we expect stats updates to appear.  However, with parallel
query that indexscan could be done by a parallel worker that will emit
its stats counters to the collector before the session's main backend does
(a full second before, in fact, thanks to the "pg_sleep(1.0)" added by
commit 957d08c81f).  That leaves a sizable window in which an
autovacuum-triggered write of the stats files would present a state in
which the indexscan on tenk2 appears to have been done, but none of the
write updates performed by the test have been.  This is evidently the
explanation for intermittent failures seen by me and on buildfarm member
mandrill.

To fix, we should check separately for both the tenk2 seqscan and indexscan
counts, since those might be reported by different processes that could be
delayed arbitrarily on an overloaded test machine.  And we need to check
for at least one update-related count.  If we ever allow parallel workers
to do writes, this will get even more complicated ... but in view of all
the other hard problems that will entail, I don't feel a need to solve this
one today.

Per research by Rahila Syed and myself; part of this patch is Rahila's.
2016-03-04 16:20:49 -05:00
Robert Haas
708020eb7b Fix typo in comment.
Thomas Munro
2016-03-04 15:46:30 -05:00
Robert Haas
6fcde8a5c8 Minor improvements to transaction manager README.
A simple SELECT is handled by PortalRunSelect, not ProcessQuery.  Also,
the previous indentation was unclear: change it so that a deeper level
of indentation indicates that the outer function calls the inner one.

Stas Kelvich
2016-03-04 14:12:28 -05:00
Robert Haas
17b124d303 Fix SerializeSnapshot not to overrun the allocated space.
Rushabh Lathia
2016-03-04 13:48:36 -05:00
Teodor Sigaev
0e7557dc8d Fix Windows build broken by d78a7d9c7f 2016-03-04 21:36:49 +03:00
Robert Haas
df4685fb0c Minor optimizations based on ParallelContext having nworkers_launched.
Originally, we didn't have nworkers_launched, so code that used parallel
contexts had to be preprared for the possibility that not all of the
workers requested actually got launched.  But now we can count on knowing
the number of workers that were successfully launched, which can shave
off a few cycles and simplify some code slightly.

Amit Kapila, reviewed by Haribabu Kommi, per a suggestion from Peter
Geoghegan.
2016-03-04 12:59:10 -05:00
Robert Haas
546cd0d766 Fix InitializeSessionUserId not to deference NULL rolename pointer.
Dmitriy Sarafannikov, reviewed by Michael Paquier and Haribabu Kommi,
with a minor fix by me.
2016-03-04 12:28:09 -05:00
Teodor Sigaev
d78a7d9c7f Improve support of Hunspell in ispell dictionary.
Now it's possible to load recent version of Hunspell for several languages.
To handle these dictionaries Hunspell patch adds support for:
* FLAG long - sets the double extended ASCII character flag type
* FLAG num - sets the decimal number flag type (from 1 to 65535)
* AF parameter - alias for flag's set

Also it moves test dictionaries into separate directory.

Author: Artur Zakirov with editorization by me
2016-03-04 20:08:47 +03:00
Robert Haas
9445db925e Fix query-based tab completion for multibyte characters.
The existing code confuses the byte length of the string (which is
relevant when passing it to pg_strncasecmp) with the character length
of the string (which is relevant when it is used with the SQL substring
function).  Separate those two concepts.

Report and patch by Kyotaro Horiguchi, reviewed by Thomas Munro and
reviewed and further revised by me.
2016-03-04 11:53:20 -05:00
Alvaro Herrera
52fe6f4e02 Add 'tap_tests' flag in config_default.pl
This makes the flag more visible for testers using the default file as a
template, increasing the likelyhood that the test suite will be run.
Also have the flag be displayed in the fake "configure" output, if set.

This patch is two new lines only, but perltidy decides to shift things
around which makes it appear a bit bigger.

Author: Michaël Paquier
Reviewed-by: Craig Ringer
Discussion: https://www.postgresql.org/message-id/CAB7nPqRet6UAP2APhZAZw%3DVhJ6w-Q-gGLdZkrOqFgd2vc9-ZDw%40mail.gmail.com
2016-03-04 13:04:53 -03:00
Peter Eisentraut
1fa2a6b1d4 Add prerequisite for KOI8-U.TXT
This was missed when the encoding was added.
2016-03-03 20:44:47 -05:00
Peter Eisentraut
b497abc602 Make some adjustments in variable assignments
These variables aren't really used for anything interesting, but it
seems the existing grouping was somewhat nonsensical.
2016-03-03 20:44:47 -05:00
Peter Eisentraut
7a4a813c99 Add missing rules related to EUC_JIS_2004 and SHIFT_JIS_2004 encodings
This was apparently forgotten in commit
75c6519ff6.
2016-03-03 20:44:47 -05:00
Alvaro Herrera
d561f1caec pgbench: accept unambiguous builtin prefixes for -b
This makes it easier to use "-b se" instead of typing the full "-b
select-only".

Author: Fabien Coelho
Reviewed-by: Michaël Paquier
2016-03-03 19:37:13 -03:00
Alvaro Herrera
2c83f435a3 Rework PostgresNode's psql method
This makes the psql() method much more capable: it captures both stdout
and stderr; it now returns the psql exit code rather than stdout; a
timeout can now be specified, as can ON_ERROR_STOP behavior; it gained a
new "on_error_die" (defaulting to off) parameter to raise an exception
if there's any problem.  Finally, additional parameters to psql can be
passed if there's need for further tweaking.

For convenience, a new safe_psql() method retains much of the old
behavior of psql(), except that it uses on_error_die on, so that
problems like syntax errors in SQL commands can be detected more easily.

Many existing TAP test files now use safe_psql, which is what is really
wanted.  A couple of ->psql() calls are now added in the commit_ts
tests, which verify that the right thing is happening on certain errors.
Some ->command_fails() calls in recovery tests that were verifying that
psql failed also became ->psql() calls now.

Author: Craig Ringer. Some tweaks by Álvaro Herrera
Reviewed-By: Michaël Paquier
2016-03-03 17:58:30 -03:00
Alvaro Herrera
7d9a4301c0 perltidy PostgresNode and SimpleTee
Also, mention in README that Perl files should be perltidy'ed.  This
isn't really the best place (since we have Perl files elsewhere in the
tree) and this is already in pgindent's README, but this subdir is
likely to get hacked a whole lot more than the other Perl files, so it
seems okay to spend two lines on this.

Author: Craig Ringer
2016-03-03 13:21:35 -03:00
Alvaro Herrera
5bec1ad464 Fix mistakes in recovery tests
One test was relying on method remove_tree that isn't implemented in the
oldest Perl we support; fix it by using the older rmtree instead.

Another test had a typo in a SQL command, which isn't noticed because
the PostgresNode->psql() method doesn't check that queries return
correctly.  That's undesirable and will also be fixed later on, but for
now let's make the test actually work.

Author: Craig Ringer
2016-03-03 12:51:47 -03:00
Simon Riggs
c7111d11b1 Revert buggy optimization of index scans
606c0123d6 attempted to reduce cost of index scans using > and <
strategies, though got that completely wrong in a few complex cases.

Revert whole patch until we find a safe optimization.
2016-03-03 09:53:43 +00:00
Magnus Hagander
6c90996a4c Add prefix to pl/pgsql global variables and functions
Rename pl/pgsql global variables to always have a plpgsql_ prefix,
so they don't conflict with other shared libraries loaded.
2016-03-03 10:45:59 +01:00
Andres Freund
7c17aac69d logical decoding: fix decoding of a commit's commit time.
When adding replication origins in 5aa235042, I somehow managed to set
the timestamp of decoded transactions to InvalidXLogRecptr when decoding
one made without a replication origin. Fix that, and the wrong type of
the new commit_time variable.

This didn't trigger a regression test failure because we explicitly
don't show commit timestamps in the regression tests, as they obviously
are variable. Add a test that checks that a decoded commit's timestamp
is within minutes of NOW() from before the commit.

Reported-By: Weiping Qu
Diagnosed-By: Artur Zakirov
Discussion: 56D4197E.9050706@informatik.uni-kl.de,
    56D42918.1010108@postgrespro.ru
Backpatch: 9.5, where 5aa235042 originates.
2016-03-02 23:42:21 -08:00
Tom Lane
a9d199f6d3 Fix json_to_record() bug with nested objects.
A thinko concerning nesting depth caused json_to_record() to produce bogus
output if a field of its input object contained a sub-object with a field
name matching one of the requested output column names.  Per bug #13996
from Johann Visagie.

I added a regression test case based on his example, plus parallel tests
for json_to_recordset, jsonb_to_record, jsonb_to_recordset.  The latter
three do not exhibit the same bug (which suggests that we may be missing
some opportunities to share code...) but testing seems like a good idea
in any case.

Back-patch to 9.4 where these functions were introduced.
2016-03-02 23:31:39 -05:00
Tom Lane
eb43e851d6 Create stub functions to support pg_upgrade of old contrib/tsearch2.
Commits 9ff60273e3 and dbe2328959 adjusted the declarations
of some core functions referenced by contrib/tsearch2's install script,
forgetting that in a pg_upgrade situation, we'll be trying to restore
operator class definitions that reference the old signatures.  We've
hit this problem before; solve it in the same way as before, namely by
installing stub functions that have the expected signature and just
invoke the correct function.  Per report from Jeff Janes.

(Someday we ought to stop supporting contrib/tsearch2, but I'm not
sure today is that day.)
2016-03-02 17:37:54 -05:00
Alvaro Herrera
cc6077d4d5 Prefix temp data dirs with the node name
This makes it easier to relate the temporary data dirs to each node in
a test script.

Author: Kyotaro Horiguchi
Reviewed-By: Craig Ringer, Alvaro Herrera
2016-03-02 18:22:45 -03:00
Tom Lane
c8c7c93de8 Fix PL/Tcl's encoding conversion logic.
PL/Tcl appears to contain logic to convert strings between the database
encoding and UTF8, which is the only encoding modern Tcl will deal with.
However, that code has been disabled since commit 034895125d, which
made it "#if defined(UNICODE_CONVERSION)" and neglected to provide any way
for that symbol to become defined.  That might have been all right back
in 2001, but these days we take a dim view of allowing strings with
incorrect encoding into the database.

Remove the conditional compilation, fix warnings about signed/unsigned char
conversions, clean up assorted places that didn't bother with conversions.
(Notably, there were lots of assumptions that database table and field
names didn't need conversion...)

Add a regression test based on plpython_unicode.  It's not terribly
thorough, but better than no test at all.
2016-03-02 13:30:14 -05:00
Tom Lane
e2609323eb Make PL/Tcl require Tcl 8.4 or later.
As of commit 2878220682, PL/Tcl will not
compile against pre-8.0 Tcl, whereas it used to work (more or less anyway)
with quite prehistoric versions.  As long as we're moving these goalposts,
let's reinstall them at someplace that has some thought behind it.  This
commit sets the minimum allowed Tcl version at 8.4, and rips out some bits
of compatibility cruft that are in consequence no longer needed.  Reasons
for requiring 8.4 include:

* 8.4 was released in 2002; there seems little reason to believe that
anyone would want to use older versions with Postgres 9.6+.

* We have no buildfarm members testing anything older than 8.4, and
thus no way to know if it's broken.

* We need at least 8.1 to allow enforcement of database encoding
security (8.1 standardized Tcl on using UTF8 internally, before that
it was pretty unpredictable).

* Some versions between 8.1 and 8.4 allowed the backend to become
multithreaded, which is disastrous.  We need at least 8.4 to be able
to disable the Tcl notifier subsystem to prevent that.

A small side benefit is that we can make the code more readable by
doing s/CONST84/const/g.
2016-03-02 12:24:30 -05:00
Tom Lane
2878220682 Convert PL/Tcl to use Tcl's "object" interfaces.
The original implementation of Tcl was all strings, but they improved
performance significantly by introducing typed "objects" (integers,
lists, code, etc).  It's past time we made use of that; that happened
in Tcl 8.0 which was released in 1997.

This patch also modernizes some of the error-reporting code, which may
cause small changes in the spelling of complaints about bad calls to
PL/Tcl-provided commands.

Jim Nasby and Karl Lehenbauer, reviewed by Victor Wagner
2016-03-02 12:07:31 -05:00
Tom Lane
3b8d721553 Fix TAP tests for older Perls.
Commit 7132810c (Retain tempdirs for failed tests) used Test::More's
is_passing method, but that was added in Test::More 0.89_01 which is
sometime later than Perl 5.10.1.  Popular platforms such as RHEL6 don't
have that, nevermind some of our older dinosaurs.  Do it the hard way.

Michael Paquier, based on research by Craig Ringer
2016-03-02 01:06:31 -05:00
Robert Haas
a892234f83 Change the format of the VM fork to add a second bit per page.
The new bit indicates whether every tuple on the page is already frozen.
It is cleared only when the all-visible bit is cleared, and it can be
set only when we vacuum a page and find that every tuple on that page is
both visible to every transaction and in no need of any future
vacuuming.

A future commit will use this new bit to optimize away full-table scans
that would otherwise be triggered by XID wraparound considerations.  A
page which is merely all-visible must still be scanned in that case, but
a page which is all-frozen need not be.  This commit does not attempt
that optimization, although that optimization is the goal here.  It
seems better to get the basic infrastructure in place first.

Per discussion, it's very desirable for pg_upgrade to automatically
migrate existing VM forks from the old format to the new format.  That,
too, will be handled in a follow-on patch.

Masahiko Sawada, reviewed by Kyotaro Horiguchi, Fujii Masao, Amit
Kapila, Simon Riggs, Andres Freund, and others, and substantially
revised by me.
2016-03-01 21:49:41 -05:00
Tom Lane
68c521eb92 Improve coverage of pltcl regression tests.
Test composite-type arguments and the argisnull and spi_lastoid Tcl
commmands.  This stuff was not covered before, but needs to be exercised
since the upcoming Tcl object-conversion patch changes these code paths
(and broke at least one of them).
2016-03-01 20:01:16 -05:00
Alvaro Herrera
9def031bd2 Add more tests for commit_timestamp feature
These tests verify that 1) WAL replay preserves the stored value,
2) a streaming standby server replays the value obtained from the
master, and 3) the behavior is sensible in the face of repeated
configuration changes.

One annoyance is that tmp_check/ subdir from the TAP tests is clobbered
when the pg_regress test runs in the same subdirectory.  This is
bothersome but not too terrible a problem, since the pg_regress test is
not run anyway if the TAP tests fail (unless "make -k" is used).

I had these tests around since commit 69e7235c93e2; add them now that we
have the recovery test framework in place.
2016-03-01 19:53:18 -03:00
Alvaro Herrera
88802e0680 TAP tests: retain temp dirs on test failure
This makes it easier to study the reason for the failure.

Author: Kyotaro Horiguchi
Reviewed-By: Craig Ringer
2016-03-01 19:50:13 -03:00
Robert Haas
212bba93ce Fix incorrect comment.
PQmblen and PQdsplen return information about characters, not words.

Kyotaro Horiguchi
2016-03-01 13:31:44 -05:00
Robert Haas
aec64e8f45 Fix mistake in extensible node code.
I believe that I (rhaas) introduced this bug while editing the patch
that became bcac23de73.

Report and patch from KaiGai Kohei.
2016-03-01 13:17:09 -05:00
Robert Haas
7e137f846d Extend pgbench's expression syntax to support a few built-in functions.
Fabien Coelho, reviewed mostly by Michael Paquier and me, but also by
Heikki Linnakangas, BeomYong Lee, Kyotaro Horiguchi, Oleksander
Shulgin, and Álvaro Herrera.
2016-03-01 13:08:30 -05:00
Peter Eisentraut
bd6cf3f237 Add Unicode map generation scripts as rule prerequisites
That way, the rules will trigger when the scripts change.
2016-02-29 21:19:28 -05:00
Peter Eisentraut
cc074bf6c1 Fix comments
Some of these comments were copied and pasted without updating them,
some of them were duplicates.
2016-02-29 21:19:24 -05:00
Peter Eisentraut
9a3e06baa2 UCS_to_most.pl: Make executable, for consistency with other scripts 2016-02-29 21:19:17 -05:00
Tom Lane
3d523564c5 Suppress scary-looking log messages from async-notify isolation test.
I noticed that the async-notify test results in log messages like these:

LOG:  could not send data to client: Broken pipe
FATAL:  connection to client lost

This is because it unceremoniously disconnects a client session that is
about to have some NOTIFY messages delivered to it.  Such log messages
during a regression test might well cause people to go looking for a
problem that doesn't really exist (it did cause me to waste some time that
way).  We can shut it up by adding an UNLISTEN command to session teardown.

Patch HEAD only; this doesn't seem significant enough to back-patch.
2016-02-29 19:29:19 -05:00
Tom Lane
8d8ff5f7db Improve error message for rejecting RETURNING clauses with dropped columns.
This error message was written with only ON SELECT rules in mind, but since
then we also made RETURNING-clause targetlists go through the same logic.
This means that you got a rather off-topic error message if you tried to
add a rule with RETURNING to a table having dropped columns.  Ideally we'd
just support that, but some preliminary investigation says that it might be
a significant amount of work.  Seeing that Nicklas Avén's complaint is the
first one we've gotten about this in the ten years or so that the code's
been like that, I'm unwilling to put much time into it.  Instead, improve
the error report by issuing a different message for RETURNING cases, and
revise the associated comment based on this investigation.

Discussion: 1456176604.17219.9.camel@jordogskog.no
2016-02-29 19:11:38 -05:00
Alvaro Herrera
5847397dec Minor tweaks for new src/test/recovery
Author: Michael Paquier
2016-02-29 18:16:59 -03:00
Alvaro Herrera
10b4852215 Fix typos
Author: Amit Langote
2016-02-29 18:11:58 -03:00
Alvaro Herrera
54638f5708 Make new isolationtester test more stable
The original coding of the test was relying too much on the ordering in
which backends are awakened once an advisory lock which they wait for is
released.  Change the code so that each backend uses its own advisory
lock instead, so that the output becomes stable.  Also add a few seconds
of sleep between lock releases, so that the test isn't broken in
overloaded buildfarm animals, as suggested by Tom Lane.

Per buildfarm members spoonbill and guaibasaurus.

Discussion: https://www.postgresql.org/message-id/19294.1456551587%40sss.pgh.pa.us
2016-02-29 16:34:56 -03:00
Tom Lane
c110678a47 Remove useless unary plus.
It's harmless, but might confuse readers.  Seems to have been introduced
in 6bc8ef0b7f.  Back-patch, just to avoid cosmetic cross-branch
differences.

Amit Langote
2016-02-29 10:48:40 -05:00
Tom Lane
05893712cc Fix build under OPTIMIZER_DEBUG.
In commit 19a541143a I replaced RelOptInfo.width with
RelOptInfo.reltarget.width, but I missed updating debug_print_rel()
for that because it's not compiled by default.
Reported by Salvador Fandino, patch by Michael Paquier.
2016-02-29 10:14:12 -05:00
Dean Rasheed
41fedc2462 Fix incorrect varlevelsup in security_barrier_replace_vars().
When converting an RTE with securityQuals into a security barrier
subquery RTE, ensure that the Vars in the new subquery's targetlist
all have varlevelsup = 0 so that they correctly refer to the
underlying base relation being wrapped.

The original code was creating new Vars by copying them from existing
Vars referencing the base relation found elsewhere in the query, but
failed to account for the fact that such Vars could come from sublink
subqueries, and hence have varlevelsup > 0. In practice it looks like
this could only happen with nested security barrier views, where the
outer view has a WHERE clause containing a correlated subquery, due to
the order in which the Vars are processed.

Bug: #13988
Reported-by: Adam Guthrie
Backpatch-to: 9.4, where updatable SB views were introduced
2016-02-29 12:28:06 +00:00