Commit Graph

15695 Commits

Author SHA1 Message Date
Robert Haas 3adf9ced17 Improve check for overly-long extensible node name.
The old code is bad for two reasons.  First, it has an off-by-one
error.  Second, it won't help if you aren't running with assertions
enabled.  Per discussion, we want a check here in that case too.

Author: KaiGai Kohei, adjusted by me.
Reviewed-by: Petr Jelinek
Discussion: 56E0D547.1030101@2ndquadrant.com
2016-03-14 13:52:52 -04:00
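
Below is a minimal standalone sketch of the pattern this fix is about: an
explicit runtime length check (so it fires even without assertions) that
avoids the off-by-one by reserving a byte for the NUL terminator.  The
constant value and function name are illustrative, not copied from the
actual patch.

    #include <stdio.h>
    #include <string.h>

    #define EXTNODENAME_MAX_LEN 64      /* illustrative buffer size */

    static int
    register_extensible_node(const char *name)
    {
        /* must be strictly less than the buffer size, leaving room
         * for the terminating NUL byte */
        if (strlen(name) >= EXTNODENAME_MAX_LEN)
        {
            fprintf(stderr, "extensible node name is too long\n");
            return -1;
        }
        return 0;
    }
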
Tom Lane ab4ff2889d Fix memory leak in repeated GIN index searches.
Commit d88976cfa1 removed this code from ginFreeScanKeys():
-		if (entry->list)
-			pfree(entry->list);
evidently in the belief that that ItemPointer array is allocated in the
keyCtx and so would be reclaimed by the following MemoryContextReset.
Unfortunately, it isn't and it won't.  It'd likely be a good idea for
that to become so, but as a simple and back-patchable fix in the
meantime, restore this code to ginFreeScanKeys().

Also, add a similar pfree to where startScanEntry() is about to zero out
entry->list.  I am not sure if there are any code paths where this
change prevents a leak today, but it seems like cheap future-proofing.

In passing, make the initial allocation of so->entries[] use palloc
not palloc0.  The code doesn't depend on unused entries being zero;
if it did, the array-enlargement code in ginFillScanEntry() would be
wrong.  So using palloc0 initially can only serve to confuse readers
about what the invariant is.

Per report from Felipe de Jesús Molina Bravo, via Jaime Casanova in
<CAJGNTeMR1ndMU2Thpr8GPDUfiHTV7idELJRFusA5UXUGY1y-eA@mail.gmail.com>
2016-03-13 16:44:31 -04:00
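
A standalone sketch of the leak pattern fixed here, with malloc/free
standing in for palloc/pfree: the array lives outside the memory context
that gets reset, so it must be freed explicitly before the pointer is
overwritten.

    #include <stdlib.h>

    typedef struct ScanEntry
    {
        int *list;      /* in the real code, not allocated in keyCtx */
        int  nlist;
    } ScanEntry;

    static void
    free_scan_entry(ScanEntry *entry)
    {
        /* without this, resetting keyCtx would not reclaim the array */
        if (entry->list)
            free(entry->list);
        entry->list = NULL;
        entry->nlist = 0;
    }
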
Peter Eisentraut 96adb14d93 Fix whitespace and remove obsolete gitattributes entry 2016-03-13 16:03:13 -04:00
Tom Lane 4b980167cb Report memory context stats upon out-of-memory in repalloc[_huge].
This longstanding functionality evidently got lost in commit
3d6d1b5855.  Noted while studying an OOM report from Jaime
Casanova.  Backpatch to 9.5 where the bug was introduced.
2016-03-13 00:21:07 -05:00
Tom Lane ab737f6ba9 Fix Windows portability issue in 23a27b039d.
_strtoui64() is available in MSVC builds, but apparently not with
other Windows toolchains.  Thanks to Petr Jelinek for the diagnosis.
2016-03-12 22:34:47 -05:00
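
One portable way to paper over this, sketched with an illustrative macro
name (not necessarily the shim the actual fix used): MSVC spells the
64-bit conversion _strtoui64(), while C99 toolchains provide strtoull().

    #include <stdlib.h>

    #ifdef _MSC_VER
    #define pg_strtouint64(str, endptr, base) _strtoui64(str, endptr, base)
    #else
    #define pg_strtouint64(str, endptr, base) strtoull(str, endptr, base)
    #endif
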
Tom Lane 23a27b039d Widen query numbers-of-tuples-processed counters to uint64.
This patch widens SPI_processed, EState's es_processed field, PortalData's
portalPos field, FuncCallContext's call_cntr and max_calls fields,
ExecutorRun's count argument, PortalRunFetch's result, and the max number
of rows in a SPITupleTable to uint64, and deals with (I hope) all the
ensuing fallout.  Some of these values were declared uint32 before, and
others "long".

I also removed PortalData's posOverflow field, since that logic seems
pretty useless given that portalPos is now always 64 bits.

The user-visible results are that command tags for SELECT etc will
correctly report tuple counts larger than 4G, as will plpgsql's GET
DIAGNOSTICS ... ROW_COUNT command.  Queries processing more tuples
than that are still not exactly the norm, but they're becoming more
common.

Most values associated with FETCH/MOVE distances, such as PortalRun's count
argument and the count argument of most SPI functions that have one, remain
declared as "long".  It's not clear whether it would be worth promoting
those to int64; but it would definitely be a large dollop of additional
API churn on top of this, and it would only help 32-bit platforms which
seem relatively less likely to see any benefit.

Andreas Scherbaum, reviewed by Christian Ullrich, additional hacking by me
2016-03-12 16:05:29 -05:00
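
A small standalone illustration of why the widening matters: a 32-bit
counter silently wraps past 4G tuples, while a 64-bit one reports
correctly.  PRIu64 stands in here for the server's own 64-bit format
macro.

    #include <inttypes.h>
    #include <stdint.h>
    #include <stdio.h>

    int
    main(void)
    {
        uint64_t tuples = 5000000000ULL;        /* 5 billion tuples */
        uint32_t narrow = (uint32_t) tuples;    /* wraps to 705032704 */

        printf("32-bit tag: SELECT %u\n", narrow);
        printf("64-bit tag: SELECT %" PRIu64 "\n", tuples);
        return 0;
    }
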
Andres Freund e01157500f Include portability/mem.h into fd.c for MAP_FAILED.
Buildfarm members gaur and pademelon are old enough not to know about
MAP_FAILED, which is used in 428b1d6. Include portability/mem.h to fix,
as already done in a bunch of other places.
2016-03-12 12:16:48 -08:00
Tom Lane 570be1f73f Re-export a few of createplan.c's make_xxx() functions.
CitusDB is using these and doesn't wish to redesign its code right now.
I am not on board with this being a good idea, or a good precedent,
but I lack the energy to fight about it.
2016-03-12 12:12:59 -05:00
Tom Lane 9118d03a8c When appropriate, postpone SELECT output expressions till after ORDER BY.
It is frequently useful for volatile, set-returning, or expensive functions
in a SELECT's targetlist to be postponed till after ORDER BY and LIMIT are
done.  Otherwise, the functions might be executed for every row of the
table despite the presence of LIMIT, and/or be executed in an unexpected
order.  For example, in
	SELECT x, nextval('seq') FROM tab ORDER BY x LIMIT 10;
it's probably desirable that the nextval() values are ordered the same
as x, and that nextval() is not run more than 10 times.

In the past, Postgres was inconsistent in this area: you would get the
desirable behavior if the ordering were performed via an indexscan, but
not if it had to be done by an explicit sort step.  Getting the desired
behavior reliably required contortions like
	SELECT x, nextval('seq')
	  FROM (SELECT x FROM tab ORDER BY x) ss LIMIT 10;

This patch conditionally postpones evaluation of pure-output target
expressions (that is, those that are not used as DISTINCT, ORDER BY, or
GROUP BY columns) so that they effectively occur after sorting, even if an
explicit sort step is necessary.  Volatile expressions and set-returning
expressions are always postponed, so as to provide consistent semantics.
Expensive expressions (costing more than 10 times typical operator cost,
which by default would include any user-defined function) are postponed
if there is a LIMIT or if there are expressions that must be postponed.

We could be more aggressive and postpone any nontrivial expression, but
there are costs associated with doing so: it requires an extra Result plan
node which adds some overhead, and postponement changes the volume of data
going through the sort step, perhaps for the worse.  Since we tend not to
have very good estimates of the output width of nontrivial expressions,
it's hard to have much confidence in our ability to predict whether
postponement would increase or decrease the cost of the sort; therefore
this patch doesn't attempt to make decisions conditionally on that.
Between these factors and a general desire not to change query behavior
when there's not a demonstrable benefit, it seems best to be conservative
about applying postponement.  We might tweak the decision rules in the
future, though.

Konstantin Knizhnik, heavily rewritten by me
2016-03-11 12:27:50 -05:00
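
The decision rules above can be restated as a small predicate.  This is a
sketch of the rules as described, not the planner's actual code; names
and the encoding of the cost threshold are illustrative.

    #include <stdbool.h>

    static bool
    should_postpone(bool is_volatile, bool returns_set,
                    double eval_cost, double typical_op_cost,
                    bool has_limit, bool others_postponed)
    {
        /* volatile and set-returning expressions: always postponed */
        if (is_volatile || returns_set)
            return true;
        /* expensive expressions: postponed only given a LIMIT, or if
         * something else already forces a postponement step */
        if (eval_cost > 10.0 * typical_op_cost)
            return has_limit || others_postponed;
        return false;
    }
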
Teodor Sigaev b1fdc727c3 Fix Windows build broken in 6943a946c7
It also fixes a dynamic array allocation, which is disallowed by ANSI C.

Author: Stas Kelvich
2016-03-11 20:10:20 +03:00
Teodor Sigaev 8829af47ef Fix merging of affix sets in their numeric representation
Some dictionaries have duplicated base words with different affix sets;
we just merge those sets into one.  But previously, merging two sets of
affixes was actually a concatenation of strings, which is wrong for the
numeric representation of affixes, because that representation uses a
comma to separate affixes.

Author: Artur Zakirov
2016-03-11 19:47:50 +03:00
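
A standalone illustration of the bug: concatenating the numeric affix
sets "1,2" and "3,4" yields "1,23,4", silently fusing affixes 2 and 3
into a bogus affix 23, whereas inserting the separator yields the
intended "1,2,3,4".

    #include <stdio.h>

    int
    main(void)
    {
        char wrong[32];
        char right[32];

        snprintf(wrong, sizeof(wrong), "%s%s", "1,2", "3,4");
        snprintf(right, sizeof(right), "%s,%s", "1,2", "3,4");
        printf("concatenated: %s   merged: %s\n", wrong, right);
        return 0;
    }
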
Teodor Sigaev 6943a946c7 Tsvector editing functions
Adds several tsvector editing functions: convert tsvector to/from text
array, set the weight of given lexemes, delete lexeme(s), unnest, and
filter lexemes with given weights.

Author: Stas Kelvich with some editorization by me
Reviewers: Tomas Vondra, Teodor Sigaev
2016-03-11 19:22:36 +03:00
Tom Lane 49635d7b3e Minor additional refactoring of planner.c's PathTarget handling.
Teach make_group_input_target() and make_window_input_target() to work
entirely with the PathTarget representation of tlists, rather than
constructing a tlist and immediately deconstructing it into PathTarget
format.  In itself this only saves a few palloc's; the bigger picture is
that it opens the door for sharing cost_qual_eval work across all of
planner.c's constructions of PathTargets.  I'll come back to that later.

In support of this, flesh out tlist.c's infrastructure for PathTargets
a bit more.
2016-03-11 10:24:55 -05:00
Robert Haas 481c76abf4 Fix a typo, and remove unnecessary pgstat_report_wait_end().
Per Amit Kapila.
2016-03-11 07:34:00 -05:00
Simon Riggs 73e7e49da3 Allow emit_log_hook to see original message text
emit_log_hook could previously see only the translated text, making it
harder to identify which message was being sent.  Pass the original text
as well, so that the exact message can be identified whichever language
is used for logging.

Discussion: 20160216.184755.59721141.horiguchi.kyotaro@lab.ntt.co.jp
Author: Kyotaro Horiguchi
2016-03-11 09:53:06 +00:00
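
A sketch of a hook using this, written as extension-style code.  The
emit_log_hook signature is real; the assumption here is that the
untranslated text is exposed as edata->message_id, so matching works
whatever lc_messages is set to.

    #include "postgres.h"
    #include "utils/elog.h"

    static emit_log_hook_type prev_emit_log_hook = NULL;

    static void
    my_emit_log_hook(ErrorData *edata)
    {
        /* match on the original, untranslated message text; assumed
         * field name -- check elog.h for your version */
        if (edata->message_id != NULL &&
            strcmp(edata->message_id, "deadlock detected") == 0)
            edata->output_to_server = true;

        if (prev_emit_log_hook)
            prev_emit_log_hook(edata);
    }
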
Robert Haas a414d96ad2 Simplify GetLockNameFromTagType.
The old code is wrong, because it returns a pointer to an automatic
variable.  It's also more clever than we really need to be, considering
that the case it's worrying about should never happen.
2016-03-10 21:37:22 -05:00
Andres Freund c94f0c29ce Blindly try to fix dtrace enabled builds, broken in 9cd00c45.
Reported-By: Peter Eisentraut
Discussion: 56E2239E.1050607@gmx.net
2016-03-10 17:51:03 -08:00
Andres Freund 9cd00c457e Checkpoint sorting and balancing.
Up to now a checkpoint's buffers were written out in whatever order they
appear in the BufferDescriptors array. That's nearly random in a lot of
cases, which performs badly on rotating media, but even on SSDs it
causes slowdowns.

To avoid that, sort the buffers before writing them out. We currently
sort by tablespace, relfilenode, fork and block number.

One of the major reasons that previously wasn't done was fear of
imbalance between tablespaces. To address that, writes are now balanced
between tablespaces.

The other prime concern was that the relatively large allocation to sort
the buffers in might fail, preventing checkpoints from happening. Thus
pre-allocate the required memory in shared memory, at server startup.

This particularly makes it more efficient to have checkpoint flushing
enabled, because that'll often result in a lot of writes that can be
coalesced into one flush.

Discussion: alpine.DEB.2.10.1506011320000.28433@sto
Author: Fabien Coelho and Andres Freund
2016-03-10 17:05:09 -08:00
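
The sort order described above, sketched as a plain qsort comparator over
a stand-in struct; the real code sorts an array of per-buffer items
prepared in shared memory.

    #include <stdint.h>
    #include <stdlib.h>

    typedef struct CkptSortItem
    {
        uint32_t tsId;      /* tablespace */
        uint32_t relNode;   /* relfilenode */
        int      forkNum;   /* fork */
        uint32_t blockNum;  /* block number */
    } CkptSortItem;

    static int
    ckpt_buforder_comparator(const void *pa, const void *pb)
    {
        const CkptSortItem *a = pa;
        const CkptSortItem *b = pb;

        if (a->tsId != b->tsId)
            return (a->tsId < b->tsId) ? -1 : 1;
        if (a->relNode != b->relNode)
            return (a->relNode < b->relNode) ? -1 : 1;
        if (a->forkNum != b->forkNum)
            return (a->forkNum < b->forkNum) ? -1 : 1;
        if (a->blockNum != b->blockNum)
            return (a->blockNum < b->blockNum) ? -1 : 1;
        return 0;
    }
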
Andres Freund 428b1d6b29 Allow to trigger kernel writeback after a configurable number of writes.
Currently writes to the main data files of postgres all go through the
OS page cache. This means that some operating systems can end up
collecting a large number of dirty buffers in their respective page
caches.  When these dirty buffers are flushed to storage rapidly, be it
because of fsync(), timeouts, or dirty ratios, latency for other reads
and writes can increase massively.  This is the primary reason for
regular massive stalls observed in real world scenarios and artificial
benchmarks; on rotating disks stalls on the order of hundreds of seconds
have been observed.

On linux it is possible to control this by reducing the global dirty
limits significantly, reducing the above problem. But global
configuration is rather problematic because it'll affect other
applications; also, PostgreSQL itself doesn't always want this behavior,
e.g. for temporary files it's undesirable.

Several operating systems allow some control over the kernel page
cache. Linux has sync_file_range(2), several posix systems have msync(2)
and posix_fadvise(2). sync_file_range(2) is preferable because it
requires no special setup, whereas msync() requires the to-be-flushed
range to be mmap'ed. For the purpose of flushing dirty data
posix_fadvise(2) is the worst alternative, as flushing dirty data is
just a side-effect of POSIX_FADV_DONTNEED, which also removes the pages
from the page cache.  Thus the feature is enabled by default only on
linux, but can be enabled on all systems that have any of the above
APIs.

While desirable and likely possible, this patch does not contain an
implementation for Windows.

With the infrastructure added, writes made via checkpointer, bgwriter
and normal user backends can be flushed after a configurable number of
writes. Each of these sources of writes is controlled by a separate GUC,
checkpointer_flush_after, bgwriter_flush_after and backend_flush_after
respectively; they're separate because the number of writes after which
flushing pays off differs for each source, and because the performance
considerations of controlled flushing differ for each of them.

A later patch will add checkpoint sorting - after that, flushes from the
checkpoint will almost always be desirable. Bgwriter flushes are most of
the time going to be random writes, which are slow on lots of storage
hardware. Flushing in backends works well if the storage and bgwriter
can keep up, but if not it can have negative consequences.  This patch
is likely to have negative performance consequences without checkpoint
sorting, but unfortunately so does sorting without flush control.

Discussion: alpine.DEB.2.10.1506011320000.28433@sto
Author: Fabien Coelho and Andres Freund
2016-03-10 17:04:34 -08:00
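
A Linux-only sketch of the mechanism: count writes per source and ask the
kernel to begin writeback once the configured threshold is reached.
SYNC_FILE_RANGE_WRITE initiates writeback without blocking and, unlike
POSIX_FADV_DONTNEED, without evicting pages from the cache.

    #define _GNU_SOURCE
    #include <fcntl.h>

    static int writes_since_flush = 0;

    static void
    count_write_and_maybe_flush(int fd, int flush_after)
    {
        if (++writes_since_flush >= flush_after)
        {
            /* offset 0 with nbytes 0 means "through end of file" */
            (void) sync_file_range(fd, 0, 0, SYNC_FILE_RANGE_WRITE);
            writes_since_flush = 0;
        }
    }
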
Tom Lane c82c92b111 Give pull_var_clause() reject/recurse/return behavior for WindowFuncs too.
All along, this function should have treated WindowFuncs in a manner
similar to Aggrefs, ie with an option whether or not to recurse into them.
By not considering the case, it was always recursing, which is OK for most
callers (although I suspect that the case in prepare_sort_from_pathkeys
might represent a bug).  But now we need return-without-recursing behavior
as well.  There are also more than a few callers that should never see a
WindowFunc, and now we'll get some error checking on that.
2016-03-10 16:23:52 -05:00
Robert Haas fd31cd2651 Don't vacuum all-frozen pages.
Commit a892234f83 gave us enough
infrastructure to avoid vacuuming pages where every tuple on the
page is already frozen.  So, replace the notion of a scan_all or
whole-table vacuum with the less onerous notion of an "aggressive"
vacuum, which will scan pages that are all-visible, but still skip those
that are all-frozen.

This should greatly reduce the cost of anti-wraparound vacuuming
on large clusters where the majority of data is never touched
between one cycle and the next, because we'll no longer have to
read all of those pages only to find out that we don't need to
do anything with them.

Patch by me, reviewed by Masahiko Sawada.
2016-03-10 16:14:42 -05:00
Tom Lane 364a9f47ab Refactor pull_var_clause's API to make it less tedious to extend.
In commit 1d97c19a0f and later c1d9579dd8, we extended
pull_var_clause's API by adding enum-type arguments.  That's sort of a pain
to maintain, though, because it means every time we add a new behavior we
must touch every last one of the call sites, even if there's a reasonable
default behavior that most of them could use.  Let's switch over to using a
bitmask of flags, instead; that seems more maintainable and might save a
nanosecond or two as well.  This commit changes no behavior in itself,
though I'm going to follow it up with one that does add a new behavior.

In passing, remove flatten_tlist(), which has not been used since 9.1
and would otherwise need the same API changes.

Removing these enums means that optimizer/tlist.h no longer needs to
depend on optimizer/var.h.  Changing that caused a number of C files to
need addition of #include "optimizer/var.h" (probably we can thank old
runs of pgrminclude for that); but on balance it seems like a good change
anyway.
2016-03-10 15:53:07 -05:00
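
The shape of the refactoring, sketched with illustrative flag values (the
real PVC_* flags may differ in name and number): a caller ORs together
exactly the behaviors it wants, and adding a new flag later doesn't force
every call site to change, unlike adding a new enum-typed argument.

    #define PVC_INCLUDE_AGGREGATES   0x0001
    #define PVC_RECURSE_AGGREGATES   0x0002
    #define PVC_INCLUDE_WINDOWFUNCS  0x0004
    #define PVC_RECURSE_WINDOWFUNCS  0x0008

    /* old style: every new behavior meant a new argument at every caller
     *   pull_var_clause(node, aggbehavior, phbehavior);
     * new style: one flags word, with a sane default when a bit is unset
     *   pull_var_clause(node, PVC_RECURSE_AGGREGATES |
     *                         PVC_INCLUDE_WINDOWFUNCS);
     */
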
Simon Riggs 37c54863cf Rework wait for AccessExclusiveLocks on Hot Standby
The earlier version, committed in 9.0, caused spurious waits in some
cases.  The new infrastructure for lock waits added in 9.3 is used to
correct and improve this.

Jeff Janes based upon a proposal by Simon Riggs, who also reviewed
Additional review comments from Amit Kapila
2016-03-10 19:26:24 +00:00
Robert Haas 53be0b1add Provide much better wait information in pg_stat_activity.
When a process is waiting for a heavyweight lock, we will now indicate
the type of heavyweight lock for which it is waiting.  Also, you can
now see when a process is waiting for a lightweight lock - in which
case we will indicate the individual lock name or the tranche, as
appropriate - or for a buffer pin.

Amit Kapila, Ildus Kurbangaliev, reviewed by me.  Lots of helpful
discussion and suggestions by many others, including Alexander
Korotkov and Vladimir Borodin.
2016-03-10 12:44:09 -05:00
Magnus Hagander 9d90388247 Avoid crash on old Windows with AVX2-capable CPU for VS2013 builds
The Visual Studio 2013 CRT generates invalid code when it makes a 64-bit
build that is later run on a CPU that supports AVX2 instructions, under
a version of Windows before 7 SP1/2008 R2 SP1.

Detect this combination, and in those cases turn off the generation of
FMA3, per recommendation from the Visual Studio team.

The bug is actually in the CRT shipping with Visual Studio 2013, but
Microsoft has stated they're only fixing it in newer major versions.
The fix is therefore conditioned specifically on being built with this
version of Visual Studio, and not previous or later versions.

Author: Christian Ullrich
2016-03-10 14:10:18 +01:00
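
A hedged sketch of the kind of startup guard described, for 64-bit MSVC
builds only: _set_FMA3_enable(0) tells the VS2013 CRT to avoid its FMA3
code paths.  The exact placement and version tests in the real fix may
differ.

    #if defined(_MSC_VER) && _MSC_VER == 1800 && defined(_M_AMD64)
    #include <math.h>               /* declares _set_FMA3_enable() */
    #include <versionhelpers.h>

    static void
    maybe_disable_fma3(void)
    {
        /* only pre-7SP1/2008R2SP1 Windows has the problem */
        if (!IsWindows7SP1OrGreater())
            _set_FMA3_enable(0);
    }
    #endif
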
Simon Riggs e0694cf9c7 Reduce size of two phase file header
Previously the 2PC file header was fixed at 200 bytes, which in most
cases wasted WAL space for workloads using 2PC heavily.

Pavan Deolasee, reviewed by Petr Jelinek
2016-03-10 12:51:46 +00:00
Simon Riggs fcb4bfddb6 Reduce lock level for altering fillfactor
Fabrízio de Royes Mello and Simon Riggs
2016-03-10 12:07:33 +00:00
Robert Haas 090b287fc5 Code review for b6fb6471f6.
Reports by Tomas Vondra, Vinayak Pokale, and Aleksander Alekseev.
Patch by Amit Langote.
2016-03-10 06:07:57 -05:00
Tom Lane cc402116ca Remove a couple of useless pstrdup() calls.
There's no point in pstrdup'ing the result of TextDatumGetCString,
since that's necessarily already a freshly-palloc'd C string.

These particular calls are unlikely to be of any consequence
performance-wise, but still they're a bad precedent that can confuse
future patch authors.

Noted by Chapman Flack.
2016-03-09 23:29:05 -05:00
Andres Freund 1d4a0ab19a Avoid unlikely data-loss scenarios due to rename() without fsync.
Renaming a file using rename(2) is not guaranteed to be durable in the
face of crashes. Use the previously added durable_rename()/durable_link_or_rename()
in various places where we previously just renamed files.

Most of the changed call sites are arguably not critical, but it seems
better to err on the side of too much durability.  The most prominent
known case where the previously missing fsyncs could cause data loss is
crashes at the end of a checkpoint. After the actual checkpoint has been
performed, old WAL files are recycled. When they're filled, their
contents are fdatasynced, but we did not fsync the containing
directory. An OS/hardware crash at an unfortunate moment could then end
up leaving that file with its old name, but new content; WAL replay
would thus not replay it.

Reported-By: Tomas Vondra
Author: Michael Paquier, Tomas Vondra, Andres Freund
Discussion: 56583BDD.9060302@2ndquadrant.com
Backpatch: All supported branches
2016-03-09 18:53:53 -08:00
Andres Freund 606e0f9841 Introduce durable_rename() and durable_link_or_rename().
Renaming a file using rename(2) is not guaranteed to be durable in the
face of crashes, especially on filesystems like xfs and ext4 when mounted
with data=writeback. To be certain that a rename() atomically replaces
the previous file contents in the face of crashes and different
filesystems, one has to fsync the old filename, rename the file, fsync
the new filename, fsync the containing directory.  This sequence is not
generally adhered to currently, which exposes us to data loss risks. To
avoid having to repeat this arduous sequence, introduce
durable_rename(), which wraps all that.

Also add durable_link_or_rename(). Several places use link() (with a
fallback to rename()) to rename a file, trying to avoid replacing the
target file out of paranoia. Some of those rename sequences need to be
durable as well. There seems little reason to extend several copies of
the same logic, so centralize the link() callers.

This commit does not yet make use of the new functions; they're used in
a followup commit.

Author: Michael Paquier, Andres Freund
Discussion: 56583BDD.9060302@2ndquadrant.com
Backpatch: All supported branches
2016-03-09 18:53:53 -08:00
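
The sequence spelled out above, as a condensed sketch; error handling and
recovery details are abbreviated compared to the real durable_rename().

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    static int
    fsync_path(const char *path, int flags)
    {
        int fd = open(path, flags);

        if (fd < 0)
            return -1;
        if (fsync(fd) != 0)
        {
            close(fd);
            return -1;
        }
        return close(fd);
    }

    static int
    durable_rename_sketch(const char *oldfile, const char *newfile,
                          const char *dir)
    {
        if (fsync_path(oldfile, O_RDWR) != 0)   /* 1: fsync old name */
            return -1;
        if (rename(oldfile, newfile) != 0)      /* 2: rename */
            return -1;
        if (fsync_path(newfile, O_RDWR) != 0)   /* 3: fsync new name */
            return -1;
        return fsync_path(dir, O_RDONLY);       /* 4: fsync directory */
    }
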
Tom Lane a298a1e06f Fix incorrect handling of NULL index entries in indexed ROW() comparisons.
An index search using a row comparison such as ROW(a, b) > ROW('x', 'y')
would stop upon reaching a NULL entry in the "b" column, ignoring the
fact that there might be non-NULL "b" values associated with later values
of "a".  This happens because _bt_mark_scankey_required() marks the
subsidiary scankey for "b" as required, which is just wrong: it's for
a column after the one with the first inequality key (namely "a"), and
thus can't be considered a required match.

This bit of brain fade dates back to the very beginnings of our support
for indexed ROW() comparisons, in 2006.  Kind of astonishing that no one
came across it before Glen Takahashi, in bug #14010.

Back-patch to all supported versions.

Note: the given test case doesn't actually fail in unpatched 9.1, evidently
because the fix for bug #6278 (i.e., stopping at nulls in either scan
direction) is required to make it fail.  I'm sure I could devise a case
that fails in 9.1 as well, perhaps with something involving making a cursor
back up; but it doesn't seem worth the trouble.
2016-03-09 14:51:22 -05:00
Robert Haas be060cbcd4 Re-pgindent vacuumlazy.c. 2016-03-09 13:51:11 -05:00
Robert Haas b6fb6471f6 Add a generic command progress reporting facility.
Using this facility, any utility command can report the target relation
upon which it is operating, if there is one, and up to 10 64-bit
counters; the intent of this is that users should be able to figure out
what a utility command is doing without having to resort to ugly hacks
like attaching strace to a backend.

As a demonstration, this adds very crude reporting to lazy vacuum; we
just report the target relation and nothing else.  A forthcoming patch
will make VACUUM report a bunch of additional data that will make this
much more interesting.  But this gets the basic framework in place.

Vinayak Pokale, Rahila Syed, Amit Langote, Robert Haas, reviewed by
Kyotaro Horiguchi, Jim Nasby, Thom Brown, Masahiko Sawada, Fujii Masao,
and Masanori Oyama.
2016-03-09 12:08:58 -05:00
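
A sketch of how a command might drive the facility, using the interface
this commit introduces; the parameter index and value here are
illustrative.

    #include "postgres.h"
    #include "pgstat.h"

    static void
    report_vacuum_progress(Oid relid)
    {
        /* advertise the target relation for this backend */
        pgstat_progress_start_command(PROGRESS_COMMAND_VACUUM, relid);

        /* ... do work, updating any of the 10 int64 counters ... */
        pgstat_progress_update_param(0, 12345);

        pgstat_progress_end_command();
    }
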
Tom Lane 8776c15c85 Fix incorrect tlist generation in create_gather_plan().
This function is written as though Gather doesn't project; but it does.
Even if it did not project, though, we must use build_path_tlist to ensure
that the output columns receive correct sortgroupref labeling.

Per report from Amit Kapila.
2016-03-09 10:56:46 -05:00
Tom Lane d31f20e2b5 Fix copy-and-pasteo in comment.
Wensheng Zhang
2016-03-09 10:29:14 -05:00
Tom Lane 51c0f63e4d Improve handling of pathtargets in planner.c.
Refactor so that the internal APIs in planner.c deal in PathTargets not
targetlists, and establish a more regular structure for deriving the
targets needed for successive steps.

There is more that could be done here; calculating the eval costs of each
successive target independently is both inefficient and wrong in detail,
since we won't actually recompute values available from the input node's
tlist.  But it's no worse than what happened before the pathification
rewrite.  In any case this seems like a good starting point for considering
how to handle Konstantin Knizhnik's function-evaluation-postponement patch.
2016-03-09 01:12:16 -05:00
Tom Lane 9e8b99420f Improve handling of group-column indexes in GroupingSetsPath.
Instead of having planner.c compute a groupColIdx array and store it in
GroupingSetsPaths, make create_groupingsets_plan() find the grouping
columns by searching in the child plan node's tlist.  Although that's
probably a bit slower for create_groupingsets_plan(), it's more like
the way every other plan node type does this, and it provides positive
confirmation that we know which child output columns we're supposed to be
grouping on.  (Indeed, looking at this now, I'm not at all sure that it
wasn't broken before, because create_groupingsets_plan() isn't demanding
an exact tlist match from its child node.)  Also, this allows substantial
simplification in planner.c, because it no longer needs to compute the
groupColIdx array at all; no other cases were using it.

I'd intended to put off this refactoring until later (like 9.7), but
in view of the likely bug fix and the need to rationalize planner.c's
tlist handling so we can do something sane with Konstantin Knizhnik's
function-evaluation-postponement patch, I think it can't wait.
2016-03-08 22:32:11 -05:00
Peter Eisentraut a40814d7aa Handle invalid libpq sockets in more places
Also, make error messages consistent.

From: Michael Paquier <michael.paquier@gmail.com>
2016-03-08 21:10:33 -05:00
Tom Lane 61fd218930 Fix minor thinko in pathification code.
I passed the wrong "root" struct to create_pathtarget in build_minmax_path.
Since the subroot is a clone of the outer root, this would not cause any
serious problems, but it would waste some cycles because
set_pathtarget_cost_width would not have access to Var width estimates
set up while running query_planner on the subroot.
2016-03-08 16:50:44 -05:00
Tom Lane 8c314b9853 Finish refactoring make_foo() functions in createplan.c.
This patch removes some redundant cost calculations that I left for later
cleanup in commit 3fc6e2d7f5.  There's now a uniform policy that the
make_foo() convenience functions don't do any cost calculations.  Most of
their callers copy costs from the source Path node, and for those that
don't, the calculation in the make_foo() function wasn't necessarily right
anyhow.  (make_result() was particularly a mess, as it was serving multiple
callers using cost calcs designed for only the first one or two that had
ever existed.)  Aside from saving a few cycles, this ensures that what
EXPLAIN prints matches the costs we used for planning purposes.  It does
not change any planner decisions, since the decisions are already made.
2016-03-08 16:28:34 -05:00
Robert Haas 7400559a3f Comment update for fdw_recheck_quals.
Commit 5fc4c26db5 could've done a better
job updating these comments.

Etsuro Fujita
2016-03-08 14:40:55 -05:00
Robert Haas 734f86d50d Add new flags argument for xl_heap_visible to heap2_desc.
Masahiko Sawada
2016-03-08 13:28:22 -05:00
Robert Haas dcfecaae9e Fix parallel query on standby servers.
Without this fix, it inevitably bombs out with "ERROR:  failed to
initialize transaction_read_only to 0".  Repair.

Ashutosh Sharma; comments adjusted by me.
2016-03-08 10:27:03 -05:00
Robert Haas 070140ee48 Add some functions to fd.c for the convenience of extensions.
For example, if you want to perform an ioctl() on a file descriptor
opened through the fd.c routines, there's no way to do that without
being able to get at the underlying fd.

KaiGai Kohei
2016-03-08 10:09:50 -05:00
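
The ioctl() use case, sketched as extension code.  FileGetRawDesc() is
assumed to be the accessor added here for getting at the underlying
kernel descriptor; treat the name as an assumption and check fd.h for
your version.

    #include "postgres.h"
    #include "storage/fd.h"
    #include <sys/ioctl.h>

    static int
    extension_ioctl(File vfd, unsigned long request, void *arg)
    {
        int rawfd = FileGetRawDesc(vfd);    /* underlying kernel fd */

        return ioctl(rawfd, request, arg);
    }
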
Robert Haas 77a1d1e798 Department of second thoughts: remove PD_ALL_FROZEN.
Commit a892234f83 added a second bit per
page to the visibility map, which still seems like a good idea, but it
also added a second page-level bit alongside PD_ALL_VISIBLE to track
whether the visibility map bit was set.  That no longer seems like a
clever plan, because we don't really need that bit for anything.  We
always clear both bits when the page is modified anyway.

Patch by me, reviewed by Kyotaro Horiguchi and Masahiko Sawada.
2016-03-08 08:46:48 -05:00
Tom Lane cf8e7b16a5 Spell "parallel" correctly.
Per David Rowley.
2016-03-07 21:48:17 -05:00
Peter Eisentraut 1c2db8c305 Fix uninstall target in tsearch Makefile
Artur Zakirov
2016-03-07 20:36:59 -05:00
Andres Freund b63bea5fd3 Further improvements to c8f621c43.
Coverity, and inspection prompted by the issue addressed in fd45d16f,
found some questionable code.

Specifically coverity noticed that the wrong length was added in
ReorderBufferSerializeChange() - without immediate negative consequences
as the variable isn't used afterwards.  During code-review and testing I
noticed that a bit of space was wasted when allocating tuple bufs in
several places.  Thirdly, the debug memset()s in
ReorderBufferGetTupleBuf() reduce the error checking valgrind can do.

Backpatch: 9.4, like c8f621c43.
2016-03-07 14:24:03 -08:00
Tom Lane 3fc6e2d7f5 Make the upper part of the planner work by generating and comparing Paths.
I've been saying we needed to do this for more than five years, and here it
finally is.  This patch removes the ever-growing tangle of spaghetti logic
that grouping_planner() used to use to try to identify the best plan for
post-scan/join query steps.  Now, there is (nearly) independent
consideration of each execution step, and entirely separate construction of
Paths to represent each of the possible ways to do that step.  We choose
the best Path or set of Paths using the same add_path() logic that's been
used inside query_planner() for years.

In addition, this patch removes the old restriction that subquery_planner()
could return only a single Plan.  It now returns a RelOptInfo containing a
set of Paths, just as query_planner() does, and the parent query level can
use each of those Paths as the basis of a SubqueryScanPath at its level.
This allows finding some optimizations that we missed before, wherein a
subquery was capable of returning presorted data and thereby avoiding a
sort in the parent level, making the overall cost cheaper even though
delivering sorted output was not the cheapest plan for the subquery in
isolation.  (A couple of regression test outputs change in consequence of
that.  However, there is very little change in visible planner behavior
overall, because the point of this patch is not to get immediate planning
benefits but to create the infrastructure for future improvements.)

There is a great deal left to do here.  This patch unblocks a lot of
planner work that was basically impractical in the old code structure,
such as allowing FDWs to implement remote aggregation, or rewriting
plan_set_operations() to allow consideration of multiple implementation
orders for set operations.  (The latter will likely require a full
rewrite of plan_set_operations(); what I've done here is only to fix it
to return Paths not Plans.)  I have also left unfinished some localized
refactoring in createplan.c and planner.c, because it was not necessary
to get this patch to a working state.

Thanks to Robert Haas, David Rowley, and Amit Kapila for review.
2016-03-07 15:58:22 -05:00