Commit Graph

5445 Commits

Author SHA1 Message Date
Heikki Linnakangas 4fc72cc7bb Collection of typo fixes.
Use "a" and "an" correctly, mostly in comments. Two error messages were
also fixed (they were just elogs, so no translation work required). Two
function comments in pg_proc.h were also fixed. Etsuro Fujita reported one
of these, but I found a lot more with grep.

Also fix a few other typos spotted while grepping for the a/an typos.
For example, "consists out of ..." -> "consists of ...". Plus a "though"/
"through" mixup reported by Euler Taveira.

Many of these typos were in old code, which would be nice to backpatch to
make future backpatching easier. But much of the code was new, and I didn't
feel like crafting separate patches for each branch. So no backpatching.
2015-05-20 16:56:22 +03:00
Andres Freund 9bc77c4519 Various fixes around ON CONFLICT for rule deparsing.
Neither the deparsing of the new alias for INSERT's target table, nor of
the inference clause was supported. Also fixup a typo in an error
message.

Add regression tests to test those code paths.

Author: Peter Geoghegan
2015-05-19 23:18:57 +02:00
Tom Lane 4db485e75b Put back a backwards-compatible version of sampling support functions.
Commit 83e176ec18 removed the longstanding
support functions for block sampling without any consideration of the
impact this would have on third-party FDWs.  The new API is not notably
more functional for FDWs than the old, so forcing them to change doesn't
seem like a good thing.  We can provide the old API as a wrapper (more
or less) around the new one for a minimal amount of extra code.
2015-05-18 18:34:37 -04:00
Noah Misch fd97bd411d Check return values of sensitive system library calls.
PostgreSQL already checked the vast majority of these, missing this
handful that nearly cannot fail.  If putenv() failed with ENOMEM in
pg_GSS_recvauth(), authentication would proceed with the wrong keytab
file.  If strftime() returned zero in cache_locale_time(), using the
unspecified buffer contents could lead to information exposure or a
crash.  Back-patch to 9.0 (all supported versions).

Other unchecked calls to these functions, especially those in frontend
code, pose negligible security concern.  This patch does not address
them.  Nonetheless, it is always better to check return values whose
specification provides for indicating an error.

In passing, fix an off-by-one error in strftime_win32()'s invocation of
WideCharToMultiByte().  Upon retrieving a value of exactly MAX_L10N_DATA
bytes, strftime_win32() would overrun the caller's buffer by one byte.
MAX_L10N_DATA is chosen to exceed the length of every possible value, so
the vulnerable scenario probably does not arise.

Security: CVE-2015-3166
2015-05-18 10:02:31 -04:00
Magnus Hagander 3b075e9d7b Fix typos in comments
Dmitriy Olshevskiy
2015-05-17 14:58:04 +02:00
Andres Freund f3d3118532 Support GROUPING SETS, CUBE and ROLLUP.
This SQL standard functionality allows to aggregate data by different
GROUP BY clauses at once. Each grouping set returns rows with columns
grouped by in other sets set to NULL.

This could previously be achieved by doing each grouping as a separate
query, conjoined by UNION ALLs. Besides being considerably more concise,
grouping sets will in many cases be faster, requiring only one scan over
the underlying data.

The current implementation of grouping sets only supports using sorting
for input. Individual sets that share a sort order are computed in one
pass. If there are sets that don't share a sort order, additional sort &
aggregation steps are performed. These additional passes are sourced by
the previous sort step; thus avoiding repeated scans of the source data.

The code is structured in a way that adding support for purely using
hash aggregation or a mix of hashing and sorting is possible. Sorting
was chosen to be supported first, as it is the most generic method of
implementation.

Instead of, as in an earlier versions of the patch, representing the
chain of sort and aggregation steps as full blown planner and executor
nodes, all but the first sort are performed inside the aggregation node
itself. This avoids the need to do some unusual gymnastics to handle
having to return aggregated and non-aggregated tuples from underlying
nodes, as well as having to shut down underlying nodes early to limit
memory usage.  The optimizer still builds Sort/Agg node to describe each
phase, but they're not part of the plan tree, but instead additional
data for the aggregation node. They're a convenient and preexisting way
to describe aggregation and sorting.  The first (and possibly only) sort
step is still performed as a separate execution step. That retains
similarity with existing group by plans, makes rescans fairly simple,
avoids very deep plans (leading to slow explains) and easily allows to
avoid the sorting step if the underlying data is sorted by other means.

A somewhat ugly side of this patch is having to deal with a grammar
ambiguity between the new CUBE keyword and the cube extension/functions
named cube (and rollup). To avoid breaking existing deployments of the
cube extension it has not been renamed, neither has cube been made a
reserved keyword. Instead precedence hacking is used to make GROUP BY
cube(..) refer to the CUBE grouping sets feature, and not the function
cube(). To actually group by a function cube(), unlikely as that might
be, the function name has to be quoted.

Needs a catversion bump because stored rules may change.

Author: Andrew Gierth and Atri Sharma, with contributions from Andres Freund
Reviewed-By: Andres Freund, Noah Misch, Tom Lane, Svenne Krap, Tomas
    Vondra, Erik Rijkers, Marti Raudsepp, Pavel Stehule
Discussion: CAOeZVidmVRe2jU6aMk_5qkxnB7dfmPROzM7Ur8JPW5j8Y5X-Lw@mail.gmail.com
2015-05-16 03:46:31 +02:00
Alvaro Herrera b0b7be6133 Add BRIN infrastructure for "inclusion" opclasses
This lets BRIN be used with R-Tree-like indexing strategies.

Also provided are operator classes for range types, box and inet/cidr.
The infrastructure provided here should be sufficient to create operator
classes for similar datatypes; for instance, opclasses for PostGIS
geometries should be doable, though we didn't try to implement one.

(A box/point opclass was also submitted, but we ripped it out before
commit because the handling of floating point comparisons in existing
code is inconsistent and would generate corrupt indexes.)

Author: Emre Hasegeli.  Cosmetic changes by me
Review: Andreas Karlsson
2015-05-15 18:05:22 -03:00
Alvaro Herrera 26df7066cc Move strategy numbers to include/access/stratnum.h
For upcoming BRIN opclasses, it's convenient to have strategy numbers
defined in a single place.  Since there's nothing appropriate, create
it.  The StrategyNumber typedef now lives there, as well as existing
strategy numbers for B-trees (from skey.h) and R-tree-and-friends (from
gist.h).  skey.h is forced to include stratnum.h because of the
StrategyNumber typedef, but gist.h is not; extensions that currently
rely on gist.h for rtree strategy numbers might need to add a new

A few .c files can stop including skey.h and/or gist.h, which is a nice
side benefit.

Per discussion:
https://www.postgresql.org/message-id/20150514232132.GZ2523@alvh.no-ip.org

Authored by Emre Hasegeli and Álvaro.

(It's not clear to me why bootscanner.l has any #include lines at all.)
2015-05-15 17:03:16 -03:00
Tom Lane 8d3e0906df Extend GB18030 encoding conversion to cover full Unicode range.
Our previous code for GB18030 <-> UTF8 conversion only covered Unicode code
points up to U+FFFF, but the actual spec defines conversions for all code
points up to U+10FFFF.  That would be rather impractical as a lookup table,
but fortunately there is a simple algorithmic conversion between the
additional code points and the equivalent GB18030 byte patterns.  Make use
of the just-added callback facility in LocalToUtf/UtfToLocal to perform the
additional conversions.

Having created the infrastructure to do that, we can use the same code to
map certain linearly-related subranges of the Unicode space below U+FFFF,
allowing removal of the corresponding lookup table entries.  This more
than halves the lookup table size, which is a substantial savings;
utf8_and_gb18030.so drops from nearly a megabyte to about half that.

In support of doing that, replace ISO10646-GB18030.TXT with the data file
gb-18030-2000.xml (retrieved from
http://source.icu-project.org/repos/icu/data/trunk/charset/data/xml/ )
in which these subranges have been deleted from the simple lookup entries.

Per bug #12845 from Arjen Nienhuis.  The conversion code added here is
based on his proposed patch, though I whacked it around rather heavily.
2015-05-15 15:02:13 -04:00
Simon Riggs f6d208d6e5 TABLESAMPLE, SQL Standard and extensible
Add a TABLESAMPLE clause to SELECT statements that allows
user to specify random BERNOULLI sampling or block level
SYSTEM sampling. Implementation allows for extensible
sampling functions to be written, using a standard API.
Basic version follows SQLStandard exactly. Usable
concrete use cases for the sampling API follow in later
commits.

Petr Jelinek

Reviewed by Michael Paquier and Simon Riggs
2015-05-15 14:37:10 -04:00
Heikki Linnakangas ffd37740ee Add archive_mode='always' option.
In 'always' mode, the standby independently archives all files it receives
from the primary.

Original patch by Fujii Masao, docs and review by me.
2015-05-15 18:55:24 +03:00
Tom Lane a868931fec Fix insufficiently-paranoid GB18030 encoding verifier.
The previous coding effectively only verified that the second byte of a
multibyte character was in the expected range; moreover, it wasn't careful
to make sure that the second byte even exists in the buffer before touching
it.  The latter seems unlikely to cause any real problems in the field
(in particular, it could never be a problem with null-terminated input),
but it's still a bug.

Since GB18030 is not a supported backend encoding, the only thing we'd
really be doing with GB18030 text is converting it to UTF8 in LocalToUtf,
which would fail anyway on any invalid character for lack of a match in
its lookup table.  So the only user-visible consequence of this change
should be that you'll get "invalid byte sequence for encoding" rather than
"character has no equivalent" for malformed GB18030 input.  However,
impending changes to the GB18030 conversion code will require these tighter
up-front checks to avoid producing bogus results.
2015-05-15 11:04:02 -04:00
Heikki Linnakangas 35fcb1b3d0 Allow GiST distance function to return merely a lower-bound.
The distance function can now set *recheck = false, like index quals. The
executor will then re-check the ORDER BY expressions, and use a queue to
reorder the results on the fly.

This makes it possible to do kNN-searches on polygons and circles, which
don't store the exact value in the index, but just a bounding box.

Alexander Korotkov and me
2015-05-15 14:26:51 +03:00
Tom Lane 7730f48ede Teach UtfToLocal/LocalToUtf to support algorithmic encoding conversions.
Until now, these functions have only supported encoding conversions using
lookup tables, which is fine as long as there's not too many code points
to convert.  However, GB18030 expects all 1.1 million Unicode code points
to be convertible, which would require a ridiculously-sized lookup table.
Fortunately, a large fraction of those conversions can be expressed through
arithmetic, ie the conversions are one-to-one in certain defined ranges.
To support that, provide a callback function that is used after consulting
the lookup tables.  (This patch doesn't actually change anything about the
GB18030 conversion behavior, just provide infrastructure for fixing it.)

Since this requires changing the APIs of UtfToLocal/LocalToUtf anyway,
take the opportunity to rearrange their argument lists into what seems
to me a saner order.  And beautify the call sites by using lengthof()
instead of error-prone sizeof() arithmetic.

In passing, also mark all the lookup tables used by these calls "const".
This moves an impressive amount of stuff into the text segment, at least
on my machine, and is safer anyhow.
2015-05-14 22:27:12 -04:00
Simon Riggs 83e176ec18 Separate block sampling functions
Refactoring ahead of tablesample patch

Requested and reviewed by Michael Paquier

Petr Jelinek
2015-05-15 04:02:54 +02:00
Peter Eisentraut a486e35706 Add pg_settings.pending_restart column
with input from David G. Johnston, Robert Haas, Michael Paquier
2015-05-14 20:08:51 -04:00
Tom Lane 1dc5ebc907 Support "expanded" objects, particularly arrays, for better performance.
This patch introduces the ability for complex datatypes to have an
in-memory representation that is different from their on-disk format.
On-disk formats are typically optimized for minimal size, and in any case
they can't contain pointers, so they are often not well-suited for
computation.  Now a datatype can invent an "expanded" in-memory format
that is better suited for its operations, and then pass that around among
the C functions that operate on the datatype.  There are also provisions
(rudimentary as yet) to allow an expanded object to be modified in-place
under suitable conditions, so that operations like assignment to an element
of an array need not involve copying the entire array.

The initial application for this feature is arrays, but it is not hard
to foresee using it for other container types like JSON, XML and hstore.
I have hopes that it will be useful to PostGIS as well.

In this initial implementation, a few heuristics have been hard-wired
into plpgsql to improve performance for arrays that are stored in
plpgsql variables.  We would like to generalize those hacks so that
other datatypes can obtain similar improvements, but figuring out some
appropriate APIs is left as a task for future work.  (The heuristics
themselves are probably not optimal yet, either, as they sometimes
force expansion of arrays that would be better left alone.)

Preliminary performance testing shows impressive speed gains for plpgsql
functions that do element-by-element access or update of large arrays.
There are other cases that get a little slower, as a result of added array
format conversions; but we can hope to improve anything that's annoyingly
bad.  In any case most applications should see a net win.

Tom Lane, reviewed by Andres Freund
2015-05-14 12:08:49 -04:00
Robert Haas 61f68e0bed Fix comment.
Commit 78efd5c1ed overlooked this.

Report by Peter Geoghegan.
2015-05-13 15:27:41 -04:00
Robert Haas 78efd5c1ed Extend abbreviated key infrastructure to datum tuplesorts.
Andrew Gierth, reviewed by Peter Geoghegan and by me.
2015-05-13 14:36:26 -04:00
Andrew Dunstan 3f2cec797e Fix jsonb replace and delete on scalars and empty structures
These operations now error out if attempted on scalars, and simply
return the input if attempted on empty arrays or objects. Along the way
we remove the unnecessary cloning of the input when it's known to be
unchanged. Regression tests covering these cases are added.
2015-05-13 13:52:08 -04:00
Robert Haas ae6157164f Remove useless assertion.
Here, snapshot->xcnt is an unsigned type, so it will always be
non-negative.
2015-05-13 11:01:10 -04:00
Andrew Dunstan c6947010ce Additional functions and operators for jsonb
jsonb_pretty(jsonb) produces nicely indented json output.
jsonb || jsonb concatenates two jsonb values.
jsonb - text removes a key and its associated value from the json
jsonb - int removes the designated array element
jsonb - text[] removes a key and associated value or array element at
the designated path
jsonb_replace(jsonb,text[],jsonb) replaces the array element designated
by the path or the value associated with the key designated by the path
with the given value.

Original work by Dmitry Dolgov, adapted and reworked for PostgreSQL core
by Andrew Dunstan, reviewed and tidied up by Petr Jelinek.
2015-05-12 15:52:45 -04:00
Peter Eisentraut d02f16470f Replace some appendStringInfo* calls with more appropriate variants
Author: David Rowley <dgrowleyml@gmail.com>
2015-05-11 20:38:55 -04:00
Alvaro Herrera b488c580ae Allow on-the-fly capture of DDL event details
This feature lets user code inspect and take action on DDL events.
Whenever a ddl_command_end event trigger is installed, DDL actions
executed are saved to a list which can be inspected during execution of
a function attached to ddl_command_end.

The set-returning function pg_event_trigger_ddl_commands can be used to
list actions so captured; it returns data about the type of command
executed, as well as the affected object.  This is sufficient for many
uses of this feature.  For the cases where it is not, we also provide a
"command" column of a new pseudo-type pg_ddl_command, which is a
pointer to a C structure that can be accessed by C code.  The struct
contains all the info necessary to completely inspect and even
reconstruct the executed command.

There is no actual deparse code here; that's expected to come later.
What we have is enough infrastructure that the deparsing can be done in
an external extension.  The intention is that we will add some deparsing
code in a later release, as an in-core extension.

A new test module is included.  It's probably insufficient as is, but it
should be sufficient as a starting point for a more complete and
future-proof approach.

Authors: Álvaro Herrera, with some help from Andres Freund, Ian Barwick,
Abhijit Menon-Sen.

Reviews by Andres Freund, Robert Haas, Amit Kapila, Michael Paquier,
Craig Ringer, David Steele.
Additional input from Chris Browne, Dimitri Fontaine, Stephen Frost,
Petr Jelínek, Tom Lane, Jim Nasby, Steven Singer, Pavel Stěhule.

Based on original work by Dimitri Fontaine, though I didn't use his
code.

Discussion:
  https://www.postgresql.org/message-id/m2txrsdzxa.fsf@2ndQuadrant.fr
  https://www.postgresql.org/message-id/20131108153322.GU5809@eldon.alvh.no-ip.org
  https://www.postgresql.org/message-id/20150215044814.GL3391@alvh.no-ip.org
2015-05-11 19:14:31 -03:00
Tom Lane 1a8a4e5cde Code review for foreign/custom join pushdown patch.
Commit e7cb7ee145 included some design
decisions that seem pretty questionable to me, and there was quite a lot
of stuff not to like about the documentation and comments.  Clean up
as follows:

* Consider foreign joins only between foreign tables on the same server,
rather than between any two foreign tables with the same underlying FDW
handler function.  In most if not all cases, the FDW would simply have had
to apply the same-server restriction itself (far more expensively, both for
lack of caching and because it would be repeated for each combination of
input sub-joins), or else risk nasty bugs.  Anyone who's really intent on
doing something outside this restriction can always use the
set_join_pathlist_hook.

* Rename fdw_ps_tlist/custom_ps_tlist to fdw_scan_tlist/custom_scan_tlist
to better reflect what they're for, and allow these custom scan tlists
to be used even for base relations.

* Change make_foreignscan() API to include passing the fdw_scan_tlist
value, since the FDW is required to set that.  Backwards compatibility
doesn't seem like an adequate reason to expect FDWs to set it in some
ad-hoc extra step, and anyway existing FDWs can just pass NIL.

* Change the API of path-generating subroutines of add_paths_to_joinrel,
and in particular that of GetForeignJoinPaths and set_join_pathlist_hook,
so that various less-used parameters are passed in a struct rather than
as separate parameter-list entries.  The objective here is to reduce the
probability that future additions to those parameter lists will result in
source-level API breaks for users of these hooks.  It's possible that this
is even a small win for the core code, since most CPU architectures can't
pass more than half a dozen parameters efficiently anyway.  I kept root,
joinrel, outerrel, innerrel, and jointype as separate parameters to reduce
code churn in joinpath.c --- in particular, putting jointype into the
struct would have been problematic because of the subroutines' habit of
changing their local copies of that variable.

* Avoid ad-hocery in ExecAssignScanProjectionInfo.  It was probably all
right for it to know about IndexOnlyScan, but if the list is to grow
we should refactor the knowledge out to the callers.

* Restore nodeForeignscan.c's previous use of the relcache to avoid
extra GetFdwRoutine lookups for base-relation scans.

* Lots of cleanup of documentation and missed comments.  Re-order some
code additions into more logical places.
2015-05-10 14:36:36 -04:00
Andrew Dunstan cb9fa802b3 Add new OID alias type regnamespace
Catalog version bumped

Kyotaro HORIGUCHI
2015-05-09 13:36:52 -04:00
Andrew Dunstan 0c90f6769d Add new OID alias type regrole
The new type has the scope of whole the database cluster so it doesn't
behave the same as the existing OID alias types which have database
scope,
concerning object dependency. To avoid confusion constants of the new
type are prohibited from appearing where dependencies are made involving
it.

Also, add a note to the docs about possible MVCC violation and
optimization issues, which are general over the all reg* types.

Kyotaro Horiguchi
2015-05-09 13:06:49 -04:00
Stephen Frost 0cf56f14dd Improve ParseConfigFp comment wrt head/tail
The head_p and tail_p pointers passed to ParseConfigFp() are actually
input/output parameters, not strictly output paramaters.  This updates
the function comment to reflect that.

Per discussion with Tom.
2015-05-09 11:13:37 -04:00
Stephen Frost f91feba877 Modify pg_stat_get_activity to build a tuplestore
This updates pg_stat_get_activity() to build a tuplestore for its
results instead of using the old-style multiple-call method.  This
simplifies the function, though that wasn't the primary motivation for
the change, which is that we may turn it into a helper function which
can filter the results (or not) much more easily.
2015-05-08 19:25:30 -04:00
Stephen Frost a97e0c3354 Add pg_file_settings view and function
The function and view added here provide a way to look at all settings
in postgresql.conf, any #include'd files, and postgresql.auto.conf
(which is what backs the ALTER SYSTEM command).

The information returned includes the configuration file name, line
number in that file, sequence number indicating when the parameter is
loaded (useful to see if it is later masked by another definition of the
same parameter), parameter name, and what it is set to at that point.
This information is updated on reload of the server.

This is unfiltered, privileged, information and therefore access is
restricted to superusers through the GRANT system.

Author: Sawada Masahiko, various improvements by me.
Reviewers: David Steele
2015-05-08 19:09:26 -04:00
Andres Freund 168d5805e4 Add support for INSERT ... ON CONFLICT DO NOTHING/UPDATE.
The newly added ON CONFLICT clause allows to specify an alternative to
raising a unique or exclusion constraint violation error when inserting.
ON CONFLICT refers to constraints that can either be specified using a
inference clause (by specifying the columns of a unique constraint) or
by naming a unique or exclusion constraint.  DO NOTHING avoids the
constraint violation, without touching the pre-existing row.  DO UPDATE
SET ... [WHERE ...] updates the pre-existing tuple, and has access to
both the tuple proposed for insertion and the existing tuple; the
optional WHERE clause can be used to prevent an update from being
executed.  The UPDATE SET and WHERE clauses have access to the tuple
proposed for insertion using the "magic" EXCLUDED alias, and to the
pre-existing tuple using the table name or its alias.

This feature is often referred to as upsert.

This is implemented using a new infrastructure called "speculative
insertion". It is an optimistic variant of regular insertion that first
does a pre-check for existing tuples and then attempts an insert.  If a
violating tuple was inserted concurrently, the speculatively inserted
tuple is deleted and a new attempt is made.  If the pre-check finds a
matching tuple the alternative DO NOTHING or DO UPDATE action is taken.
If the insertion succeeds without detecting a conflict, the tuple is
deemed inserted.

To handle the possible ambiguity between the excluded alias and a table
named excluded, and for convenience with long relation names, INSERT
INTO now can alias its target table.

Bumps catversion as stored rules change.

Author: Peter Geoghegan, with significant contributions from Heikki
    Linnakangas and Andres Freund. Testing infrastructure by Jeff Janes.
Reviewed-By: Heikki Linnakangas, Andres Freund, Robert Haas, Simon Riggs,
    Dean Rasheed, Stephen Frost and many others.
2015-05-08 05:43:10 +02:00
Magnus Hagander d678bde655 Fix indentation that could mask a future bug
Michael Paquier, spotted using Coverity
2015-05-07 11:41:26 +02:00
Alvaro Herrera 3b6db1f445 Add geometry/range functions to support BRIN inclusion
This commit adds the following functions:
    box(point) -> box
    bound_box(box, box) -> box
    inet_same_family(inet, inet) -> bool
    inet_merge(inet, inet) -> cidr
    range_merge(anyrange, anyrange) -> anyrange

The first of these is also used to implement a new assignment cast from
point to box.

These functions are the first part of a base to implement an "inclusion"
operator class for BRIN, for multidimensional data types.

Author: Emre Hasegeli
Reviewed by: Andreas Karlsson
2015-05-05 15:22:24 -03:00
Andrew Dunstan 3c000fd9a6 Fix two small bugs in json's populate_record_worker
The first bug is not releasing a tupdesc when doing an early return out
of the function. The second bug is a logic error in choosing when to do
an early return if given an empty jsonb object.

Bug reports from Pavel Stehule and Tom Lane respectively.

Backpatch to 9.4 where these were introduced.
2015-05-04 12:38:58 -04:00
Tom Lane c90b85e4d9 Second try at fixing warnings caused by commit 9b43d73b3f.
Commit ef3f9e642d suppressed one cause of warnings here, but
recent clang on OS X is still unhappy because we're passing a "long"
to abs().  The fact that tm_gmtoff is declared as long is no doubt a
hangover from days when int might be only 16 bits; but Postgres has
never been able to run on such machines, so we can just cast it to int
with no worries.  For consistency, also cast to int in the other
uses of tm_gmtoff in this stanza.

Note: this code is still broken on machines that don't follow C99
integer-division-truncates-towards-zero rules.  Given the lack of
complaints about it, I don't feel a large desire to complicate things
enough to cope with the pre-C99 rules.
2015-05-03 23:44:52 -04:00
Robert Haas e044a44949 Deparse named arguments to use the new => operator instead of :=
Tom Lane pointed out that this wasn't done, and asked whether that was
intentional.  Subsequent discussion was in favor of making the change,
so here we go.
2015-05-01 09:37:10 -04:00
Robert Haas e7cb7ee145 Allow FDWs and custom scan providers to replace joins with scans.
Foreign data wrappers can use this capability for so-called "join
pushdown"; that is, instead of executing two separate foreign scans
and then joining the results locally, they can generate a path which
performs the join on the remote server and then is scanned locally.
This commit does not extend postgres_fdw to take advantage of this
capability; it just provides the infrastructure.

Custom scan providers can use this in a similar way.  Previously,
it was only possible for a custom scan provider to scan a single
relation.  Now, it can scan an entire join tree, provided of course
that it knows how to produce the same results that the join would
have produced if executed normally.

KaiGai Kohei, reviewed by Shigeru Hanada, Ashutosh Bapat, and me.
2015-05-01 08:50:35 -04:00
Robert Haas 924bcf4f16 Create an infrastructure for parallel computation in PostgreSQL.
This does four basic things.  First, it provides convenience routines
to coordinate the startup and shutdown of parallel workers.  Second,
it synchronizes various pieces of state (e.g. GUCs, combo CID
mappings, transaction snapshot) from the parallel group leader to the
worker processes.  Third, it prohibits various operations that would
result in unsafe changes to that state while parallelism is active.
Finally, it propagates events that would result in an ErrorResponse,
NoticeResponse, or NotifyResponse message being sent to the client
from the parallel workers back to the master, from which they can then
be sent on to the client.

Robert Haas, Amit Kapila, Noah Misch, Rushabh Lathia, Jeevan Chalke.
Suggestions and review from Andres Freund, Heikki Linnakangas, Noah
Misch, Simon Riggs, Euler Taveira, and Jim Nasby.
2015-04-30 15:02:14 -04:00
Robert Haas 9b6a0ce5f0 Remove enum-related special cases for catalog scans.
When this code was written, catalog scans were normally performed using
SnapshotNow, making special handling necessary here.  Now, however, all
catalog scans use MVCC snapshots, so we can change these cases to look
more like what we do for catalog scans elsewhere in the code.

Per discussion with Tom Lane and a reminder from Bruce Momjian.
2015-04-29 15:48:44 -04:00
Robert Haas ef3f9e642d Attempt to fix some compiler warnings. 2015-04-29 14:02:27 -04:00
Andres Freund 5aa2350426 Introduce replication progress tracking infrastructure.
When implementing a replication solution ontop of logical decoding, two
related problems exist:
* How to safely keep track of replication progress
* How to change replication behavior, based on the origin of a row;
  e.g. to avoid loops in bi-directional replication setups

The solution to these problems, as implemented here, consist out of
three parts:

1) 'replication origins', which identify nodes in a replication setup.
2) 'replication progress tracking', which remembers, for each
   replication origin, how far replay has progressed in a efficient and
   crash safe manner.
3) The ability to filter out changes performed on the behest of a
   replication origin during logical decoding; this allows complex
   replication topologies. E.g. by filtering all replayed changes out.

Most of this could also be implemented in "userspace", e.g. by inserting
additional rows contain origin information, but that ends up being much
less efficient and more complicated.  We don't want to require various
replication solutions to reimplement logic for this independently. The
infrastructure is intended to be generic enough to be reusable.

This infrastructure also replaces the 'nodeid' infrastructure of commit
timestamps. It is intended to provide all the former capabilities,
except that there's only 2^16 different origins; but now they integrate
with logical decoding. Additionally more functionality is accessible via
SQL.  Since the commit timestamp infrastructure has also been introduced
in 9.5 (commit 73c986add) changing the API is not a problem.

For now the number of origins for which the replication progress can be
tracked simultaneously is determined by the max_replication_slots
GUC. That GUC is not a perfect match to configure this, but there
doesn't seem to be sufficient reason to introduce a separate new one.

Bumps both catversion and wal page magic.

Author: Andres Freund, with contributions from Petr Jelinek and Craig Ringer
Reviewed-By: Heikki Linnakangas, Petr Jelinek, Robert Haas, Steve Singer
Discussion: 20150216002155.GI15326@awork2.anarazel.de,
    20140923182422.GA15776@alap3.anarazel.de,
    20131114172632.GE7522@alap2.anarazel.de
2015-04-29 19:30:53 +02:00
Bruce Momjian 9b43d73b3f to_char(): have format 'OF' only show the leading negative sign
Previously both hours and minutes displayed as negative.

Report by David Pozsar
2015-04-28 21:02:57 -04:00
Andres Freund 6aab1f45ac Fix various typos and grammar errors in comments.
Author: Dmitriy Olshevskiy
Discussion: 553D00A6.4090205@bk.ru
2015-04-26 18:42:31 +02:00
Peter Eisentraut cac7658205 Add transforms feature
This provides a mechanism for specifying conversions between SQL data
types and procedural languages.  As examples, there are transforms
for hstore and ltree for PL/Perl and PL/Python.

reviews by Pavel Stěhule and Andres Freund
2015-04-26 10:33:14 -04:00
Stephen Frost ab6d1cd26e Fix typo in relcache's equalPolicy()
The USING policies were not being checked for differences as the same
policy was being passed in to both sides of the equal().  This could
result in backends not realizing that a policy had been changed, if
none of the other attributes had been changed.

Fix by passing to equal() the policy1 and policy2 using quals for
comparison.

No need to back-patch as this is not yet released.  Noticed while
testing changes to RLS proposed by Dean Rasheed.
2015-04-17 16:37:11 -04:00
Peter Eisentraut 30982be4e5 Integrate pg_upgrade_support module into backend
Previously, these functions were created in a schema "binary_upgrade",
which was deleted after pg_upgrade was finished.  Because we don't want
to keep that schema around permanently, move them to pg_catalog but
rename them with a binary_upgrade_... prefix.

The provided functions are only small wrappers around global variables
that were added specifically for pg_upgrade use, so keeping the module
separate does not create any modularity.

The functions still check that they are only called in binary upgrade
mode, so it is not possible to call these during normal operation.

Reviewed-by: Michael Paquier <michael.paquier@gmail.com>
2015-04-14 19:26:37 -04:00
Heikki Linnakangas 4f700bcd20 Reorganize our CRC source files again.
Now that we use CRC-32C in WAL and the control file, the "traditional" and
"legacy" CRC-32 variants are not used in any frontend programs anymore.
Move the code for those back from src/common to src/backend/utils/hash.

Also move the slicing-by-8 implementation (back) to src/port. This is in
preparation for next patch that will add another implementation that uses
Intel SSE 4.2 instructions to calculate CRC-32C, where available.
2015-04-14 17:03:42 +03:00
Magnus Hagander 9029f4b374 Add system view pg_stat_ssl
This view shows information about all connections, such as if the
connection is using SSL, which cipher is used, and which client
certificate (if any) is used.

Reviews by Alex Shulgin, Heikki Linnakangas, Andres Freund & Michael Paquier
2015-04-12 19:07:46 +02:00
Heikki Linnakangas a10589a512 Remove duplicated words in comments.
David Rowley
2015-04-12 10:46:17 +03:00
Alvaro Herrera 27846f02c1 Optimize locking a tuple already locked by another subxact
Locking and updating the same tuple repeatedly led to some strange
multixacts being created which had several subtransactions of the same
parent transaction holding locks of the same strength.  However,
once a subxact of the current transaction holds a lock of a given
strength, it's not necessary to acquire the same lock again.  This made
some coding patterns much slower than required.

The fix is twofold.  First we change HeapTupleSatisfiesUpdate to return
HeapTupleBeingUpdated for the case where the current transaction is
already a single-xid locker for the given tuple; it used to return
HeapTupleMayBeUpdated for that case.  The new logic is simpler, and the
change to pgrowlocks is a testament to that: previously we needed to
check for the single-xid locker separately in a very ugly way.  That
test is simpler now.

As fallout from the HTSU change, some of its callers need to be amended
so that tuple-locked-by-own-transaction is taken into account in the
BeingUpdated case rather than the MayBeUpdated case.  For many of them
there is no difference; but heap_delete() and heap_update now check
explicitely and do not grab tuple lock in that case.

The HTSU change also means that routine MultiXactHasRunningRemoteMembers
introduced in commit 11ac4c73cb is no longer necessary and can be
removed; the case that used to require it is now handled naturally as
result of the changes to heap_delete and heap_update.

The second part of the fix to the performance issue is to adjust
heap_lock_tuple to avoid the slowness:

1. Previously we checked for the case that our own transaction already
held a strong enough lock and returned MayBeUpdated, but only in the
multixact case.  Now we do it for the plain Xid case as well, which
saves having to LockTuple.

2. If the current transaction is the only locker of the tuple (but with
a lock not as strong as what we need; otherwise it would have been
caught in the check mentioned above), we can skip sleeping on the
multixact, and instead go straight to create an updated multixact with
the additional lock strength.

3. Most importantly, make sure that both the single-xid-locker case and
the multixact-locker case optimization are applied always.  We do this
by checking both in a single place, rather than them appearing in two
separate portions of the routine -- something that is made possible by
the HeapTupleSatisfiesUpdate API change.  Previously we would only check
for the single-xid case when HTSU returned MayBeUpdated, and only
checked for the multixact case when HTSU returned BeingUpdated.  This
was at odds with what HTSU actually returned in one case: if our own
transaction was locker in a multixact, it returned MayBeUpdated, so the
optimization never applied.  This is what led to the large multixacts in
the first place.

Per bug report #8470 by Oskari Saarenmaa.
2015-04-10 13:47:15 -03:00