Commit Graph

2168 Commits

Author SHA1 Message Date
Teodor Sigaev 710d90da1f Add prefix operator for TEXT type.
The prefix operator along with SP-GiST indexes can be used as an alternative
for LIKE 'word%' commands  and it doesn't have a limitation of string/prefix
length as B-Tree has.

Bump catalog version

Author: Ildus Kurbangaliev with some editorization by me
Review by: Arthur Zakirov, Alexander Korotkov, and me
Discussion: https://www.postgresql.org/message-id/flat/20180202180327.222b04b3@wp.localdomain
2018-04-03 19:46:45 +03:00
Peter Eisentraut 056a5a3f63 Allow committing inside cursor loop
Previously, committing or aborting inside a cursor loop was prohibited
because that would close and remove the cursor.  To allow that,
automatically convert such cursors to holdable cursors so they survive
commits or rollbacks.  Portals now have a new state "auto-held", which
means they have been converted automatically from pinned.  An auto-held
portal is kept on transaction commit or rollback, but is still removed
when returning to the main loop on error.

This supports all languages that have cursor loop constructs: PL/pgSQL,
PL/Python, PL/Perl.

Reviewed-by: Ildus Kurbangaliev <i.kurbangaliev@postgrespro.ru>
2018-03-28 19:03:26 -04:00
Tom Lane 442accc3fe Allow memory contexts to have both fixed and variable ident strings.
Originally, we treated memory context names as potentially variable in
all cases, and therefore always copied them into the context header.
Commit 9fa6f00b1 rethought this a little bit and invented a distinction
between fixed and variable names, skipping the copy step for the former.
But we can make things both simpler and more useful by instead allowing
there to be two parts to a context's identification, a fixed "name" and
an optional, variable "ident".  The name supplied in the context create
call is now required to be a compile-time-constant string in all cases,
as it is never copied but just pointed to.  The "ident" string, if
wanted, is supplied later.  This is needed because typically we want
the ident to be stored inside the context so that it's cleaned up
automatically on context deletion; that means it has to be copied into
the context before we can set the pointer.

The cost of this approach is basically just an additional pointer field
in struct MemoryContextData, which isn't much overhead, and is bought
back entirely in the AllocSet case by not needing a headerSize field
anymore, since we no longer have to cope with variable header length.
In addition, we can simplify the internal interfaces for memory context
creation still further, saving a few cycles there.  And it's no longer
true that a custom identifier disqualifies a context from participating
in aset.c's freelist scheme, so possibly there's some win on that end.

All the places that were using non-compile-time-constant context names
are adjusted to put the variable info into the "ident" instead.  This
allows more effective identification of those contexts in many cases;
for example, subsidary contexts of relcache entries are now identified
by both type (e.g. "index info") and relname, where before you got only
one or the other.  Contexts associated with PL function cache entries
are now identified more fully and uniformly, too.

I also arranged for plancache contexts to use the query source string
as their identifier.  This is basically free for CachedPlanSources, as
they contained a copy of that string already.  We pay an extra pstrdup
to do it for CachedPlans.  That could perhaps be avoided, but it would
make things more fragile (since the CachedPlanSource is sometimes
destroyed first).  I suspect future improvements in error reporting will
require CachedPlans to have a copy of that string anyway, so it's not
clear that it's worth moving mountains to avoid it now.

This also changes the APIs for context statistics routines so that the
context-specific routines no longer assume that output goes straight
to stderr, nor do they know all details of the output format.  This
is useful immediately to reduce code duplication, and it also allows
for external code to do something with stats output that's different
from printing to stderr.

The reason for pushing this now rather than waiting for v12 is that
it rethinks some of the API changes made by commit 9fa6f00b1.  Seems
better for extension authors to endure just one round of API changes
not two.

Discussion: https://postgr.es/m/CAB=Je-FdtmFZ9y9REHD7VsSrnCkiBhsA4mdsLKSPauwXtQBeNA@mail.gmail.com
2018-03-27 16:46:51 -04:00
Simon Riggs c203d6cf81 Allow HOT updates for some expression indexes
If the value of an index expression is unchanged after UPDATE,
allow HOT updates where previously we disallowed them, giving
a significant performance boost in those cases.

Particularly useful for indexes such as JSON->>field where the
JSON value changes but the indexed value does not.

Submitted as "surjective indexes" patch, now enabled by use
of new "recheck_on_update" parameter.

Author: Konstantin Knizhnik
Reviewer: Simon Riggs, with much wordsmithing and some cleanup
2018-03-27 19:57:02 +01:00
Andres Freund 432bb9e04d Basic JIT provider and error handling infrastructure.
This commit introduces:

1) JIT provider abstraction, which allows JIT functionality to be
   implemented in separate shared libraries. That's desirable because
   it allows to install JIT support as a separate package, and because
   it allows experimentation with different forms of JITing.
2) JITContexts which can be, using functions introduced in follow up
   commits, used to emit JITed functions, and have them be cleaned up
   on error.
3) The outline of a LLVM JIT provider, which will be fleshed out in
   subsequent commits.

Documentation for GUCs added, and for JIT in general, will be added in
later commits.

Author: Andres Freund, with architectural input from Jeff Davis
Discussion: https://postgr.es/m/20170901064131.tazjxwus3k2w3ybh@alap3.anarazel.de
2018-03-21 19:28:28 -07:00
Tom Lane 742869946f Fix mishandling of quoted-list GUC values in pg_dump and ruleutils.c.
Code that prints out the contents of setconfig or proconfig arrays in
SQL format needs to handle GUC_LIST_QUOTE variables differently from
other ones, because for those variables, flatten_set_variable_args()
already applied a layer of quoting.  The value can therefore safely
be printed as-is, and indeed must be, or flatten_set_variable_args()
will muck it up completely on reload.  For all other GUC variables,
it's necessary and sufficient to quote the value as a SQL literal.

We'd recognized the need for this long ago, but mis-analyzed the
need slightly, thinking that all GUC_LIST_INPUT variables needed
the special treatment.  That's actually wrong, since a valid value
of a LIST variable might include characters that need quoting,
although no existing variables accept such values.

More to the point, we hadn't made any particular effort to keep the
various places that deal with this up-to-date with the set of variables
that actually need special treatment, meaning that we'd do the wrong
thing with, for example, temp_tablespaces values.  This affects dumping
of SET clauses attached to functions, as well as ALTER DATABASE/ROLE SET
commands.

In ruleutils.c we can fix it reasonably honestly by exporting a guc.c
function that allows discovering the flags for a given GUC variable.
But pg_dump doesn't have easy access to that, so continue the old method
of having a hard-wired list of affected variable names.  At least we can
fix it to have just one list not two, and update the list to match
current reality.

A remaining problem with this is that it only works for built-in
GUC variables.  pg_dump's list obvious knows nothing of third-party
extensions, and even the "ask guc.c" method isn't bulletproof since
the relevant extension might not be loaded.  There's no obvious
solution to that, so for now, we'll just have to discourage extension
authors from inventing custom GUCs that need GUC_LIST_QUOTE.

This has been busted for a long time, so back-patch to all supported
branches.

Michael Paquier and Tom Lane, reviewed by Kyotaro Horiguchi and
Pavel Stehule

Discussion: https://postgr.es/m/20180111064900.GA51030@paquier.xyz
2018-03-21 20:03:28 -04:00
Tom Lane 6497a18e6c Fix some corner-case issues in REFRESH MATERIALIZED VIEW CONCURRENTLY.
refresh_by_match_merge() has some issues in the way it builds a SQL
query to construct the "diff" table:

1. It doesn't require the selected unique index(es) to be indimmediate.
2. It doesn't pay attention to the particular equality semantics enforced
by a given index, but just assumes that they must be those of the column
datatype's default btree opclass.
3. It doesn't check that the indexes are btrees.
4. It's insufficiently careful to ensure that the parser will pick the
intended operator when parsing the query.  (This would have been a
security bug before CVE-2018-1058.)
5. It's not careful about indexes on system columns.

The way to fix #4 is to make use of the existing code in ri_triggers.c
for generating an arbitrary binary operator clause.  I chose to move
that to ruleutils.c, since that seems a more reasonable place to be
exporting such functionality from than ri_triggers.c.

While #1, #3, and #5 are just latent given existing feature restrictions,
and #2 doesn't arise in the core system for lack of alternate opclasses
with different equality behaviors, #4 seems like an issue worth
back-patching.  That's the bulk of the change anyway, so just back-patch
the whole thing to 9.4 where this code was introduced.

Discussion: https://postgr.es/m/13836.1521413227@sss.pgh.pa.us
2018-03-19 18:50:05 -04:00
Tom Lane 58d9acc18d Fix assorted issues in convert_to_scalar().
If convert_to_scalar is passed a pair of datatypes it can't cope with,
its former behavior was just to elog(ERROR).  While this is OK so far as
the core code is concerned, there's extension code that would like to use
scalarltsel/scalargtsel/etc as selectivity estimators for operators that
work on non-core datatypes, and this behavior is a show-stopper for that
use-case.  If we simply allow convert_to_scalar to return FALSE instead of
outright failing, then the main logic of scalarltsel/scalargtsel will work
fine for any operator that behaves like a scalar inequality comparison.
The lack of conversion capability will mean that we can't estimate to
better than histogram-bin-width precision, since the code will effectively
assume that the comparison constant falls at the middle of its bin.  But
that's still a lot better than nothing.  (Someday we should provide a way
for extension code to supply a custom version of convert_to_scalar, but
today is not that day.)

While poking at this issue, we noted that the existing code for handling
type bytea in convert_to_scalar is several bricks shy of a load.
It assumes without checking that if the comparison value is type bytea,
the bounds values are too; in the worst case this could lead to a crash.
It also fails to detoast the input values, so that the comparison result is
complete garbage if any input is toasted out-of-line, compressed, or even
just short-header.  I'm not sure how often such cases actually occur ---
the bounds values, at least, are probably safe since they are elements of
an array and hence can't be toasted.  But that doesn't make this code OK.

Back-patch to all supported branches, partly because author requested that,
but mostly because of the bytea bugs.  The change in API for the exposed
routine convert_network_to_scalar() is theoretically a back-patch hazard,
but it seems pretty unlikely that any third-party code is calling that
function directly.

Tomas Vondra, with some adjustments by me

Discussion: https://postgr.es/m/b68441b6-d18f-13ab-b43b-9a72188a4e02@2ndquadrant.com
2018-03-03 20:31:35 -05:00
Peter Eisentraut fd1a421fe6 Add prokind column, replacing proisagg and proiswindow
The new column distinguishes normal functions, procedures, aggregates,
and window functions.  This replaces the existing columns proisagg and
proiswindow, and replaces the convention that procedures are indicated
by prorettype == 0.  Also change prorettype to be VOIDOID for procedures.

Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
2018-03-02 13:48:33 -05:00
Alvaro Herrera a26116c6cb Refactor format_type APIs to be more modular
Introduce a new format_type_extended, with a flags bitmask argument that
can modify the default behavior.  A few compatibility and readability
wrappers remain:
	format_type_be
	format_type_be_qualified
	format_type_with_typemod
while format_type_with_typemod_qualified, which had a single caller, is
removed.

Author: Michael Paquier, some revisions by me
Discussion: 20180213035107.GA2915@paquier.xyz
2018-02-17 19:02:15 -03:00
Tom Lane 4b93f57999 Make plpgsql use its DTYPE_REC code paths for composite-type variables.
Formerly, DTYPE_REC was used only for variables declared as "record";
variables of named composite types used DTYPE_ROW, which is faster for
some purposes but much less flexible.  In particular, the ROW code paths
are entirely incapable of dealing with DDL-caused changes to the number
or data types of the columns of a row variable, once a particular plpgsql
function has been parsed for the first time in a session.  And, since the
stored representation of a ROW isn't a tuple, there wasn't any easy way
to deal with variables of domain-over-composite types, since the domain
constraint checking code would expect the value to be checked to be a
tuple.  A lesser, but still real, annoyance is that ROW format cannot
represent a true NULL composite value, only a row of per-field NULL
values, which is not exactly the same thing.

Hence, switch to using DTYPE_REC for all composite-typed variables,
whether "record", named composite type, or domain over named composite
type.  DTYPE_ROW remains but is used only for its native purpose, to
represent a fixed-at-compile-time list of variables, for instance the
targets of an INTO clause.

To accomplish this without taking significant performance losses, introduce
infrastructure that allows storing composite-type variables as "expanded
objects", similar to the "expanded array" infrastructure introduced in
commit 1dc5ebc90.  A composite variable's value is thereby kept (most of
the time) in the form of separate Datums, so that field accesses and
updates are not much more expensive than they were in the ROW format.
This holds the line, more or less, on performance of variables of named
composite types in field-access-intensive microbenchmarks, and makes
variables declared "record" perform much better than before in similar
tests.  In addition, the logic involved with enforcing composite-domain
constraints against updates of individual fields is in the expanded
record infrastructure not plpgsql proper, so that it might be reusable
for other purposes.

In further support of this, introduce a typcache feature for assigning a
unique-within-process identifier to each distinct tuple descriptor of
interest; in particular, DDL alterations on composite types result in a new
identifier for that type.  This allows very cheap detection of the need to
refresh tupdesc-dependent data.  This improves on the "tupDescSeqNo" idea
I had in commit 687f096ea: that assigned identifying sequence numbers to
successive versions of individual composite types, but the numbers were not
unique across different types, nor was there support for assigning numbers
to registered record types.

In passing, allow plpgsql functions to accept as well as return type
"record".  There was no good reason for the old restriction, and it
was out of step with most of the other PLs.

Tom Lane, reviewed by Pavel Stehule

Discussion: https://postgr.es/m/8962.1514399547@sss.pgh.pa.us
2018-02-13 18:52:21 -05:00
Alvaro Herrera 8237f27b50 get_relid_attribute_name is dead, long live get_attname
The modern way is to use a missing_ok argument instead of two separate
almost-identical routines, so do that.

Author: Michaël Paquier
Reviewed-by: Álvaro Herrera
Discussion: https://postgr.es/m/20180201063212.GE6398@paquier.xyz
2018-02-12 19:33:15 -03:00
Robert Haas 935dee9ad5 Mark assorted GUC variables as PGDLLIMPORT.
This makes life easier for extension authors.

Metin Doslu

Discussion: http://postgr.es/m/CAL1dPcfa45o1dC-c4t-48v0OZE6oy4ChJhObrtkK8mzNfXqDTA@mail.gmail.com
2018-02-09 15:54:45 -05:00
Robert Haas 9da0cc3528 Support parallel btree index builds.
To make this work, tuplesort.c and logtape.c must also support
parallelism, so this patch adds that infrastructure and then applies
it to the particular case of parallel btree index builds.  Testing
to date shows that this can often be 2-3x faster than a serial
index build.

The model for deciding how many workers to use is fairly primitive
at present, but it's better than not having the feature.  We can
refine it as we get more experience.

Peter Geoghegan with some help from Rushabh Lathia.  While Heikki
Linnakangas is not an author of this patch, he wrote other patches
without which this feature would not have been possible, and
therefore the release notes should possibly credit him as an author
of this feature.  Reviewed by Claudio Freire, Heikki Linnakangas,
Thomas Munro, Tels, Amit Kapila, me.

Discussion: http://postgr.es/m/CAM3SWZQKM=Pzc=CAHzRixKjp2eO5Q0Jg1SoFQqeXFQ647JiwqQ@mail.gmail.com
Discussion: http://postgr.es/m/CAH2-Wz=AxWqDoVvGU7dq856S4r6sJAj6DBn7VMtigkB33N5eyg@mail.gmail.com
2018-02-02 13:32:44 -05:00
Tom Lane 97d4445a03 Save a few bytes by removing useless last argument to SearchCatCacheList.
There's never any value in giving a fully specified cache key to
SearchCatCacheList: you might as well call SearchCatCache instead,
since there could be only one match.  So the maximum useful number of
key arguments is one less than the supported number of key columns.
We might as well remove the useless extra argument and save some few
bytes per call site, as well as a cycle or so per call.

I believe the reason it was coded like this is that originally, callers
had to write out all the dummy arguments in each call, and so it seemed
less confusing if SearchCatCache and SearchCatCacheList took the same
number of key arguments.  But since commit e26c539e9, callers only write
their live arguments explicitly, making that a non-factor; and there's
surely been enough time for third-party modules to adapt to that coding
style.  So this is only an ABI break not an API break for callers.

Per discussion with Oliver Ford, this might also make it less confusing
how to use SearchCatCacheList correctly.

Discussion: https://postgr.es/m/27788.1517069693@sss.pgh.pa.us
2018-01-29 15:13:17 -05:00
Peter Eisentraut 7404e77cc1 Split out documentation of SSL parameters into their own section
Split the "Authentication and Security" section into two separate
sections "Authentication" and "SSL".  The latter part has gotten much
longer over time, and doesn't primarily have to do with authentication.

Also, the row_security parameter was inconsistently categorized, so
clean that up while we're here.
2018-01-23 07:11:38 -05:00
Peter Eisentraut 8561e4840c Transaction control in PL procedures
In each of the supplied procedural languages (PL/pgSQL, PL/Perl,
PL/Python, PL/Tcl), add language-specific commit and rollback
functions/commands to control transactions in procedures in that
language.  Add similar underlying functions to SPI.  Some additional
cleanup so that transaction commit or abort doesn't blow away data
structures still used by the procedure call.  Add execution context
tracking to CALL and DO statements so that transaction control commands
can only be issued in top-level procedure and block calls, not function
calls or other procedure or block calls.

- SPI

Add a new function SPI_connect_ext() that is like SPI_connect() but
allows passing option flags.  The only option flag right now is
SPI_OPT_NONATOMIC.  A nonatomic SPI connection can execute transaction
control commands, otherwise it's not allowed.  This is meant to be
passed down from CALL and DO statements which themselves know in which
context they are called.  A nonatomic SPI connection uses different
memory management.  A normal SPI connection allocates its memory in
TopTransactionContext.  For nonatomic connections we use PortalContext
instead.  As the comment in SPI_connect_ext() (previously SPI_connect())
indicates, one could potentially use PortalContext in all cases, but it
seems safest to leave the existing uses alone, because this stuff is
complicated enough already.

SPI also gets new functions SPI_start_transaction(), SPI_commit(), and
SPI_rollback(), which can be used by PLs to implement their transaction
control logic.

- portalmem.c

Some adjustments were made in the code that cleans up portals at
transaction abort.  The portal code could already handle a command
*committing* a transaction and continuing (e.g., VACUUM), but it was not
quite prepared for a command *aborting* a transaction and continuing.

In AtAbort_Portals(), remove the code that marks an active portal as
failed.  As the comment there already predicted, this doesn't work if
the running command wants to keep running after transaction abort.  And
it's actually not necessary, because pquery.c is careful to run all
portal code in a PG_TRY block and explicitly runs MarkPortalFailed() if
there is an exception.  So the code in AtAbort_Portals() is never used
anyway.

In AtAbort_Portals() and AtCleanup_Portals(), we need to be careful not
to clean up active portals too much.  This mirrors similar code in
PreCommit_Portals().

- PL/Perl

Gets new functions spi_commit() and spi_rollback()

- PL/pgSQL

Gets new commands COMMIT and ROLLBACK.

Update the PL/SQL porting example in the documentation to reflect that
transactions are now possible in procedures.

- PL/Python

Gets new functions plpy.commit and plpy.rollback.

- PL/Tcl

Gets new commands commit and rollback.

Reviewed-by: Andrew Dunstan <andrew.dunstan@2ndquadrant.com>
2018-01-22 08:43:06 -05:00
Peter Eisentraut 8b9e9644dc Replace AclObjectKind with ObjectType
AclObjectKind was basically just another enumeration for object types,
and we already have a preferred one for that.  It's only used in
aclcheck_error.  By using ObjectType instead, we can also give some more
precise error messages, for example "index" instead of "relation".

Reviewed-by: Michael Paquier <michael.paquier@gmail.com>
2018-01-19 14:01:15 -05:00
Peter Eisentraut 2c6f37ed62 Replace GrantObjectType with ObjectType
There used to be a lot of different *Type and *Kind symbol groups to
address objects within different commands, most of which have been
replaced by ObjectType, starting with
b256f24264.  But this conversion was never
done for the ACL commands until now.

This change ends up being just a plain replacement of the types and
symbols, without any code restructuring needed, except deleting some now
redundant code.

Reviewed-by: Michael Paquier <michael.paquier@gmail.com>
Reviewed-by: Stephen Frost <sfrost@snowman.net>
2018-01-19 14:01:14 -05:00
Andrew Dunstan 585e166e46 Fix compiler warnings due to commit cc4feded 2018-01-17 03:33:02 -05:00
Andrew Dunstan cc4feded0a Centralize json and jsonb handling of datetime types
The creates a single function JsonEncodeDateTime which will format these
data types in an efficient and consistent manner. This will be all the
more important when we come to jsonpath so we don't have to implement yet
more code doing the same thing in two more places.

This also extends the code to handle time and timetz types which were
not previously handled specially. This requires exposing the time2tm and
timetz2tm functions.

Patch from Nikita Glukhov
2018-01-16 19:07:13 -05:00
Peter Eisentraut a77dd53f30 Remove PortalGetQueryDesc()
After having gotten rid of PortalGetHeapMemory(), there seems little
reason to keep one Portal access macro around that offers no actual
abstraction and isn't consistently used anyway.

Reviewed-by: Andrew Dunstan <andrew.dunstan@2ndquadrant.com>
Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org>
2018-01-09 13:47:56 -05:00
Peter Eisentraut 0f7c49e855 Update portal-related memory context names and API
Rename PortalMemory to TopPortalContext, to avoid confusion with
PortalContext and align naming with similar top-level memory contexts.

Rename PortalData's "heap" field to portalContext.  The "heap" naming
seems quite antiquated and confusing.  Also get rid of the
PortalGetHeapMemory() macro and access the field directly, which we do
for other portal fields, so this abstraction doesn't buy anything.

Reviewed-by: Andrew Dunstan <andrew.dunstan@2ndquadrant.com>
Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org>
2018-01-09 13:47:56 -05:00
Bruce Momjian 9d4649ca49 Update copyright for 2018
Backpatch-through: certain files through 9.3
2018-01-02 23:30:12 -05:00
Peter Eisentraut 0689dc3a23 Add includes to make header files self-contained 2017-12-26 10:21:27 -05:00
Teodor Sigaev ff963b393c Add polygon opclass for SP-GiST
Polygon opclass uses compress method feature of SP-GiST added earlier. For now
it's a single operator class which uses this feature. SP-GiST actually indexes
a bounding boxes of input polygons, so part of supported operations are lossy.
Opclass uses most methods of corresponding opclass over boxes of SP-GiST and
treats bounding boxes as point in 4D-space.

Bump catalog version.

Authors: Nikita Glukhov, Alexander Korotkov with minor editorization by me
Reviewed-By: all authors + Darafei Praliaskouski
Discussion: https://www.postgresql.org/message-id/flat/54907069.1030506@sigaev.ru
2017-12-25 18:59:38 +03:00
Andres Freund ab9e0e718a Add shared tuplestores.
SharedTuplestore allows multiple participants to write into it and
then read the tuples back from it in parallel.  Each reader receives
partial results.

For now it always uses disk files, but other buffering policies and
other kinds of scans (ie each reader receives complete results) may be
useful in future.

The upcoming parallel hash join feature will use this facility.

Author: Thomas Munro
Reviewed-By: Peter Geoghegan, Andres Freund, Robert Haas
Discussion: https://postgr.es/m/CAEepm=2W=cOkiZxcg6qiFQP-dHUe09aqTrEMM7yJDrHMhDv_RA@mail.gmail.com
2017-12-18 14:23:19 -08:00
Tom Lane 9fa6f00b13 Rethink MemoryContext creation to improve performance.
This patch makes a number of interrelated changes to reduce the overhead
involved in creating/deleting memory contexts.  The key ideas are:

* Include the AllocSetContext header of an aset.c context in its first
malloc request, rather than allocating it separately in TopMemoryContext.
This means that we now always create an initial or "keeper" block in an
aset, even if it never receives any allocation requests.

* Create freelists in which we can save and recycle recently-destroyed
asets (this idea is due to Robert Haas).

* In the common case where the name of a context is a constant string,
just store a pointer to it in the context header, rather than copying
the string.

The first change eliminates a palloc/pfree cycle per context, and
also avoids bloat in TopMemoryContext, at the price that creating
a context now involves a malloc/free cycle even if the context never
receives any allocations.  That would be a loser for some common
usage patterns, but recycling short-lived contexts via the freelist
eliminates that pain.

Avoiding copying constant strings not only saves strlen() and strcpy()
overhead, but is an essential part of the freelist optimization because
it makes the context header size constant.  Currently we make no
attempt to use the freelist for contexts with non-constant names.
(Perhaps someday we'll need to think harder about that, but in current
usage, most contexts with custom names are long-lived anyway.)

The freelist management in this initial commit is pretty simplistic,
and we might want to refine it later --- but in common workloads that
will never matter because the freelists will never get full anyway.

To create a context with a non-constant name, one is now required to
call AllocSetContextCreateExtended and specify the MEMCONTEXT_COPY_NAME
option.  AllocSetContextCreate becomes a wrapper macro, and it includes
a test that will complain about non-string-literal context name
parameters on gcc and similar compilers.

An unfortunate side effect of making AllocSetContextCreate a macro is
that one is now *required* to use the size parameter abstraction macros
(ALLOCSET_DEFAULT_SIZES and friends) with it; the pre-9.6 habit of
writing out individual size parameters no longer works unless you
switch to AllocSetContextCreateExtended.

Internally to the memory-context-related modules, the context creation
APIs are simplified, removing the rather baroque original design whereby
a context-type module called mcxt.c which then called back into the
context-type module.  That saved a bit of code duplication, but not much,
and it prevented context-type modules from exercising control over the
allocation of context headers.

In passing, I converted the test-and-elog validation of aset size
parameters into Asserts to save a few more cycles.  The original thought
was that callers might compute size parameters on the fly, but in practice
nobody does that, so it's useless to expend cycles on checking those
numbers in production builds.

Also, mark the memory context method-pointer structs "const",
just for cleanliness.

Discussion: https://postgr.es/m/2264.1512870796@sss.pgh.pa.us
2017-12-13 13:55:16 -05:00
Robert Haas c572599c65 Mark assorted variables PGDLLIMPORT.
This makes life easier for extension authors who wish to support
Windows.

Brian Cloutier, slightly amended by me.

Discussion: http://postgr.es/m/CAJCy68fscdNhmzFPS4kyO00CADkvXvEa-28H-OtENk-pa2OTWw@mail.gmail.com
2017-12-05 09:23:57 -05:00
Peter Eisentraut e4128ee767 SQL procedures
This adds a new object type "procedure" that is similar to a function
but does not have a return type and is invoked by the new CALL statement
instead of SELECT or similar.  This implementation is aligned with the
SQL standard and compatible with or similar to other SQL implementations.

This commit adds new commands CALL, CREATE/ALTER/DROP PROCEDURE, as well
as ALTER/DROP ROUTINE that can refer to either a function or a
procedure (or an aggregate function, as an extension to SQL).  There is
also support for procedures in various utility commands such as COMMENT
and GRANT, as well as support in pg_dump and psql.  Support for defining
procedures is available in all the languages supplied by the core
distribution.

While this commit is mainly syntax sugar around existing functionality,
future features will rely on having procedures as a separate object
type.

Reviewed-by: Andrew Dunstan <andrew.dunstan@2ndquadrant.com>
2017-11-30 11:03:20 -05:00
Robert Haas eaedf0df71 Update typedefs.list and re-run pgindent
Discussion: http://postgr.es/m/CA+TgmoaA9=1RWKtBWpDaj+sF3Stgc8sHgf5z=KGtbjwPLQVDMA@mail.gmail.com
2017-11-29 09:24:24 -05:00
Simon Riggs a4ccc1cef5 Generational memory allocator
Add new style of memory allocator, known as Generational
appropriate for use in cases where memory is allocated
and then freed in roughly oldest first order (FIFO).

Use new allocator for logical decoding’s reorderbuffer
to significantly reduce memory usage and improve performance.

Author: Tomas Vondra
Reviewed-by: Simon Riggs
2017-11-23 05:45:07 +11:00
Simon Riggs c2513365a0 Parameter toast_tuple_target controls TOAST for new rows
Specifies the point at which we try to move long column values
into TOAST tables.

No effect on existing rows.

Discussion: https://postgr.es/m/CANP8+jKsVmw6CX6YP9z7zqkTzcKV1+Uzr3XjKcZW=2Ya00OyQQ@mail.gmail.com

Author: Simon Riggs <simon@2ndQudrant.com>
Reviewed-by: Andrew Dunstan <andrew.dunstan@2ndQuadrant.com>
2017-11-20 09:50:10 +11:00
Tom Lane 687f096ea9 Make PL/Python handle domain-type conversions correctly.
Fix PL/Python so that it can handle domains over composite, and so that
it enforces domain constraints correctly in other cases that were not
always done properly before.  Notably, it didn't do arrays of domains
right (oversight in commit c12d570fa), and it failed to enforce domain
constraints when returning a composite type containing a domain field,
and if a transform function is being used for a domain's base type then
it failed to enforce domain constraints on the result.  Also, in many
places it missed checking domain constraints on null values, because
the plpy_typeio code simply wasn't called for Py_None.

Rather than try to band-aid these problems, I made a significant
refactoring of the plpy_typeio logic.  The existing design of recursing
for array and composite members is extended to also treat domains as
containers requiring recursion, and the APIs for the module are cleaned
up and simplified.

The patch also modifies plpy_typeio to rely on the typcache more than
it did before (which was pretty much not at all).  This reduces the
need for repetitive lookups, and lets us get rid of an ad-hoc scheme
for detecting changes in composite types.  I added a couple of small
features to typcache to help with that.

Although some of this is fixing bugs that long predate v11, I don't
think we should risk a back-patch: it's a significant amount of code
churn, and there've been no complaints from the field about the bugs.

Tom Lane, reviewed by Anthony Bykov

Discussion: https://postgr.es/m/24449.1509393613@sss.pgh.pa.us
2017-11-16 16:23:04 -05:00
Peter Eisentraut 0e1539ba0d Add some const decorations to prototypes
Reviewed-by: Fabien COELHO <coelho@cri.ensmp.fr>
2017-11-10 13:38:57 -05:00
Tom Lane 92a1834dd8 Fix unportable spelling of int64 constant.
Per buildfarm member pademelon.
2017-11-07 13:54:36 -05:00
Simon Riggs 98267ee83e Exclude pg_internal.init from BASE_BACKUP
Add docs to explain this for other backup mechanisms

Author: David Steele <david@pgmasters.net>
Reviewed-by: Petr Jelinek <petr.jelinek@2ndQuadrant.com> et al
2017-11-07 12:28:35 -05:00
Robert Haas 846fcc8516 Fix problems with the "role" GUC and parallel query.
Without this fix, dropping a role can sometimes result in parallel
query failures in sessions that have used "SET ROLE" to assume the
dropped role, even if that setting isn't active any more.

Report by Pavan Deolasee.  Patch by Amit Kapila, reviewed by me.

Discussion: http://postgr.es/m/CABOikdOomRcZsLsLK+Z+qENM1zxyaWnAvFh3MJZzZnnKiF+REg@mail.gmail.com
2017-10-29 12:58:40 +05:30
Robert Haas b7f3eb3140 Add hash_combine64.
Extracted from a larger patch by Amul Sul, with some comment additions
by me.

Discussion: http://postgr.es/m/20171024113004.hn5qajypin4dy5sw@alap3.anarazel.de
2017-10-29 12:41:43 +05:30
Tom Lane 37a795a60b Support domains over composite types.
This is the last major omission in our domains feature: you can now
make a domain over anything that's not a pseudotype.

The major complication from an implementation standpoint is that places
that might be creating tuples of a domain type now need to be prepared
to apply domain_check().  It seems better that unprepared code fail
with an error like "<type> is not composite" than that it silently fail
to apply domain constraints.  Therefore, relevant infrastructure like
get_func_result_type() and lookup_rowtype_tupdesc() has been adjusted
to treat domain-over-composite as a distinct case that unprepared code
won't recognize, rather than just transparently treating it the same
as plain composite.  This isn't a 100% solution to the possibility of
overlooked domain checks, but it catches most places.

In passing, improve typcache.c's support for domains (it can now cache
the identity of a domain's base type), and rewrite the argument handling
logic in jsonfuncs.c's populate_record[set]_worker to reduce duplicative
per-call lookups.

I believe this is code-complete so far as the core and contrib code go.
The PLs need varying amounts of work, which will be tackled in followup
patches.

Discussion: https://postgr.es/m/4206.1499798337@sss.pgh.pa.us
2017-10-26 13:47:45 -04:00
Andres Freund 141fd1b66c Improve sys/catcache performance.
The following are the individual improvements:
1) Avoidance of FunctionCallInfo based function calls, replaced by
   more efficient functions with a native C argument interface.
2) Don't extract columns from a cache entry's tuple whenever matching
   entries - instead store them as a Datum array. This also allows to
   get rid of having to build dummy tuples for negative & list
   entries, and of a hack for dealing with cstring vs. text weirdness.
3) Reorder members of catcache.h struct, so imortant entries are more
   likely to be on one cacheline.
4) Allowing the compiler to specialize critical SearchCatCache for a
   specific number of attributes allows to unroll loops and avoid
   other nkeys dependant initialization.
5) Only initializing the ScanKey when necessary, i.e. catcache misses,
   greatly reduces cache unnecessary cpu cache misses.
6) Split of the cache-miss case from the hash lookup, reducing stack
   allocations etc in the common case.
7) CatCTup and their corresponding heaptuple are allocated in one
   piece.

This results in making cache lookups themselves roughly three times as
fast - full-system benchmarks obviously improve less than that.

I've also evaluated further techniques:
- replace open coded hash with simplehash - the list walk right now
  shows up in profiles. Unfortunately it's not easy to do so safely as
  an entry's memory location can change at various times, which
  doesn't work well with the refcounting and cache invalidation.
- Cacheline-aligning CatCTup entries - helps some with performance,
  but the win isn't big and the code for it is ugly, because the
  tuples have to be freed as well.
- add more proper functions, rather than macros for
  SearchSysCacheCopyN etc., but right now they don't show up in
  profiles.

The reason the macro wrapper for syscache.c/h have to be changed,
rather than just catcache, is that doing otherwise would require
exposing the SysCache array to the outside.  That might be a good idea
anyway, but it's for another day.

Author: Andres Freund
Reviewed-By: Robert Haas
Discussion: https://postgr.es/m/20170914061207.zxotvyopetm7lrrp@alap3.anarazel.de
2017-10-13 14:22:41 -07:00
Tom Lane 118e99c3d7 Fix low-probability loss of NOTIFY messages due to XID wraparound.
Up to now async.c has used TransactionIdIsInProgress() to detect whether
a notify message's source transaction is still running.  However, that
function has a quick-exit path that reports that XIDs before RecentXmin
are no longer running.  If a listening backend is doing nothing but
listening, and not running any queries, there is nothing that will advance
its value of RecentXmin.  Once 2 billion transactions elapse, the
RecentXmin check causes active transactions to be reported as not running.
If they aren't committed yet according to CLOG, async.c decides they
aborted and discards their messages.  The timing for that is a bit tight
but it can happen when multiple backends are sending notifies concurrently.
The net symptom therefore is that a sufficiently-long-surviving
listen-only backend starts to miss some fraction of NOTIFY traffic,
but only under heavy load.

The only function that updates RecentXmin is GetSnapshotData().
A brute-force fix would therefore be to take a snapshot before
processing incoming notify messages.  But that would add cycles,
as well as contention for the ProcArrayLock.  We can be smarter:
having taken the snapshot, let's use that to check for running
XIDs, and not call TransactionIdIsInProgress() at all.  In this
way we reduce the number of ProcArrayLock acquisitions from one
per message to one per notify interrupt; that's the same under
light load but should be a benefit under heavy load.  Light testing
says that this change is a wash performance-wise for normal loads.

I looked around for other callers of TransactionIdIsInProgress()
that might be at similar risk, and didn't find any; all of them
are inside transactions that presumably have already taken a
snapshot.

Problem report and diagnosis by Marko Tiikkaja, patch by me.
Back-patch to all supported branches, since it's been like this
since 9.0.

Discussion: https://postgr.es/m/20170926182935.14128.65278@wrigleys.postgresql.org
2017-10-11 14:28:33 -04:00
Andres Freund 9eafa2b5b0 Msvc doesn't know UINT16_MAX, replace with PG_UINT16_MAX.
UINT16_MAX usage is originating from commit 212e6f34d5.

Per buildfarm animal currawong.
2017-10-04 10:01:02 -07:00
Andres Freund 212e6f34d5 Replace binary search in fmgr_isbuiltin with a lookup array.
Turns out we have enough functions that the binary search is quite
noticeable in profiles.

Thus have Gen_fmgrtab.pl build a new mapping from a builtin function's
oid to an index in the existing fmgr_builtins array. That keeps the
additional memory usage at a reasonable amount.

Author: Andres Freund, with input from Tom Lane
Discussion: https://postgr.es/m/20170914065128.a5sk7z4xde5uy3ei@alap3.anarazel.de
2017-10-04 00:22:38 -07:00
Tom Lane c12d570fa1 Support arrays over domains.
Allowing arrays with a domain type as their element type was left un-done
in the original domain patch, but not for any very good reason.  This
omission leads to such surprising results as array_agg() not working on
a domain column, because the parser can't identify a suitable output type
for the polymorphic aggregate.

In order to fix this, first clean up the APIs of coerce_to_domain() and
some internal functions in parse_coerce.c so that we consistently pass
around a CoercionContext along with CoercionForm.  Previously, we sometimes
passed an "isExplicit" boolean flag instead, which is strictly less
information; and coerce_to_domain() didn't even get that, but instead had
to reverse-engineer isExplicit from CoercionForm.  That's contrary to the
documentation in primnodes.h that says that CoercionForm only affects
display and not semantics.  I don't think this change fixes any live bugs,
but it makes things more consistent.  The main reason for doing it though
is that now build_coercion_expression() receives ccontext, which it needs
in order to be able to recursively invoke coerce_to_target_type().

Next, reimplement ArrayCoerceExpr so that the node does not directly know
any details of what has to be done to the individual array elements while
performing the array coercion.  Instead, the per-element processing is
represented by a sub-expression whose input is a source array element and
whose output is a target array element.  This simplifies life in
parse_coerce.c, because it can build that sub-expression by a recursive
invocation of coerce_to_target_type().  The executor now handles the
per-element processing as a compiled expression instead of hard-wired code.
The main advantage of this is that we can use a single ArrayCoerceExpr to
handle as many as three successive steps per element: base type conversion,
typmod coercion, and domain constraint checking.  The old code used two
stacked ArrayCoerceExprs to handle type + typmod coercion, which was pretty
inefficient, and adding yet another array deconstruction to do domain
constraint checking seemed very unappetizing.

In the case where we just need a single, very simple coercion function,
doing this straightforwardly leads to a noticeable increase in the
per-array-element runtime cost.  Hence, add an additional shortcut evalfunc
in execExprInterp.c that skips unnecessary overhead for that specific form
of expression.  The runtime speed of simple cases is within 1% or so of
where it was before, while cases that previously required two levels of
array processing are significantly faster.

Finally, create an implicit array type for every domain type, as we do for
base types, enums, etc.  Everything except the array-coercion case seems
to just work without further effort.

Tom Lane, reviewed by Andrew Dunstan

Discussion: https://postgr.es/m/9852.1499791473@sss.pgh.pa.us
2017-09-30 13:40:56 -04:00
Andres Freund f14241236e Fix typo.
Reported-By: Thomas Munro and Jesper Pedersen
2017-09-29 17:24:39 -07:00
Andres Freund 791961f59b Add inline murmurhash32(uint32) function.
The function already existed in tidbitmap.c but more users requiring
fast hashing of 32bit ints are coming up.

Author: Andres Freund
Discussion: https://postgr.es/m/20170914061207.zxotvyopetm7lrrp@alap3.anarazel.de
2017-09-22 13:38:42 -07:00
Tom Lane 85feb77aa0 Assume wcstombs(), towlower(), and sibling functions are always present.
These functions are required by SUS v2, which is our minimum baseline
for Unix platforms, and are present on all interesting Windows versions
as well.  Even our oldest buildfarm members have them.  Thus, we were not
testing the "!USE_WIDE_UPPER_LOWER" code paths, which explains why the bug
fixed in commit e6023ee7f escaped detection.  Per discussion, there seems
to be no more real-world value in maintaining this option.  Hence, remove
the configure-time tests for wcstombs() and towlower(), remove the
USE_WIDE_UPPER_LOWER symbol, and remove all the !USE_WIDE_UPPER_LOWER code.
There's not actually all that much of the latter, but simplifying the #if
nests is a win in itself.

Discussion: https://postgr.es/m/20170921052928.GA188913@rfd.leadboat.com
2017-09-22 11:00:58 -04:00
Tom Lane 4bd1994650 Make DatumGetFoo/PG_GETARG_FOO/PG_RETURN_FOO macro names more consistent.
By project convention, these names should include "P" when dealing with a
pointer type; that is, if the result of a GETARG macro is of type FOO *,
it should be called PG_GETARG_FOO_P not just PG_GETARG_FOO.  Some newer
types such as JSONB and ranges had not followed the convention, and a
number of contrib modules hadn't gotten that memo either.  Rename the
offending macros to improve consistency.

In passing, fix a few places that thought PG_DETOAST_DATUM() returns
a Datum; it does not, it returns "struct varlena *".  Applying
DatumGetPointer to that happens not to cause any bad effects today,
but it's formally wrong.  Also, adjust an ltree macro that was designed
without any thought for what pgindent would do with it.

This is all cosmetic and shouldn't have any impact on generated code.

Mark Dilger, some further tweaks by me

Discussion: https://postgr.es/m/EA5676F4-766F-4F38-8348-ECC7DB427C6A@gmail.com
2017-09-18 15:21:23 -04:00
Andres Freund cc5f81366c Add support for coordinating record typmods among parallel workers.
Tuples can have type RECORDOID and a typmod number that identifies a blessed
TupleDesc in a backend-private cache.  To support the sharing of such tuples
through shared memory and temporary files, provide a typmod registry in
shared memory.

To achieve that, introduce per-session DSM segments, created on demand when a
backend first runs a parallel query.  The per-session DSM segment has a
table-of-contents just like the per-query DSM segment, and initially the
contents are a shared record typmod registry and a DSA area to provide the
space it needs to grow.

State relating to the current session is accessed via a Session object
reached through global variable CurrentSession that may require significant
redesign further down the road as we figure out what else needs to be shared
or remodelled.

Author: Thomas Munro
Reviewed-By: Andres Freund
Discussion: https://postgr.es/m/CAEepm=0ZtQ-SpsgCyzzYpsXS6e=kZWqk3g5Ygn3MDV7A8dabUA@mail.gmail.com
2017-09-14 19:59:21 -07:00
Stephen Frost d2e40b310a Fix ordering in pg_dump of GRANTs
The order in which GRANTs are output is important as GRANTs which have
been GRANT'd by individuals via WITH GRANT OPTION GRANTs have to come
after the GRANT which included the WITH GRANT OPTION.  This happens
naturally in the backend during normal operation as we only change
existing ACLs in-place, only add new ACLs to the end, and when removing
an ACL we remove any which depend on it also.

Also, adjust the comments in acl.h to make this clear.

Unfortunately, the updates to pg_dump to handle initial privileges
involved pulling apart ACLs and then combining them back together and
could end up putting them back together in an invalid order, leading to
dumps which wouldn't restore.

Fix this by adjusting the queries used by pg_dump to ensure that the
ACLs are rebuilt in the same order in which they were originally.

Back-patch to 9.6 where the changes for initial privileges were done.
2017-09-13 20:02:09 -04:00
Andres Freund 6e7baa3227 Introduce BYTES unit for GUCs.
This is already useful for track_activity_query_size, and will further
be used in a later commit making the WAL segment size configurable.

Author: Beena Emerson
Reviewed-By: Andres Freund
Discussion: https://postgr.es/m/CAOG9ApEu8bXVwBxkOO9J7ZpM76TASK_vFMEEiCEjwhMmSLiaqQ@mail.gmail.com
2017-09-12 12:13:12 -07:00
Tom Lane 3ca930fc39 Improve performance of get_actual_variable_range with recently-dead tuples.
In commit fccebe421, we hacked get_actual_variable_range() to scan the
index with SnapshotDirty, so that if there are many uncommitted tuples
at the end of the index range, it wouldn't laboriously scan through all
of them looking for a live value to return.  However, that didn't fix it
for the case of many recently-dead tuples at the end of the index;
SnapshotDirty recognizes those as committed dead and so we're back to
the same problem.

To improve the situation, invent a "SnapshotNonVacuumable" snapshot type
and use that instead.  The reason this helps is that, if the snapshot
rejects a given index entry, we know that the indexscan will mark that
index entry as killed.  This means the next get_actual_variable_range()
scan will proceed past that entry without visiting the heap, making the
scan a lot faster.  We may end up accepting a recently-dead tuple as
being the estimated extremal value, but that doesn't seem much worse than
the compromise we made before to accept not-yet-committed extremal values.

The cost of the scan is still proportional to the number of dead index
entries at the end of the range, so in the interval after a mass delete
but before VACUUM's cleaned up the mess, it's still possible for
get_actual_variable_range() to take a noticeable amount of time, if you've
got enough such dead entries.  But the constant factor is much much better
than before, since all we need to do with each index entry is test its
"killed" bit.

We chose to back-patch commit fccebe421 at the time, but I'm hesitant to
do so here, because this form of the problem seems to affect many fewer
people.  Also, even when it happens, it's less bad than the case fixed
by commit fccebe421 because we don't get the contention effects from
expensive TransactionIdIsInProgress tests.

Dmitriy Sarafannikov, reviewed by Andrey Borodin

Discussion: https://postgr.es/m/05C72CF7-B5F6-4DB9-8A09-5AC897653113@yandex.ru
2017-09-07 19:41:51 -04:00
Peter Eisentraut 1356f78ea9 Reduce excessive dereferencing of function pointers
It is equivalent in ANSI C to write (*funcptr) () and funcptr().  These
two styles have been applied inconsistently.  After discussion, we'll
use the more verbose style for plain function pointer variables, to make
it clear that it's a variable, and the shorter style when the function
pointer is in a struct (s.func() or s->func()), because then it's clear
that it's not a plain function name, and otherwise the excessive
punctuation makes some of those invocations hard to read.

Discussion: https://www.postgresql.org/message-id/f52c16db-14ed-757d-4b48-7ef360b1631d@2ndquadrant.com
2017-09-07 13:56:09 -04:00
Simon Riggs 5b6d13eec7 Allow SET STATISTICS on expression indexes
Index columns are referenced by ordinal number rather than name, e.g.
CREATE INDEX coord_idx ON measured (x, y, (z + t));
ALTER INDEX coord_idx ALTER COLUMN 3 SET STATISTICS 1000;

Incompatibility note for release notes:
\d+ for indexes now also displays Stats Target

Authors: Alexander Korotkov, with contribution by Adrien NAYRAT
Review: Adrien NAYRAT, Simon Riggs
Wordsmith: Simon Riggs
2017-09-06 13:46:01 -07:00
Tom Lane b79d69b087 Ensure SIZE_MAX can be used throughout our code.
Pre-C99 platforms may lack <stdint.h> and thereby SIZE_MAX.  We have
a couple of places using the hack "(size_t) -1" as a fallback, but
it wasn't universally available; which means the code added in commit
2e70d6b5e fails to compile everywhere.  Move that hack to c.h so that
we can rely on having SIZE_MAX everywhere.

Per discussion, it'd be a good idea to make the macro's value safe
for use in #if-tests, but that will take a bit more work.  This is
just a quick expedient to get the buildfarm green again.

Back-patch to all supported branches, like the previous commit.

Discussion: https://postgr.es/m/15883.1504278595@sss.pgh.pa.us
2017-09-01 13:52:53 -04:00
Robert Haas 81c5e46c49 Introduce 64-bit hash functions with a 64-bit seed.
This will be useful for hash partitioning, which needs a way to seed
the hash functions to avoid problems such as a hash index on a hash
partitioned table clumping all values into a small portion of the
bucket space; it's also useful for anything that wants a 64-bit hash
value rather than a 32-bit hash value.

Just in case somebody wants a 64-bit hash value that is compatible
with the existing 32-bit hash values, make the low 32-bits of the
64-bit hash value match the 32-bit hash value when the seed is 0.

Robert Haas and Amul Sul

Discussion: http://postgr.es/m/CA+Tgmoafx2yoJuhCQQOL5CocEi-w_uG4S2xT0EtgiJnPGcHW3g@mail.gmail.com
2017-08-31 22:21:21 -04:00
Robert Haas bf11e7ee2e Propagate sort instrumentation from workers back to leader.
Up until now, when parallel query was used, no details about the
sort method or space used by the workers were available; details
were shown only for any sorting done by the leader.  Fix that.

Commit 1177ab1dab forced the test case
added by commit 1f6d515a67 to run
without parallelism; now that we have this infrastructure, allow
that again, with a little tweaking to make it pass with and without
force_parallel_mode.

Robert Haas and Tom Lane

Discussion: http://postgr.es/m/CA+Tgmoa2VBZW6S8AAXfhpHczb=Rf6RqQ2br+zJvEgwJ0uoD_tQ@mail.gmail.com
2017-08-29 13:26:33 -04:00
Andres Freund 0052a0243d Add a hash_combine function for mixing hash values.
This hash function is derived from Boost's function of the same name.

Author: Andres Freund, Thomas Munro
Discussion: https://postgr.es/m/CAEepm%3D3rdgjfxW4cKvJ0OEmya2-34B0qHNG1xV0vK7TGPJGMUQ%40mail.gmail.com
Discussion: https://postgr.es/m/20170731210844.3cwrkmsmbbpt4rjc%40alap3.anarazel.de
2017-08-22 15:36:49 -07:00
Tom Lane 4867d7f62f Avoid out-of-memory in a hash join with many duplicate inner keys.
The executor is capable of splitting buckets during a hash join if
too much memory is being used by a small number of buckets.  However,
this only helps if a bucket's population is actually divisible; if
all the hash keys are alike, the tuples still end up in the same
new bucket.  This can result in an OOM failure if there are enough
inner keys with identical hash values.  The planner's cost estimates
will bias it against choosing a hash join in such situations, but not
by so much that it will never do so.  To mitigate the OOM hazard,
explicitly estimate the hash bucket space needed by just the inner
side's most common value, and if that would exceed work_mem then
add disable_cost to the hash cost estimate.

This approach doesn't account for the possibility that two or more
common values would share the same hash value.  On the other hand,
work_mem is normally a fairly conservative bound, so that eating
two or more times that much space is probably not going to kill us.

If we have no stats about the inner side, ignore this consideration.
There was some discussion of making a conservative assumption, but that
would effectively result in disabling hash join whenever we lack stats,
which seems like an overreaction given how seldom the problem manifests
in the field.

Per a complaint from David Hinkle.  Although this could be viewed
as a bug fix, the lack of similar complaints weighs against back-
patching; indeed we waited for v11 because it seemed already rather
late in the v10 cycle to be making plan choice changes like this one.

Discussion: https://postgr.es/m/32013.1487271761@sss.pgh.pa.us
2017-08-15 14:05:53 -04:00
Tom Lane 004a9702e0 Remove AtEOXact_CatCache().
The sole useful effect of this function, to check that no catcache
entries have positive refcounts at transaction end, has really been
obsolete since we introduced ResourceOwners in PG 8.1.  We reduced the
checks to assertions years ago, so that the function was a complete
no-op in production builds.  There have been previous discussions about
removing it entirely, but consensus up to now was that it had some small
value as a cross-check for bugs in the ResourceOwner logic.

However, it now emerges that it's possible to trigger these assertions
if you hit an assert-enabled backend with SIGTERM during a call to
SearchCatCacheList, because that function temporarily increases the
refcounts of entries it's intending to add to a catcache list construct.
In a normal ERROR scenario, the extra refcounts are cleaned up by
SearchCatCacheList's PG_CATCH block; but in a FATAL exit we do a
transaction abort and exit without ever executing PG_CATCH handlers.

There's a case to be made that this is a generic hazard and we should
consider restructuring elog(FATAL) handling so that pending PG_CATCH
handlers do get run.  That's pretty scary though: it could easily create
more problems than it solves.  Preliminary stress testing by Andreas
Seltenreich suggests that there are not many live problems of this ilk,
so we rejected that idea.

There are more-localized ways to fix the problem; the most principled
one would be to use PG_ENSURE_ERROR_CLEANUP instead of plain PG_TRY.
But adding cycles to SearchCatCacheList isn't very appealing.  We could
also weaken the assertions in AtEOXact_CatCache in some more or less
ad-hoc way, but that just makes its raison d'etre even less compelling.
In the end, the most reasonable solution seems to be to just remove
AtEOXact_CatCache altogether, on the grounds that it's not worth trying
to fix it.  It hasn't found any bugs for us in many years.

Per report from Jeevan Chalke.  Back-patch to all supported branches.

Discussion: https://postgr.es/m/CAM2+6=VEE30YtRQCZX7_sCFsEpoUkFBV1gZazL70fqLn8rcvBA@mail.gmail.com
2017-08-13 16:15:14 -04:00
Peter Eisentraut d6391b03b3 Reject use of ucol_strcollUTF8() before ICU 53
Various bugs can cause crashes, so don't use that function before ICU
53.  It will fall back to the code path used for other encodings.

Since we now tie the function availability to an ICU version, we don't
need the configure test anymore.  That also resolves the issue that the
test result was previously hardcoded for Windows.

researched by Daniel Verite <daniel@manitou-mail.org>, Peter Geoghegan
<pg@bowt.ie>, Tom Lane <tgl@sss.pgh.pa.us>

Discussion: https://www.postgresql.org/message-id/flat/f1438ec6-22aa-4029-9a3b-26f79d330e72%40manitou-mail.org
2017-08-10 22:14:00 -04:00
Robert Haas bb5d6e80b1 Improve the error message when creating an empty range partition.
The previous message didn't mention the name of the table or the
bounds.  Put the table name in the primary error message and the
bounds in the detail message.

Amit Langote, changed slightly by me.  Suggestions on the exac
phrasing from Tom Lane, David G. Johnston, and Dean Rasheed.

Discussion: http://postgr.es/m/CA+Tgmoae6bpwVa-1BMaVcwvCCeOoJ5B9Q9-RHWo-1gJxfPBZ5Q@mail.gmail.com
2017-08-10 13:46:56 -04:00
Tom Lane f13ea95f9e Change pg_ctl to detect server-ready by watching status in postmaster.pid.
Traditionally, "pg_ctl start -w" has waited for the server to become
ready to accept connections by attempting a connection once per second.
That has the major problem that connection issues (for instance, a
kernel packet filter blocking traffic) can't be reliably told apart
from server startup issues, and the minor problem that if server startup
isn't quick, we accumulate "the database system is starting up" spam
in the server log.  We've hacked around many of the possible connection
issues, but it resulted in ugly and complicated code in pg_ctl.c.

In commit c61559ec3, I changed the probe rate to every tenth of a second.
That prompted Jeff Janes to complain that the log-spam problem had become
much worse.  In the ensuing discussion, Andres Freund pointed out that
we could dispense with connection attempts altogether if the postmaster
were changed to report its status in postmaster.pid, which "pg_ctl start"
already relies on being able to read.  This patch implements that, teaching
postmaster.c to report a status string into the pidfile at the same
state-change points already identified as being of interest for systemd
status reporting (cf commit 7d17e683f).  pg_ctl no longer needs to link
with libpq at all; all its functions now depend on reading server files.

In support of this, teach AddToDataDirLockFile() to allow addition of
postmaster.pid lines in not-necessarily-sequential order.  This is needed
on Windows where the SHMEM_KEY line will never be written at all.  We still
have the restriction that we don't want to truncate the pidfile; document
the reasons for that a bit better.

Also, fix the pg_ctl TAP tests so they'll notice if "start -w" mode
is broken --- before, they'd just wait out the sixty seconds until
the loop gives up, and then report success anyway.  (Yes, I found that
out the hard way.)

While at it, arrange for pg_ctl to not need to #include miscadmin.h;
as a rather low-level backend header, requiring that to be compilable
client-side is pretty dubious.  This requires moving the #define's
associated with the pidfile into a new header file, and moving
PG_BACKEND_VERSIONSTR someplace else.  For lack of a clearly better
"someplace else", I put it into port.h, beside the declaration of
find_other_exec(), since most users of that macro are passing the value to
find_other_exec().  (initdb still depends on miscadmin.h, but at least
pg_ctl and pg_upgrade no longer do.)

In passing, fix main.c so that PG_BACKEND_VERSIONSTR actually defines the
output of "postgres -V", which remarkably it had never done before.

Discussion: https://postgr.es/m/CAMkU=1xJW8e+CTotojOMBd-yzUvD0e_JZu2xHo=MnuZ4__m7Pg@mail.gmail.com
2017-06-28 17:31:32 -04:00
Tom Lane b6159202c9 Fix memory leakage in ICU encoding conversion, and other code review.
Callers of icu_to_uchar() neglected to pfree the result string when done
with it.  This results in catastrophic memory leaks in varstr_cmp(),
because of our prevailing assumption that btree comparison functions don't
leak memory.  For safety, make all the call sites clean up leaks, though
I suspect that we could get away without it in formatting.c.  I audited
callers of icu_from_uchar() as well, but found no places that seemed to
have a comparable issue.

Add function API specifications for icu_to_uchar() and icu_from_uchar();
the lack of any thought-through specification is perhaps not unrelated
to the existence of this bug in the first place.  Fix icu_to_uchar()
to guarantee a nul-terminated result; although no existing caller appears
to care, the fact that it would have been nul-terminated except in
extreme corner cases seems ideally designed to bite someone on the rear
someday.  Fix ucnv_fromUChars() destCapacity argument --- in the worst
case, that could perhaps have led to a non-nul-terminated result, too.
Fix icu_from_uchar() to have a more reasonable definition of the function
result --- no callers are actually paying attention, so this isn't a live
bug, but it's certainly sloppily designed.  Const-ify icu_from_uchar()'s
input string for consistency.

That is not the end of what needs to be done to these functions, but
it's as much as I have the patience for right now.

Discussion: https://postgr.es/m/1955.1498181798@sss.pgh.pa.us
2017-06-23 12:22:06 -04:00
Tom Lane 382ceffdf7 Phase 3 of pgindent updates.
Don't move parenthesized lines to the left, even if that means they
flow past the right margin.

By default, BSD indent lines up statement continuation lines that are
within parentheses so that they start just to the right of the preceding
left parenthesis.  However, traditionally, if that resulted in the
continuation line extending to the right of the desired right margin,
then indent would push it left just far enough to not overrun the margin,
if it could do so without making the continuation line start to the left of
the current statement indent.  That makes for a weird mix of indentations
unless one has been completely rigid about never violating the 80-column
limit.

This behavior has been pretty universally panned by Postgres developers.
Hence, disable it with indent's new -lpl switch, so that parenthesized
lines are always lined up with the preceding left paren.

This patch is much less interesting than the first round of indent
changes, but also bulkier, so I thought it best to separate the effects.

Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org
Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
2017-06-21 15:35:54 -04:00
Tom Lane c7b8998ebb Phase 2 of pgindent updates.
Change pg_bsd_indent to follow upstream rules for placement of comments
to the right of code, and remove pgindent hack that caused comments
following #endif to not obey the general rule.

Commit e3860ffa4d wasn't actually using
the published version of pg_bsd_indent, but a hacked-up version that
tried to minimize the amount of movement of comments to the right of
code.  The situation of interest is where such a comment has to be
moved to the right of its default placement at column 33 because there's
code there.  BSD indent has always moved right in units of tab stops
in such cases --- but in the previous incarnation, indent was working
in 8-space tab stops, while now it knows we use 4-space tabs.  So the
net result is that in about half the cases, such comments are placed
one tab stop left of before.  This is better all around: it leaves
more room on the line for comment text, and it means that in such
cases the comment uniformly starts at the next 4-space tab stop after
the code, rather than sometimes one and sometimes two tabs after.

Also, ensure that comments following #endif are indented the same
as comments following other preprocessor commands such as #else.
That inconsistency turns out to have been self-inflicted damage
from a poorly-thought-through post-indent "fixup" in pgindent.

This patch is much less interesting than the first round of indent
changes, but also bulkier, so I thought it best to separate the effects.

Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org
Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
2017-06-21 15:19:25 -04:00
Tom Lane e3860ffa4d Initial pgindent run with pg_bsd_indent version 2.0.
The new indent version includes numerous fixes thanks to Piotr Stefaniak.
The main changes visible in this commit are:

* Nicer formatting of function-pointer declarations.
* No longer unexpectedly removes spaces in expressions using casts,
  sizeof, or offsetof.
* No longer wants to add a space in "struct structname *varname", as
  well as some similar cases for const- or volatile-qualified pointers.
* Declarations using PG_USED_FOR_ASSERTS_ONLY are formatted more nicely.
* Fixes bug where comments following declarations were sometimes placed
  with no space separating them from the code.
* Fixes some odd decisions for comments following case labels.
* Fixes some cases where comments following code were indented to less
  than the expected column 33.

On the less good side, it now tends to put more whitespace around typedef
names that are not listed in typedefs.list.  This might encourage us to
put more effort into typedef name collection; it's not really a bug in
indent itself.

There are more changes coming after this round, having to do with comment
indentation and alignment of lines appearing within parentheses.  I wanted
to limit the size of the diffs to something that could be reviewed without
one's eyes completely glazing over, so it seemed better to split up the
changes as much as practical.

Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org
Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
2017-06-21 14:39:04 -04:00
Tom Lane 54e839fe29 Sort syscache identifiers into alphabetical order.
Not much point in having a convention about this if we don't enforce it.

Mark Dilger

Discussion: https://postgr.es/m/7F67FBEF-C3B3-404E-8EC6-E02ACB15D894@gmail.com
2017-05-30 18:47:13 -04:00
Bruce Momjian a6fd7b7a5f Post-PG 10 beta1 pgindent run
perltidy run not included.
2017-05-17 16:31:56 -04:00
Tom Lane f04c9a6146 Standardize terminology for pg_statistic_ext entries.
Consistently refer to such an entry as a "statistics object", not just
"statistics" or "extended statistics".  Previously we had a mismash of
terms, accompanied by utter confusion as to whether the term was
singular or plural.  That's not only grating (at least to the ear of
a native English speaker) but could be outright misleading, eg in error
messages that seemed to be referring to multiple objects where only one
could be meant.

This commit fixes the code and a lot of comments (though I may have
missed a few).  I also renamed two new SQL functions,
pg_get_statisticsextdef -> pg_get_statisticsobjdef
pg_statistic_ext_is_visible -> pg_statistics_obj_is_visible
to conform better with this terminology.

I have not touched the SGML docs other than fixing those function
names; the docs certainly need work but it seems like a separable task.

Discussion: https://postgr.es/m/22676.1494557205@sss.pgh.pa.us
2017-05-14 10:55:01 -04:00
Tom Lane 9aab83fc50 Redesign get_attstatsslot()/free_attstatsslot() for more safety and speed.
The mess cleaned up in commit da0759600 is clear evidence that it's a
bug hazard to expect the caller of get_attstatsslot()/free_attstatsslot()
to provide the correct type OID for the array elements in the slot.
Moreover, we weren't even getting any performance benefit from that,
since get_attstatsslot() was extracting the real type OID from the array
anyway.  So we ought to get rid of that requirement; indeed, it would
make more sense for get_attstatsslot() to pass back the type OID it found,
in case the caller isn't sure what to expect, which is likely in binary-
compatible-operator cases.

Another problem with the current implementation is that if the stats array
element type is pass-by-reference, we incur a palloc/memcpy/pfree cycle
for each element.  That seemed acceptable when the code was written because
we were targeting O(10) array sizes --- but these days, stats arrays are
almost always bigger than that, sometimes much bigger.  We can save a
significant number of cycles by doing one palloc/memcpy/pfree of the whole
array.  Indeed, in the now-probably-common case where the array is toasted,
that happens anyway so this method is basically free.  (Note: although the
catcache code will inline any out-of-line toasted values, it doesn't
decompress them.  At the other end of the size range, it doesn't expand
short-header datums either.  In either case, DatumGetArrayTypeP would have
to make a copy.  We do end up using an extra array copy step if the element
type is pass-by-value and the array length is neither small enough for a
short header nor large enough to have suffered compression.  But that
seems like a very acceptable price for winning in pass-by-ref cases.)

Hence, redesign to take these insights into account.  While at it,
convert to an API in which we fill a struct rather than passing a bunch
of pointers to individual output arguments.  That will make it less
painful if we ever want further expansion of what get_attstatsslot can
pass back.

It's certainly arguable that this is new development and not something to
push post-feature-freeze.  However, I view it as primarily bug-proofing
and therefore something that's better to have sooner not later.  Since
we aren't quite at beta phase yet, let's put it in.

Discussion: https://postgr.es/m/16364.1494520862@sss.pgh.pa.us
2017-05-13 15:14:39 -04:00
Tom Lane 2df5d46555 Avoid searching for callback functions in CallSyscacheCallbacks().
We have now grown enough registerable syscache-invalidation callback
functions that the original assumption that there would be few of them
is causing performance problems.  In particular, let's fix things so that
CallSyscacheCallbacks doesn't have to search the whole array to find
which callback(s) to invoke for a given cache ID.  Preserve the original
behavior that callbacks are called in order of registration, just in
case there's someplace that depends on that (which I doubt).

In support of this, export the number of syscaches from syscache.h.
People could have found that out anyway from the enum, but adding a
#define makes that much safer.

This provides a useful additional speedup in Mathieu Fenniak's
logical-decoding test case, although we're reaching the point of
diminishing returns there.  I think any further improvement will have
to come from reducing the number of cache invalidations that are
triggered in the first place.  Still, we can hope that this change
gives some incremental benefit for all invalidation scenarios.

Back-patch to 9.4 where logical decoding was introduced.

Discussion: https://postgr.es/m/CAHoiPjzea6N0zuCi=+f9v_j94nfsy6y8SU7-=bp4=7qw6_i=Rg@mail.gmail.com
2017-05-12 19:05:27 -04:00
Tom Lane 50ee1c7462 Avoid searching for the target catcache in CatalogCacheIdInvalidate.
A test case provided by Mathieu Fenniak shows that the initial search for
the target catcache in CatalogCacheIdInvalidate consumes a very significant
amount of overhead in cases where cache invalidation is triggered but has
little useful work to do.  There is no good reason for that search to exist
at all, as the index array maintained by syscache.c allows direct lookup of
the catcache from its ID.  We just need a frontend function in syscache.c,
matching the division of labor for most other cache-accessing operations.

While there's more that can be done in this area, this patch alone reduces
the runtime of Mathieu's example by 2X.  We can hope that it offers some
useful benefit in other cases too, although usually cache invalidation
overhead is not such a striking fraction of the total runtime.

Back-patch to 9.4 where logical decoding was introduced.  It might be
worth going further back, but presently the only case we know of where
cache invalidation is really a significant burden is in logical decoding.
Also, older branches have fewer catcaches, reducing the possible benefit.

(Note: although this nominally changes catcache's API, we have always
documented CatalogCacheIdInvalidate as a private function, so I would
have little sympathy for an external module calling it directly.  So
backpatching should be fine.)

Discussion: https://postgr.es/m/CAHoiPjzea6N0zuCi=+f9v_j94nfsy6y8SU7-=bp4=7qw6_i=Rg@mail.gmail.com
2017-05-12 18:17:29 -04:00
Peter Eisentraut e2d4ef8de8 Add security checks to selectivity estimation functions
Some selectivity estimation functions run user-supplied operators over
data obtained from pg_statistic without security checks, which allows
those operators to leak pg_statistic data without having privileges on
the underlying tables.  Fix by checking that one of the following is
satisfied: (1) the user has table or column privileges on the table
underlying the pg_statistic data, or (2) the function implementing the
user-supplied operator is leak-proof.  If neither is satisfied, planning
will proceed as if there are no statistics available.

At least one of these is satisfied in most cases in practice.  The only
situations that are negatively impacted are user-defined or
not-leak-proof operators on a security-barrier view.

Reported-by: Robert Haas <robertmhaas@gmail.com>
Author: Peter Eisentraut <peter_e@gmx.net>
Author: Tom Lane <tgl@sss.pgh.pa.us>

Security: CVE-2017-7484
2017-05-08 09:26:32 -04:00
Tom Lane 23c6eb0336 Remove create_singleton_array(), hard-coding the case in its sole caller.
create_singleton_array() was not really as useful as we perhaps thought
when we added it.  It had never accreted more than one call site, and is
only saving a dozen lines of code at that one, which is considerably less
bulk than the function itself.  Moreover, because of its insistence on
using the caller's fn_extra cache space, it's arguably a coding hazard.
text_to_array_internal() does not currently use fn_extra in any other way,
but if it did it would be subtly broken, since the conflicting fn_extra
uses could be needed within a single query, in the seldom-tested case that
the field separator varies during the query.  The same objection seems
likely to apply to any other potential caller.

The replacement code is a bit uglier, because it hardwires knowledge of
the storage parameters of type TEXT, but it's not like we haven't got
dozens or hundreds of other places that do the same.  Uglier seems like
a good tradeoff for smaller, faster, and safer.

Per discussion with Neha Khatri.

Discussion: https://postgr.es/m/CAFO0U+_fS5SRhzq6uPG+4fbERhoA9N2+nPrtvaC9mmeWivxbsA@mail.gmail.com
2017-05-02 20:41:37 -04:00
Fujii Masao ff7bce1743 Add max_sync_workers_per_subscription to postgresql.conf.sample.
This commit also does

- add REPLICATION_SUBSCRIBERS into config_group
- mark max_logical_replication_workers and max_sync_workers_per_subscription
  as REPLICATION_SUBSCRIBERS parameters
- move those parameters into "Subscribers" section in postgresql.conf.sample

Author: Masahiko Sawada, Petr Jelinek and me
Reported-by: Masahiko Sawada
Discussion: http://postgr.es/m/CAD21AoAonSCoa=v=87ZO3vhfUZA1k_E2XRNHTt=xioWGUa+0ug@mail.gmail.com
2017-04-12 00:10:54 +09:00
Andres Freund fa117ee403 Allow avoiding tuple copy within tuplesort_gettupleslot().
Add a "copy" argument to make it optional to receive a copy of caller
tuple that is safe to use following a subsequent manipulating of
tuplesort's state.  This is a performance optimization.  Most existing
tuplesort_gettupleslot() callers are made to opt out of copying.
Existing callers that happen to rely on the validity of tuple memory
beyond subsequent manipulations of the tuplesort request their own
copy.

This brings tuplesort_gettupleslot() in line with
tuplestore_gettupleslot().  In the future, a "copy"
tuplesort_getdatum() argument may be added, that similarly allows
callers to opt out of receiving their own copy of tuple.

In passing, clarify assumptions that callers of other tuplesort fetch
routines may make about tuple memory validity, per gripe from Tom
Lane.

Author: Peter Geoghegan
Discussion: CAM3SWZQWZZ_N=DmmL7tKy_OUjGH_5mN=N=A6h7kHyyDvEhg2DA@mail.gmail.com
2017-04-06 14:48:59 -07:00
Simon Riggs cd0cebaf7d Always SnapshotResetXmin() during ClearTransaction()
Avoid corner cases during 2PC with 6bad580d9e
2017-04-06 10:30:22 -04:00
Peter Eisentraut 3217327053 Identity columns
This is the SQL standard-conforming variant of PostgreSQL's serial
columns.  It fixes a few usability issues that serial columns have:

- CREATE TABLE / LIKE copies default but refers to same sequence
- cannot add/drop serialness with ALTER TABLE
- dropping default does not drop sequence
- need to grant separate privileges to sequence
- other slight weirdnesses because serial is some kind of special macro

Reviewed-by: Vitaly Burovoy <vitaly.burovoy@gmail.com>
2017-04-06 08:41:37 -04:00
Simon Riggs 6bad580d9e Avoid SnapshotResetXmin() during AtEOXact_Snapshot()
For normal commits and aborts we already reset PgXact->xmin,
so we can simply avoid running SnapshotResetXmin() twice.

During performance tests by Alexander Korotkov, diagnosis
by Andres Freund showed PgXact array as a bottleneck. After
manual analysis by me of the code paths that touch those
memory locations, I was able to identify extraneous code
in the main transaction commit path.

Avoiding touching highly contented shmem improves concurrent
performance slightly on all workloads, confirmed by tests
run by Ashutosh Sharma and Alexander Korotkov.

Simon Riggs

Discussion: CANP8+jJdXE9b+b9F8CQT-LuxxO0PBCB-SZFfMVAdp+akqo4zfg@mail.gmail.com
2017-04-06 08:31:52 -04:00
Simon Riggs 9a3215026b Make min_wal_size/max_wal_size use MB internally
Previously they were defined using multiples of XLogSegSize.
Remove GUC_UNIT_XSEGS. Introduce GUC_UNIT_MB

Extracted from patch series on XLogSegSize infrastructure.

Beena Emerson
2017-04-04 18:00:01 -04:00
Robert Haas ea69a0dead Expand hash indexes more gradually.
Since hash indexes typically have very few overflow pages, adding a
new splitpoint essentially doubles the on-disk size of the index,
which can lead to large and abrupt increases in disk usage (and
perhaps long delays on occasion).  To mitigate this problem to some
degree, divide larger splitpoints into four equal phases.  This means
that, for example, instead of growing from 4GB to 8GB all at once, a
hash index will now grow from 4GB to 5GB to 6GB to 7GB to 8GB, which
is perhaps still not as smooth as we'd like but certainly an
improvement.

This changes the on-disk format of the metapage, so bump HASH_VERSION
from 2 to 3.  This will force a REINDEX of all existing hash indexes,
but that's probably a good idea anyway.  First, hash indexes from
pre-10 versions of PostgreSQL could easily be corrupted, and we don't
want to confuse corruption carried over from an older release with any
corruption caused despite the new write-ahead logging in v10.  Second,
it will let us remove some backward-compatibility code added by commit
293e24e507.

Mithun Cy, reviewed by Amit Kapila, Jesper Pedersen and me.  Regression
test outputs updated by me.

Discussion: http://postgr.es/m/CAD__OuhG6F1gQLCgMQNnMNgoCvOLQZz9zKYJQNYvYmmJoM42gA@mail.gmail.com
Discussion: http://postgr.es/m/CA+TgmoYty0jCf-pa+m+vYUJ716+AxM7nv_syvyanyf5O-L_i2A@mail.gmail.com
2017-04-03 23:46:33 -04:00
Kevin Grittner 41bd155dd6 Fix two undocumented parameters to functions from ENR patch.
On ProcessUtility document the parameter, to match others.

On CreateCachedPlan drop the queryEnv parameter.  It was not
referenced within the function, and had been added on the
assumption that with some unknown future usage of QueryEnvironment
it might be useful to do something there.  We have avoided other
"just in case" implementation of unused paramters, so drop it here.

Per gripe from Tom Lane
2017-04-01 15:21:05 -05:00
Kevin Grittner 18ce3a4ab2 Add infrastructure to support EphemeralNamedRelation references.
A QueryEnvironment concept is added, which allows new types of
objects to be passed into queries from parsing on through
execution.  At this point, the only thing implemented is a
collection of EphemeralNamedRelation objects -- relations which
can be referenced by name in queries, but do not exist in the
catalogs.  The only type of ENR implemented is NamedTuplestore, but
provision is made to add more types fairly easily.

An ENR can carry its own TupleDesc or reference a relation in the
catalogs by relid.

Although these features can be used without SPI, convenience
functions are added to SPI so that ENRs can easily be used by code
run through SPI.

The initial use of all this is going to be transition tables in
AFTER triggers, but that will be added to each PL as a separate
commit.

An incidental effect of this patch is to produce a more informative
error message if an attempt is made to modify the contents of a CTE
from a referencing DML statement.  No tests previously covered that
possibility, so one is added.

Kevin Grittner and Thomas Munro
Reviewed by Heikki Linnakangas, David Fetter, and Thomas Munro
with valuable comments and suggestions from many others
2017-03-31 23:17:18 -05:00
Andrew Dunstan c80b9920fc Transform or iterate over json(b) string values
Dmitry Dolgov, reviewed and lightly edited by me.
2017-03-31 14:25:25 -04:00
Peter Eisentraut 895f93701f Fix cpluspluscheck warning
Structure tag cannot be the same as a typedef that is a pointer to that
struct.
2017-03-26 18:31:05 -04:00
Andres Freund b8d7f053c5 Faster expression evaluation and targetlist projection.
This replaces the old, recursive tree-walk based evaluation, with
non-recursive, opcode dispatch based, expression evaluation.
Projection is now implemented as part of expression evaluation.

This both leads to significant performance improvements, and makes
future just-in-time compilation of expressions easier.

The speed gains primarily come from:
- non-recursive implementation reduces stack usage / overhead
- simple sub-expressions are implemented with a single jump, without
  function calls
- sharing some state between different sub-expressions
- reduced amount of indirect/hard to predict memory accesses by laying
  out operation metadata sequentially; including the avoidance of
  nearly all of the previously used linked lists
- more code has been moved to expression initialization, avoiding
  constant re-checks at evaluation time

Future just-in-time compilation (JIT) has become easier, as
demonstrated by released patches intended to be merged in a later
release, for primarily two reasons: Firstly, due to a stricter split
between expression initialization and evaluation, less code has to be
handled by the JIT. Secondly, due to the non-recursive nature of the
generated "instructions", less performance-critical code-paths can
easily be shared between interpreted and compiled evaluation.

The new framework allows for significant future optimizations. E.g.:
- basic infrastructure for to later reduce the per executor-startup
  overhead of expression evaluation, by caching state in prepared
  statements.  That'd be helpful in OLTPish scenarios where
  initialization overhead is measurable.
- optimizing the generated "code". A number of proposals for potential
  work has already been made.
- optimizing the interpreter. Similarly a number of proposals have
  been made here too.

The move of logic into the expression initialization step leads to some
backward-incompatible changes:
- Function permission checks are now done during expression
  initialization, whereas previously they were done during
  execution. In edge cases this can lead to errors being raised that
  previously wouldn't have been, e.g. a NULL array being coerced to a
  different array type previously didn't perform checks.
- The set of domain constraints to be checked, is now evaluated once
  during expression initialization, previously it was re-built
  every time a domain check was evaluated. For normal queries this
  doesn't change much, but e.g. for plpgsql functions, which caches
  ExprStates, the old set could stick around longer.  The behavior
  around might still change.

Author: Andres Freund, with significant changes by Tom Lane,
	changes by Heikki Linnakangas
Reviewed-By: Tom Lane, Heikki Linnakangas
Discussion: https://postgr.es/m/20161206034955.bh33paeralxbtluv@alap3.anarazel.de
2017-03-25 14:52:06 -07:00
Simon Riggs 3428ef7911 Reverting 42b4b0b241
Buildfarm issues and other reported issues
2017-03-24 17:56:17 +00:00
Alvaro Herrera 7b504eb282 Implement multivariate n-distinct coefficients
Add support for explicitly declared statistic objects (CREATE
STATISTICS), allowing collection of statistics on more complex
combinations that individual table columns.  Companion commands DROP
STATISTICS and ALTER STATISTICS ... OWNER TO / SET SCHEMA / RENAME are
added too.  All this DDL has been designed so that more statistic types
can be added later on, such as multivariate most-common-values and
multivariate histograms between columns of a single table, leaving room
for permitting columns on multiple tables, too, as well as expressions.

This commit only adds support for collection of n-distinct coefficient
on user-specified sets of columns in a single table.  This is useful to
estimate number of distinct groups in GROUP BY and DISTINCT clauses;
estimation errors there can cause over-allocation of memory in hashed
aggregates, for instance, so it's a worthwhile problem to solve.  A new
special pseudo-type pg_ndistinct is used.

(num-distinct estimation was deemed sufficiently useful by itself that
this is worthwhile even if no further statistic types are added
immediately; so much so that another version of essentially the same
functionality was submitted by Kyotaro Horiguchi:
https://postgr.es/m/20150828.173334.114731693.horiguchi.kyotaro@lab.ntt.co.jp
though this commit does not use that code.)

Author: Tomas Vondra.  Some code rework by Álvaro.
Reviewed-by: Dean Rasheed, David Rowley, Kyotaro Horiguchi, Jeff Janes,
    Ideriha Takeshi
Discussion: https://postgr.es/m/543AFA15.4080608@fuzzy.cz
    https://postgr.es/m/20170320190220.ixlaueanxegqd5gr@alvherre.pgsql
2017-03-24 14:06:10 -03:00
Simon Riggs 42b4b0b241 Avoid SnapshotResetXmin() during AtEOXact_Snapshot()
For normal commits and aborts we already reset PgXact->xmin
Avoiding touching highly contented shmem improves concurrent
performance.

Simon Riggs

Discussion: CANP8+jJdXE9b+b9F8CQT-LuxxO0PBCB-SZFfMVAdp+akqo4zfg@mail.gmail.com
2017-03-24 14:20:59 +00:00
Tom Lane 457a444873 Avoid syntax error on platforms that have neither LOCALE_T nor ICU.
Buildfarm member anole sees this union as empty, and doesn't like it.
2017-03-23 23:18:52 -04:00
Peter Eisentraut eccfef81e1 ICU support
Add a column collprovider to pg_collation that determines which library
provides the collation data.  The existing choices are default and libc,
and this adds an icu choice, which uses the ICU4C library.

The pg_locale_t type is changed to a union that contains the
provider-specific locale handles.  Users of locale information are
changed to look into that struct for the appropriate handle to use.

Also add a collversion column that records the version of the collation
when it is created, and check at run time whether it is still the same.
This detects potentially incompatible library upgrades that can corrupt
indexes and other structures.  This is currently only supported by
ICU-provided collations.

initdb initializes the default collation set as before from the `locale
-a` output but also adds all available ICU locales with a "-x-icu"
appended.

Currently, ICU-provided collations can only be explicitly named
collations.  The global database locales are still always libc-provided.

ICU support is enabled by configure --with-icu.

Reviewed-by: Thomas Munro <thomas.munro@enterprisedb.com>
Reviewed-by: Andreas Karlsson <andreas@proxel.se>
2017-03-23 15:28:48 -04:00
Robert Haas 691b8d5928 Allow for parallel execution whenever ExecutorRun() is done only once.
Previously, it was unsafe to execute a plan in parallel if
ExecutorRun() might be called with a non-zero row count.  However,
it's quite easy to fix things up so that we can support that case,
provided that it is known that we will never call ExecutorRun() a
second time for the same QueryDesc.  Add infrastructure to signal
this, and cross-checks to make sure that a caller who claims this is
true doesn't later reneg.

While that pattern never happens with queries received directly from a
client -- there's no way to know whether multiple Execute messages
will be sent unless the first one requests all the rows -- it's pretty
common for queries originating from procedural languages, which often
limit the result to a single tuple or to a user-specified number of
tuples.

This commit doesn't actually enable parallelism in any additional
cases, because currently none of the places that would be able to
benefit from this infrastructure pass CURSOR_OPT_PARALLEL_OK in the
first place, but it makes it much more palatable to pass
CURSOR_OPT_PARALLEL_OK in places where we currently don't, because it
eliminates some cases where we'd end up having to run the parallel
plan serially.

Patch by me, based on some ideas from Rafia Sabih and corrected by
Rafia Sabih based on feedback from Dilip Kumar and myself.

Discussion: http://postgr.es/m/CA+TgmobXEhvHbJtWDuPZM9bVSLiTj-kShxQJ2uM5GPDze9fRYA@mail.gmail.com
2017-03-23 13:14:36 -04:00
Peter Eisentraut 7c4f52409a Logical replication support for initial data copy
Add functionality for a new subscription to copy the initial data in the
tables and then sync with the ongoing apply process.

For the copying, add a new internal COPY option to have the COPY source
data provided by a callback function.  The initial data copy works on
the subscriber by receiving COPY data from the publisher and then
providing it locally into a COPY that writes to the destination table.

A WAL receiver can now execute full SQL commands.  This is used here to
obtain information about tables and publications.

Several new options were added to CREATE and ALTER SUBSCRIPTION to
control whether and when initial table syncing happens.

Change pg_dump option --no-create-subscription-slots to
--no-subscription-connect and use the new CREATE SUBSCRIPTION
... NOCONNECT option for that.

Author: Petr Jelinek <petr.jelinek@2ndquadrant.com>
Tested-by: Erik Rijkers <er@xs4all.nl>
2017-03-23 08:55:37 -04:00
Stephen Frost c7a9fa399d Add support for EUI-64 MAC addresses as macaddr8
This adds in support for EUI-64 MAC addresses by adding a new data type
called 'macaddr8' (using our usual convention of indicating the number
of bytes stored).

This was largely a copy-and-paste from the macaddr data type, with
appropriate adjustments for having 8 bytes instead of 6 and adding
support for converting a provided EUI-48 (6 byte format) to the EUI-64
format.  Conversion from EUI-48 to EUI-64 inserts FFFE as the 4th and
5th bytes but does not perform the IPv6 modified EUI-64 action of
flipping the 7th bit, but we add a function to perform that specific
action for the user as it may be commonly done by users who wish to
calculate their IPv6 address based on their network prefix and 48-bit
MAC address.

Author: Haribabu Kommi, with a good bit of rework of macaddr8_in by me.
Reviewed by: Vitaly Burovoy, Kuntal Ghosh

Discussion: https://postgr.es/m/CAJrrPGcUi8ZH+KkK+=TctNQ+EfkeCEHtMU_yo1mvX8hsk_ghNQ@mail.gmail.com
2017-03-15 11:16:25 -04:00
Peter Eisentraut f97a028d8e Spelling fixes in code comments
From: Josh Soref <jsoref@gmail.com>
2017-03-14 12:58:39 -04:00
Noah Misch 9d7726c2ba Recommend wrappers of PG_DETOAST_DATUM_PACKED().
When commit 3e23b68dac introduced
single-byte varlena headers, its fmgr.h changes presented
PG_GETARG_TEXT_PP() and PG_GETARG_TEXT_P() as equals.  Its postgres.h
changes presented PG_DETOAST_DATUM_PACKED() and VARDATA_ANY() as the
exceptional case.  Now, instead, firmly recommend PG_GETARG_TEXT_PP()
over PG_GETARG_TEXT_P(); likewise for other ...PP() macros.  This shaves
cycles and invites consistency of style.
2017-03-12 19:35:33 -04:00
Tom Lane 56018bf26e contrib/amcheck needs RecentGlobalXmin to be PGDLLIMPORT'ified.
Per buildfarm.  Maybe some of the other xmin variables in snapmgr.h
ought to get this too, but for the moment I'm just interested in
un-breaking the buildfarm.
2017-03-09 22:55:46 -05:00
Alvaro Herrera fcec6caafa Support XMLTABLE query expression
XMLTABLE is defined by the SQL/XML standard as a feature that allows
turning XML-formatted data into relational form, so that it can be used
as a <table primary> in the FROM clause of a query.

This new construct provides significant simplicity and performance
benefit for XML data processing; what in a client-side custom
implementation was reported to take 20 minutes can be executed in 400ms
using XMLTABLE.  (The same functionality was said to take 10 seconds
using nested PostgreSQL XPath function calls, and 5 seconds using
XMLReader under PL/Python).

The implemented syntax deviates slightly from what the standard
requires.  First, the standard indicates that the PASSING clause is
optional and that multiple XML input documents may be given to it; we
make it mandatory and accept a single document only.  Second, we don't
currently support a default namespace to be specified.

This implementation relies on a new executor node based on a hardcoded
method table.  (Because the grammar is fixed, there is no extensibility
in the current approach; further constructs can be implemented on top of
this such as JSON_TABLE, but they require changes to core code.)

Author: Pavel Stehule, Álvaro Herrera
Extensively reviewed by: Craig Ringer
Discussion: https://postgr.es/m/CAFj8pRAgfzMD-LoSmnMGybD0WsEznLHWap8DO79+-GTRAPR4qA@mail.gmail.com
2017-03-08 12:40:26 -03:00
Robert Haas d9528604cc Remove inclusion of postgres.h from a few header files.
Thomas Munro, per project policy articuled by Andres Freund and
Tom Lane.

Discussion: http://postgr.es/m/CAEepm=2zCoeq3QxVwhS5DFeUh=yU6z81pbWMgfOB8OzyiBwxzw@mail.gmail.com
2017-03-08 08:18:12 -05:00
Robert Haas 5a73e17317 Improve error reporting for tuple-routing failures.
Currently, the whole row is shown without column names.  Instead,
adopt a style similar to _bt_check_unique() in ExecFindPartition()
and show the failing key: (key1, ...) = (val1, ...).

Amit Langote, per a complaint from Simon Riggs.  Reviewed by me;
I also adjusted the grammar in one of the comments.

Discussion: http://postgr.es/m/9f9dc7ae-14f0-4a25-5485-964d9bfc19bd@lab.ntt.co.jp
2017-03-03 09:09:52 +05:30
Peter Eisentraut 20f6d74242 Collect duplicate copies of oid_cmp() 2017-03-01 11:55:28 -05:00
Andres Freund 7e3aa03b41 Reduce size of common allocation header.
The new slab allocator needs different per-allocation information than
the classical aset.c.  The definition in 58b25e981 wasn't sufficiently
careful on 32 platforms with 8 byte alignment, leading to buildfarm
failures.  That's not entirely easy to fix by just adjusting the
definition.

As slab.c doesn't actually need the size part(s) of the common header,
all chunks are equally sized after all, it seems better to instead
reduce the header to the part needed by all allocators, namely which
context an allocation belongs to. That has the advantage of reducing
the overhead of slab allocations, and also allows for more flexibility
in future allocators.

To avoid spreading the logic about accessing a chunk's context around,
centralize it in GetMemoryChunkContext(), which allows to delete a
good number of lines.

A followup commit will revise the mmgr/README portion about
StandardChunkHeader, and more.

Author: Andres Freund
Discussion: https://postgr.es/m/20170228074420.aazv4iw6k562mnxg@alap3.anarazel.de
2017-02-28 19:42:44 -08:00
Peter Eisentraut 2ed193c904 chomp PQerrorMessage() in backend uses
PQerrorMessage() returns an error message with a trailing newline, but
in backend use (dblink, postgres_fdw, libpqwalreceiver), we want to have
the error message without that for emitting via ereport().  To simplify
that, add a function pchomp() that returns a pstrdup'ed string with the
trailing newline characters removed.
2017-02-27 08:54:51 -05:00
Andres Freund 58b25e9810 Add "Slab" MemoryContext implementation for efficient equal-sized allocations.
The default general purpose aset.c style memory context is not a great
choice for allocations that are all going to be evenly sized,
especially when those objects aren't small, and have varying
lifetimes.  There tends to be a lot of fragmentation, larger
allocations always directly go to libc rather than have their cost
amortized over several pallocs.

These problems lead to the introduction of ad-hoc slab allocators in
reorderbuffer.c. But it turns out that the simplistic implementation
leads to problems when a lot of objects are allocated and freed, as
aset.c is still the underlying implementation. Especially freeing can
easily run into O(n^2) behavior in aset.c.

While the O(n^2) behavior in aset.c can, and probably will, be
addressed, custom allocators for this behavior are more efficient
both in space and time.

This allocator is for evenly sized allocations, and supports both
cheap allocations and freeing, without fragmenting significantly.  It
does so by allocating evenly sized blocks via malloc(), and carves
them into chunks that can be used for allocations.  In order to
release blocks to the OS as early as possible, chunks are allocated
from the fullest block that still has free objects, increasing the
likelihood of a block being entirely unused.

A subsequent commit uses this in reorderbuffer.c, but a further
allocator is needed to resolve the performance problems triggering
this work.

There likely are further potentialy uses of this allocator besides
reorderbuffer.c.

There's potential further optimizations of the new slab.c, in
particular the array of freelists could be replaced by a more
intelligent structure - but for now this looks more than good enough.

Author: Tomas Vondra, editorialized by Andres Freund
Reviewed-By: Andres Freund, Petr Jelinek, Robert Haas, Jim Nasby
Discussion: https://postgr.es/m/d15dff83-0b37-28ed-0809-95a5cc7292ad@2ndquadrant.com
2017-02-27 03:41:44 -08:00
Andres Freund bfd12cccbd Make useful infrastructure from aset.c generally available.
An upcoming patch introduces a new type of memory context. To avoid
duplicating debugging infrastructure within aset.c, move useful pieces
to memdebug.[ch].

While touching aset.c, fix printf format code in AllocFree* debug
macros.

Author: Tomas Vondra
Reviewed-By: Andres Freund
Discussion: https://postgr.es/m/b3b2245c-b37a-e1e5-ebc4-857c914bc747@2ndquadrant.com
2017-02-27 03:41:44 -08:00
Tom Lane 9e3755ecb2 Remove useless duplicate inclusions of system header files.
c.h #includes a number of core libc header files, such as <stdio.h>.
There's no point in re-including these after having read postgres.h,
postgres_fe.h, or c.h; so remove code that did so.

While at it, also fix some places that were ignoring our standard pattern
of "include postgres[_fe].h, then system header files, then other Postgres
header files".  While there's not any great magic in doing it that way
rather than system headers last, it's silly to have just a few files
deviating from the general pattern.  (But I didn't attempt to enforce this
globally, only in files I was touching anyway.)

I'd be the first to say that this is mostly compulsive neatnik-ism,
but over time it might save enough compile cycles to be useful.
2017-02-25 16:12:55 -05:00
Tom Lane c29aff959d Consistently declare timestamp variables as TimestampTz.
Twiddle the replication-related code so that its timestamp variables
are declared TimestampTz, rather than the uninformative "int64" that
was previously used for meant-to-be-always-integer timestamps.
This resolves the int64-vs-TimestampTz declaration inconsistencies
introduced by commit 7c030783a, though in the opposite direction to
what was originally suggested.

This required including datatype/timestamp.h in a couple more places
than before.  I decided it would be a good idea to slim down that
header by not having it pull in <float.h> etc, as those headers are
no longer at all relevant to its purpose.  Unsurprisingly, a small number
of .c files turn out to have been depending on those inclusions, so add
them back in the .c files as needed.

Discussion: https://postgr.es/m/26788.1487455319@sss.pgh.pa.us
Discussion: https://postgr.es/m/27694.1487456324@sss.pgh.pa.us
2017-02-23 15:57:08 -05:00
Tom Lane b9d092c962 Remove now-dead code for !HAVE_INT64_TIMESTAMP.
This is a basically mechanical removal of #ifdef HAVE_INT64_TIMESTAMP
tests and the negative-case controlled code.

Discussion: https://postgr.es/m/26788.1487455319@sss.pgh.pa.us
2017-02-23 14:04:43 -05:00
Robert Haas 16be2fd100 Make dsa_allocate interface more like MemoryContextAlloc.
A new function dsa_allocate_extended now takes flags which indicate
that huge allocations should be permitted, that out-of-memory
conditions should not throw an error, and/or that the returned memory
should be zero-filled, just like MemoryContextAllocateExtended.

Commit 9acb85597f, which added
dsa_allocate0, was broken because it failed to account for the
possibility that dsa_allocate() might return InvalidDsaPointer.
This fixes that problem along the way.

Thomas Munro, with some comment changes by me.

Discussion: http://postgr.es/m/CA+Tgmobt7CcF_uQP2UQwWmu4K9qCHehMJP9_9m1urwP8hbOeHQ@mail.gmail.com
2017-02-19 13:59:53 +05:30
Robert Haas 9acb85597f Add new function dsa_allocate0.
This does the same thing as dsa_allocate, except that the memory
is guaranteed to be zero-filled on return.

Dilip Kumar, adjusted by me.
2017-02-16 12:57:03 -05:00
Robert Haas 5262f7a4fc Add optimizer and executor support for parallel index scans.
In combination with 569174f1be, which
taught the btree AM how to perform parallel index scans, this allows
parallel index scan plans on btree indexes.  This infrastructure
should be general enough to support parallel index scans for other
index AMs as well, if someone updates them to support parallel
scans.

Amit Kapila, reviewed and tested by Anastasia Lubennikova, Tushar
Ahuja, and Haribabu Kommi, and me.
2017-02-15 13:53:24 -05:00
Stephen Frost e54f75722c Handle ALTER EXTENSION ADD/DROP with pg_init_privs
In commit 6c268df, pg_init_privs was added to track the initial
privileges of catalog objects and extensions.  Unfortunately, that
commit didn't include understanding of ALTER EXTENSION ADD/DROP, which
allows the objects associated with an extension to be changed after the
initial CREATE EXTENSION script has been run.

The result of this meant that ACLs for objects added through
ALTER EXTENSION ADD were not recorded into pg_init_privs and we would
end up including those ACLs in pg_dump when we shouldn't have.

This commit corrects that by making sure to have pg_init_privs updated
when ALTER EXTENSION ADD/DROP is run, recording the permissions as they
are at ALTER EXTENSION ADD time, and removing any if/when ALTER
EXTENSION DROP is called.

This issue was pointed out by Moshe Jacobson as commentary on bug #14456
(which was actually a bug about versions prior to 9.6 not handling
custom ACLs on extensions correctly, an issue now addressed with
pg_init_privs in 9.6).

Back-patch to 9.6 where pg_init_privs was introduced.
2017-01-29 23:05:07 -05:00
Tom Lane f7c6246240 Introduce convenience macros to hide JsonbContainer header accesses better.
This improves readability a bit and may make future improvements easier.

In passing, make sure that the JB_ROOT_IS_XXX macros deliver boolean (0/1)
results; the previous coding was a bug hazard, though no actual bugs are
known.

Nikita Glukhov, extended a bit by me

Discussion: https://postgr.es/m/9e21a39c-c1d7-b9b5-44a0-c5345a5029f6@postgrespro.ru
2017-01-25 13:28:38 -05:00
Robert Haas 27cdb3414b Reindent table partitioning code.
We've accumulated quite a bit of stuff with which pgindent is not
quite happy in this code; clean it up to provide a less-annoying base
for future pgindent runs.
2017-01-24 10:20:02 -05:00
Peter Eisentraut f21a563d25 Move some things from builtins.h to new header files
This avoids that builtins.h has to include additional header files.
2017-01-20 20:29:53 -05:00
Peter Eisentraut 665d1fad99 Logical replication
- Add PUBLICATION catalogs and DDL
- Add SUBSCRIPTION catalog and DDL
- Define logical replication protocol and output plugin
- Add logical replication workers

From: Petr Jelinek <petr@2ndquadrant.com>
Reviewed-by: Steve Singer <steve@ssinger.info>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Erik Rijkers <er@xs4all.nl>
Reviewed-by: Peter Eisentraut <peter.eisentraut@2ndquadrant.com>
2017-01-20 09:04:49 -05:00
Peter Eisentraut 063ef8308b Correct include file path
Mistake in 352a24a1f9, not clear why it
worked for some before.
2017-01-17 14:16:59 -05:00
Peter Eisentraut 352a24a1f9 Generate fmgr prototypes automatically
Gen_fmgrtab.pl creates a new file fmgrprotos.h, which contains
prototypes for all functions registered in pg_proc.h.  This avoids
having to manually maintain these prototypes across a random variety of
header files.  It also automatically enforces a correct function
signature, and since there are warnings about missing prototypes, it
will detect functions that are defined but not registered in
pg_proc.h (or otherwise used).

Reviewed-by: Pavel Stehule <pavel.stehule@gmail.com>
2017-01-17 14:06:07 -05:00
Peter Eisentraut 30b9a4495a Remove unnecessary include
Between 6eeb95f0f5 and
7b1c2a0f20, builtins.h contained
additional prototypes that have now been moved elsewhere, so we don't
need to include nodes/parsenodes.h anymore.

Fix some files that were relying on builtins.h implicitly pulling in
some unrelated stuff they needed.

Reviewed-by: Pavel Stehule <pavel.stehule@gmail.com>
2017-01-17 12:35:19 -05:00
Tom Lane ab1f0c8225 Change representation of statement lists, and add statement location info.
This patch makes several changes that improve the consistency of
representation of lists of statements.  It's always been the case
that the output of parse analysis is a list of Query nodes, whatever
the types of the individual statements in the list.  This patch brings
similar consistency to the outputs of raw parsing and planning steps:

* The output of raw parsing is now always a list of RawStmt nodes;
the statement-type-dependent nodes are one level down from that.

* The output of pg_plan_queries() is now always a list of PlannedStmt
nodes, even for utility statements.  In the case of a utility statement,
"planning" just consists of wrapping a CMD_UTILITY PlannedStmt around
the utility node.  This list representation is now used in Portal and
CachedPlan plan lists, replacing the former convention of intermixing
PlannedStmts with bare utility-statement nodes.

Now, every list of statements has a consistent head-node type depending
on how far along it is in processing.  This allows changing many places
that formerly used generic "Node *" pointers to use a more specific
pointer type, thus reducing the number of IsA() tests and casts needed,
as well as improving code clarity.

Also, the post-parse-analysis representation of DECLARE CURSOR is changed
so that it looks more like EXPLAIN, PREPARE, etc.  That is, the contained
SELECT remains a child of the DeclareCursorStmt rather than getting flipped
around to be the other way.  It's now true for both Query and PlannedStmt
that utilityStmt is non-null if and only if commandType is CMD_UTILITY.
That allows simplifying a lot of places that were testing both fields.
(I think some of those were just defensive programming, but in many places,
it was actually necessary to avoid confusing DECLARE CURSOR with SELECT.)

Because PlannedStmt carries a canSetTag field, we're also able to get rid
of some ad-hoc rules about how to reconstruct canSetTag for a bare utility
statement; specifically, the assumption that a utility is canSetTag if and
only if it's the only one in its list.  While I see no near-term need for
relaxing that restriction, it's nice to get rid of the ad-hocery.

The API of ProcessUtility() is changed so that what it's passed is the
wrapper PlannedStmt not just the bare utility statement.  This will affect
all users of ProcessUtility_hook, but the changes are pretty trivial; see
the affected contrib modules for examples of the minimum change needed.
(Most compilers should give pointer-type-mismatch warnings for uncorrected
code.)

There's also a change in the API of ExplainOneQuery_hook, to pass through
cursorOptions instead of expecting hook functions to know what to pick.
This is needed because of the DECLARE CURSOR changes, but really should
have been done in 9.6; it's unlikely that any extant hook functions
know about using CURSOR_OPT_PARALLEL_OK.

Finally, teach gram.y to save statement boundary locations in RawStmt
nodes, and pass those through to Query and PlannedStmt nodes.  This allows
more intelligent handling of cases where a source query string contains
multiple statements.  This patch doesn't actually do anything with the
information, but a follow-on patch will.  (Passing this information through
cleanly is the true motivation for these changes; while I think this is all
good cleanup, it's unlikely we'd have bothered without this end goal.)

catversion bump because addition of location fields to struct Query
affects stored rules.

This patch is by me, but it owes a good deal to Fabien Coelho who did
a lot of preliminary work on the problem, and also reviewed the patch.

Discussion: https://postgr.es/m/alpine.DEB.2.20.1612200926310.29821@lancre
2017-01-14 16:02:35 -05:00
Tom Lane e2117e4ab0 Fix field order in struct catcache.
Somebody failed to grasp the point of having the #ifdef CATCACHE_STATS
fields at the end of the struct.  Put that back the way it should be,
and add a comment making it more explicit why it should be that way.
2017-01-12 18:59:57 -05:00
Robert Haas 175ff6598e Fix possible crash reading pg_stat_activity.
With the old code, a backend that read pg_stat_activity without ever
having executed a parallel query might see a backend in the midst of
executing one waiting on a DSA LWLock, resulting in a crash.  The
solution is for backends to register the tranche at startup time, not
the first time a parallel query is executed.

Report by Andreas Seltenreich.  Patch by me, reviewed by Thomas Munro.
2017-01-05 12:27:09 -05:00
Bruce Momjian 1d25779284 Update copyright via script for 2017 2017-01-03 13:48:53 -05:00
Peter Eisentraut 2e254130d1 Make more use of RoleSpec struct
Most code was casting this through a generic Node.  By declaring
everything as RoleSpec appropriately, we can remove a bunch of casts and
ad-hoc node type checking.

Reviewed-by: Alvaro Herrera <alvherre@2ndquadrant.com>
2016-12-29 10:49:39 -05:00
Robert Haas e13486eba0 Remove sql_inheritance GUC.
This backward-compatibility GUC is long overdue for removal.

Discussion: http://postgr.es/m/CA+TgmoYe+EG7LdYX6pkcNxr4ygkP4+A=jm9o-CPXyOvRiCNwaQ@mail.gmail.com
2016-12-23 07:35:01 -05:00
Tom Lane cd1b215692 Fix handling of expanded objects in CoerceToDomain and CASE execution.
When the input value to a CoerceToDomain expression node is a read-write
expanded datum, we should pass a read-only pointer to any domain CHECK
expressions and then return the original read-write pointer as the
expression result.  Previously we were blindly passing the same pointer to
all the consumers of the value, making it possible for a function in CHECK
to modify or even delete the expanded value.  (Since a plpgsql function
will absorb a passed-in read-write expanded array as a local variable
value, it will in fact delete the value on exit.)

A similar hazard of passing the same read-write pointer to multiple
consumers exists in domain_check() and in ExecEvalCase, so fix those too.

The fix requires adding MakeExpandedObjectReadOnly calls at the appropriate
places, which is simple enough except that we need to get the data type's
typlen from somewhere.  For the domain cases, solve this by redefining
DomainConstraintRef.tcache as okay for callers to access; there wasn't any
reason for the original convention against that, other than not wanting the
API of typcache.c to be any wider than it had to be.  For CASE, there's
no good solution except to add a syscache lookup during executor start.

Per bug #14472 from Marcos Castedo.  Back-patch to 9.5 where expanded
values were introduced.

Discussion: https://postgr.es/m/15225.1482431619@sss.pgh.pa.us
2016-12-22 15:01:37 -05:00
Heikki Linnakangas 01ec25631f Simplify tape block format.
No more indirect blocks. The blocks form a linked list instead.

This saves some memory, because we don't need to have a buffer in memory to
hold the indirect block (or blocks). To reflect that, TAPE_BUFFER_OVERHEAD
is reduced from 3 to 1 buffer, which allows using more memory for building
the initial runs.

Reviewed by Peter Geoghegan and Robert Haas.

Discussion: https://www.postgresql.org/message-id/34678beb-938e-646e-db9f-a7def5c44ada%40iki.fi
2016-12-22 18:45:00 +02:00
Peter Eisentraut 1753b1b027 Add pg_sequence system catalog
Move sequence metadata (start, increment, etc.) into a proper system
catalog instead of storing it in the sequence heap object.  This
separates the metadata from the sequence data.  Sequence metadata is now
operated on transactionally by DDL commands, whereas previously
rollbacks of sequence-related DDL commands would be ignored.

Reviewed-by: Andreas Karlsson <andreas@proxel.se>
2016-12-20 08:28:18 -05:00
Robert Haas 3856cf9607 Remove should_free arguments to tuplesort routines.
Since commit e94568ecc1, the answer is
always "false", and we do not need to complicate the API by arranging
to return a constant value.

Peter Geoghegan

Discussion: http://postgr.es/m/CAM3SWZQWZZ_N=DmmL7tKy_OUjGH_5mN=N=A6h7kHyyDvEhg2DA@mail.gmail.com
2016-12-12 15:57:35 -05:00
Robert Haas f0e44751d7 Implement table partitioning.
Table partitioning is like table inheritance and reuses much of the
existing infrastructure, but there are some important differences.
The parent is called a partitioned table and is always empty; it may
not have indexes or non-inherited constraints, since those make no
sense for a relation with no data of its own.  The children are called
partitions and contain all of the actual data.  Each partition has an
implicit partitioning constraint.  Multiple inheritance is not
allowed, and partitioning and inheritance can't be mixed.  Partitions
can't have extra columns and may not allow nulls unless the parent
does.  Tuples inserted into the parent are automatically routed to the
correct partition, so tuple-routing ON INSERT triggers are not needed.
Tuple routing isn't yet supported for partitions which are foreign
tables, and it doesn't handle updates that cross partition boundaries.

Currently, tables can be range-partitioned or list-partitioned.  List
partitioning is limited to a single column, but range partitioning can
involve multiple columns.  A partitioning "column" can be an
expression.

Because table partitioning is less general than table inheritance, it
is hoped that it will be easier to reason about properties of
partitions, and therefore that this will serve as a better foundation
for a variety of possible optimizations, including query planner
optimizations.  The tuple routing based which this patch does based on
the implicit partitioning constraints is an example of this, but it
seems likely that many other useful optimizations are also possible.

Amit Langote, reviewed and tested by Robert Haas, Ashutosh Bapat,
Amit Kapila, Rajkumar Raghuwanshi, Corey Huinker, Jaime Casanova,
Rushabh Lathia, Erik Rijkers, among others.  Minor revisions by me.
2016-12-07 13:17:55 -05:00
Tom Lane 3ebf2b4545 Remove extraneous semicolon from uses of relptr_declare().
If we're going to write a semicolon after calls of relptr_declare(),
then we don't need one inside the macro, and removing it suppresses
"empty declaration" warnings from pickier compilers (eg pademelon).

While at it, we might as well use relptr() inside relptr_declare(),
because otherwise that macro would likely go unused altogether.

Also improve the comment, which I for one found unclear,
and provide a specific example of intended usage.
2016-12-05 20:27:55 -05:00
Robert Haas 88f626f868 Fix more DSA problems uncovered by the buildfarm.
On 32-bit systems, don't try to use 64-bit DSA pointers, because the
computation of DSA_MAX_SEGMENT_SIZE overflows Size.

Cast 1 to Size before shifting it, so that the compiler doesn't
produce a result of the wrong width.

In passing, change one use of size_t to Size.
2016-12-05 10:38:08 -05:00
Robert Haas 670b3bc8f5 Try to fix some DSA-related compiler warnings.
Commit 13df76a537 was overconfident
about how portable %016lx is.  Some compilers complain because they
need %016llx, while platforms where DSA pointers are only 32 bits
get unhappy about using a 64-bit format for a 32-bit quantity.

Thomas Munro, per an off-list suggestion from me.
2016-12-05 10:01:08 -05:00
Heikki Linnakangas fe0a0b5993 Replace PostmasterRandom() with a stronger source, second attempt.
This adds a new routine, pg_strong_random() for generating random bytes,
for use in both frontend and backend. At the moment, it's only used in
the backend, but the upcoming SCRAM authentication patches need strong
random numbers in libpq as well.

pg_strong_random() is based on, and replaces, the existing implementation
in pgcrypto. It can acquire strong random numbers from a number of sources,
depending on what's available:

- OpenSSL RAND_bytes(), if built with OpenSSL
- On Windows, the native cryptographic functions are used
- /dev/urandom

Unlike the current pgcrypto function, the source is chosen by configure.
That makes it easier to test different implementations, and ensures that
we don't accidentally fall back to a less secure implementation, if the
primary source fails. All of those methods are quite reliable, it would be
pretty surprising for them to fail, so we'd rather find out by failing
hard.

If no strong random source is available, we fall back to using erand48(),
seeded from current timestamp, like PostmasterRandom() was. That isn't
cryptographically secure, but allows us to still work on platforms that
don't have any of the above stronger sources. Because it's not very secure,
the built-in implementation is only used if explicitly requested with
--disable-strong-random.

This replaces the more complicated Fortuna algorithm we used to have in
pgcrypto, which is unfortunate, but all modern platforms have /dev/urandom,
so it doesn't seem worth the maintenance effort to keep that. pgcrypto
functions that require strong random numbers will be disabled with
--disable-strong-random.

Original patch by Magnus Hagander, tons of further work by Michael Paquier
and me.

Discussion: https://www.postgresql.org/message-id/CAB7nPqRy3krN8quR9XujMVVHYtXJ0_60nqgVc6oUk8ygyVkZsA@mail.gmail.com
Discussion: https://www.postgresql.org/message-id/CAB7nPqRWkNYRRPJA7-cF+LfroYV10pvjdz6GNvxk-Eee9FypKA@mail.gmail.com
2016-12-05 13:42:59 +02:00
Robert Haas 767a9039d7 Fix thinko in b3427dade1. 2016-12-02 15:06:41 -05:00
Robert Haas 13df76a537 Introduce dynamic shared memory areas.
Programmers discovered decades ago that it was useful to have a simple
interface for allocating and freeing memory, which is why malloc() and
free() were invented.  Unfortunately, those handy tools don't work
with dynamic shared memory segments because those are specific to
PostgreSQL and are not necessarily mapped at the same address in every
cooperating process.  So invent our own allocator instead.  This makes
it possible for processes cooperating as part of parallel query
execution to allocate and free chunks of memory without having to
reserve them prior to the start of execution.  It could also be used
for longer lived objects; for example, we could consider storing data
for pg_stat_statements or the stats collector in shared memory using
these interfaces, rather than writing them to files.  Basically,
anything that needs shared memory but can't predict in advance how
much it's going to need might find this useful.

Thomas Munro and Robert Haas.  The original code (of mine) on which
Thomas based his work was actually designed to be a new backend-local
memory allocator for PostgreSQL, but that hasn't gone anywhere - or
not yet, anyway.  Thomas took that work and performed major
refactoring and extensive modifications to make it work with dynamic
shared memory, including the addition of appropriate locking.

Discussion: CA+TgmobkeWptGwiNa+SGFWsTLzTzD-CeLz0KcE-y6LFgoUus4A@mail.gmail.com
Discussion: CAEepm=1z5WLuNoJ80PaCvz6EtG9dN0j-KuHcHtU6QEfcPP5-qA@mail.gmail.com
2016-12-02 12:34:36 -05:00
Robert Haas 13e14a78ea Management of free memory pages.
This is intended as infrastructure for a full-fledged allocator for
dynamic shared memory.  The interface looks a bit like a real
allocator, but only supports allocating and freeing memory in
multiples of the 4kB page size.  Further, to free memory, you must
know the size of the span you wish to free, in pages.  While these are
make it unsuitable as an allocator in and of itself, it still serves
as very useful scaffolding for a full-fledged allocator.

Robert Haas and Thomas Munro.  This code is mostly the same as my 2014
submission, but Thomas fixed quite a few bugs and made some changes to
the interface.

Discussion: CA+TgmobkeWptGwiNa+SGFWsTLzTzD-CeLz0KcE-y6LFgoUus4A@mail.gmail.com
Discussion: CAEepm=1z5WLuNoJ80PaCvz6EtG9dN0j-KuHcHtU6QEfcPP5-qA@mail.gmail.com
2016-12-02 12:03:30 -05:00
Robert Haas fbc1c12a94 Add a crude facility for dealing with relative pointers.
C doesn't have any sort of built-in understanding of a pointer
relative to some arbitrary base address, but dynamic shared memory
segments can be mapped at different addresses in different processes,
so any sort of shared data structure stored within a dynamic shared
memory segment can't use absolute pointers.  We could use something
like Size to represent a relative pointer, but then the compiler
provides no type-checking.  Use stupid macro tricks to get some
type-checking.

Patch originally by me.  Concept suggested by Andres Freund.  Recently
resubmitted as part of Thomas Munro's work on dynamic shared memory
allocation.

Discussion: 20131205144434.GG12398@alap2.anarazel.de
Discussion: CAEepm=1z5WLuNoJ80PaCvz6EtG9dN0j-KuHcHtU6QEfcPP5-qA@mail.gmail.com
2016-12-02 11:29:01 -05:00
Tom Lane 11da83a0e7 Add uuid to the set of types supported by contrib/btree_gist.
Paul Jungwirth, reviewed and hacked on by Teodor Sigaev, Ildus
Kurbangaliev, Adam Brusselback, Chris Bandy, and myself.

Discussion: https://postgr.es/m/CA+renyUEE29=X01JXdz8_TQvo6n9=2XoEBBRnQ8rkLyr+kjPxQ@mail.gmail.com
Discussion: https://postgr.es/m/55F6EE82.8080209@sigaev.ru
2016-11-29 14:08:34 -05:00
Tom Lane ffaa44cb55 Account for catalog snapshot in PGXACT->xmin updates.
The CatalogSnapshot was not plugged into SnapshotResetXmin()'s accounting
for whether MyPgXact->xmin could be cleared or advanced.  In normal
transactions this was masked by the fact that the transaction snapshot
would be older, but during backend startup and certain utility commands
it was possible to re-use the CatalogSnapshot after MyPgXact->xmin had
been cleared, meaning that recently-deleted rows could be pruned even
though this snapshot could still see them, causing unexpected catalog
lookup failures.  This effect appears to be the explanation for a recent
failure on buildfarm member piculet.

To fix, add the CatalogSnapshot to the RegisteredSnapshots heap whenever
it is valid.

In the previous logic, it was possible for the CatalogSnapshot to remain
valid across waits for client input, but with this change that would mean
it delays advance of global xmin in cases where it did not before.  To
avoid possibly causing new table-bloat problems with clients that sit idle
for long intervals, add code to invalidate the CatalogSnapshot before
waiting for client input.  (When the backend is busy, it's unlikely that
the CatalogSnapshot would be the oldest snap for very long, so we don't
worry about forcing early invalidation of it otherwise.)

In passing, remove the CatalogSnapshotStale flag in favor of using
"CatalogSnapshot != NULL" to represent validity, as we do for the other
special snapshots in snapmgr.c.  And improve some obsolete comments.

No regression test because I don't know a deterministic way to cause this
failure.  But the stress test shown in the original discussion provokes
"cache lookup failed for relation 1255" within a few dozen seconds for me.

Back-patch to 9.4 where MVCC catalog scans were introduced.  (Note: it's
quite easy to produce similar failures with the same test case in branches
before 9.4.  But MVCC catalog scans were supposed to fix that.)

Discussion: <16447.1478818294@sss.pgh.pa.us>
2016-11-15 15:55:35 -05:00
Tom Lane 530f806524 Re-allow user_catalog_table option for materialized views.
The reloptions stuff allows this option to be set on a matview.
While it's questionable whether that is useful or was really intended,
it does work, and we shouldn't change that in minor releases.  Commit
e3e66d8a9 disabled the option since I didn't realize that it was
possible for it to be set on a matview.  Tweak the test to re-allow it.

Discussion: <19749.1478711862@sss.pgh.pa.us>
2016-11-10 15:00:58 -05:00
Tom Lane e3e66d8a98 Band-aid fix for incorrect use of view options as StdRdOptions.
We really ought to make StdRdOptions and the other decoded forms of
reloptions self-identifying, but for the moment, assume that only plain
relations could possibly be user_catalog_tables.  Fixes problem with bogus
"ON CONFLICT is not supported on table ... used as a catalog table" error
when target is a view with cascade option.

Discussion: <26681.1477940227@sss.pgh.pa.us>
2016-11-07 12:08:18 -05:00
Kevin Grittner 8c48375e5f Implement syntax for transition tables in AFTER triggers.
This is infrastructure for the complete SQL standard feature.  No
support is included at this point for execution nodes or PLs.  The
intent is to add that soon.

As this patch leaves things, standard syntax can create tuplestores
to contain old and/or new versions of rows affected by a statement.
References to these tuplestores are in the TriggerData structure.
C triggers can access the tuplestores directly, so they are usable,
but they cannot yet be referenced within a SQL statement.
2016-11-04 10:49:50 -05:00
Tom Lane 5c80642aa8 Remove unnecessary int2vector-specific hash function and equality operator.
These functions were originally added in commit d8cedf67a to support
use of int2vector columns as catcache lookup keys.  However, there are
no catcaches that use such columns.  (Indeed I now think it must always
have been dead code: a catcache with such a key column would need an
underlying unique index on the column, but we've never had an int2vector
btree opclass.)

Getting rid of the int2vector-specific operator and function does not
lose any functionality, because operations on int2vectors will now fall
back to the generic anyarray support.  This avoids a wart that a btree
index on an int2vector column (made using anyarray_ops) would fail to
match equality searches, because int2vectoreq wasn't a member of the
opclass.  We don't really care much about that, since int2vector is not
meant as a type for users to use, but it's silly to have extra code and
less functionality.

If we ever do want a catcache to be indexed by an int2vector column,
we'd need to put back full btree and hash opclasses for int2vector,
comparable to the support for oidvector.  (The anyarray code can't be
used at such a low level, because it needs to do catcache lookups.)
But we'll deal with that if/when the need arises.

Also worth noting is that removal of the hash int2vector_ops opclass will
break any user-created hash indexes on int2vector columns.  While hash
anyarray_ops would serve the same purpose, it would probably not compute
the same hash values and thus wouldn't be on-disk-compatible.  Given that
int2vector isn't a user-facing type and we're planning other incompatible
changes in hash indexes for v10 anyway, this doesn't seem like something
to worry about, but it's probably worth mentioning here.

Amit Langote

Discussion: <d9bb74f8-b194-7307-9ebd-90645d377e45@lab.ntt.co.jp>
2016-10-12 14:54:08 -04:00
Heikki Linnakangas b75f467b6e Simplify the code for logical tape read buffers.
Pass the buffer size as argument to LogicalTapeRewindForRead, rather than
setting it earlier with the separate LogicTapeAssignReadBufferSize call.
This way, the buffer size is set closer to where it's actually used, which
makes the code easier to understand.

This makes the calculation for how much memory to use for the buffers less
precise. We now use the same amount of memory for every tape, rounded down
to the nearest BLCKSZ boundary, instead of using one more block for some
tapes, to get the total up to exact amount of memory available. That should
be OK, merging isn't too sensitive to the exact amount of memory used.

Reviewed by Peter Geoghegan

Discussion: <0f607c4b-df23-353e-bf56-c0389d28495f@iki.fi>
2016-10-12 12:05:45 +03:00
Tom Lane 6bc811c992 Show a sensible value in pg_settings.unit for GUC_UNIT_XSEGS variables.
Commit 88e982302 invented GUC_UNIT_XSEGS for min_wal_size and max_wal_size,
but neglected to make it display sensibly in pg_settings.unit (by adding a
case to the switch in GetConfigOptionByNum).  Fix that, and adjust said
switch to throw a run-time error the next time somebody forgets.

In passing, avoid using a static buffer for the output string --- the rest
of this function pstrdup's from a local buffer, and I see no very good
reason why the units code should do it differently and less safely.

Per report from Otar Shavadze.  Back-patch to 9.5 where the new unit type
was added.

Report: <CAG-jOyA=iNFhN+yB4vfvqh688B7Tr5SArbYcFUAjZi=0Exp-Lg@mail.gmail.com>
2016-10-03 16:40:25 -04:00
Heikki Linnakangas e94568ecc1 Change the way pre-reading in external sort's merge phase works.
Don't pre-read tuples into SortTuple slots during merge. Instead, use the
memory for larger read buffers in logtape.c. We're doing the same number
of READTUP() calls either way, but managing the pre-read SortTuple slots
is much more complicated. Also, the on-tape representation is more compact
than SortTuples, so we can fit more pre-read tuples into the same amount
of memory this way. And we have better cache-locality, when we use just a
small number of SortTuple slots.

Now that we only hold one tuple from each tape in the SortTuple slots, we
can greatly simplify the "batch memory" management. We now maintain a
small set of fixed-sized slots, to hold the tuples, and fall back to
palloc() for larger tuples. We use this method during all merge phases,
not just the final merge, and also when randomAccess is requested, and
also in the TSS_SORTEDONTAPE case. In other words, it's used whenever we
do an external sort.

Reviewed by Peter Geoghegan and Claudio Freire.

Discussion: <CAM3SWZTpaORV=yQGVCG8Q4axcZ3MvF-05xe39ZvORdU9JcD6hQ@mail.gmail.com>
2016-10-03 13:37:49 +03:00
Peter Eisentraut cd03890d0b Fix breakage in previous change 2016-09-30 15:27:51 -04:00
Peter Eisentraut 330b48b94b Separate enum from struct
Otherwise the enum symbols are not visible outside the struct in C++.

Reviewed-by: Thomas Munro <thomas.munro@enterprisedb.com>
2016-09-30 15:11:47 -04:00
Tom Lane f2717c79ee Improve unreachability recognition in elog() macro.
Some experimentation with an older version of gcc showed that it is able
to determine whether "if (elevel_ >= ERROR)" is compile-time constant
if elevel_ is declared "const", but otherwise not so much.  We had
accounted for that in ereport() but were too miserly with braces to
make it so in elog().  I don't know how many currently-interesting
compilers have the same quirk, but in case it will save some code
space, let's make sure that elog() is on the same footing as ereport()
for this purpose.

Back-patch to 9.3 where we introduced pg_unreachable() calls into
elog/ereport.
2016-09-10 17:54:23 -04:00
Peter Eisentraut 49eb0fd097 Add location field to DefElem
Add a location field to the DefElem struct, used to parse many utility
commands.  Update various error messages to supply error position
information.

To propogate the error position information in a more systematic way,
create a ParseState in standard_ProcessUtility() and pass that to
interested functions implementing the utility commands.  This seems
better than passing the query string and then reassembling a parse state
ad hoc, which violates the encapsulation of the ParseState type.

Reviewed-by: Pavel Stehule <pavel.stehule@gmail.com>
2016-09-06 12:00:00 -04:00
Tom Lane ea268cdc9a Add macros to make AllocSetContextCreate() calls simpler and safer.
I found that half a dozen (nearly 5%) of our AllocSetContextCreate calls
had typos in the context-sizing parameters.  While none of these led to
especially significant problems, they did create minor inefficiencies,
and it's now clear that expecting people to copy-and-paste those calls
accurately is not a great idea.  Let's reduce the risk of future errors
by introducing single macros that encapsulate the common use-cases.
Three such macros are enough to cover all but two special-purpose contexts;
those two calls can be left as-is, I think.

While this patch doesn't in itself improve matters for third-party
extensions, it doesn't break anything for them either, and they can
gradually adopt the simplified notation over time.

In passing, change TopMemoryContext to use the default allocation
parameters.  Formerly it could only be extended 8K at a time.  That was
probably reasonable when this code was written; but nowadays we create
many more contexts than we did then, so that it's not unusual to have a
couple hundred K in TopMemoryContext, even without considering various
dubious code that sticks other things there.  There seems no good reason
not to let it use growing blocks like most other contexts.

Back-patch to 9.6, mostly because that's still close enough to HEAD that
it's easy to do so, and keeping the branches in sync can be expected to
avoid some future back-patching pain.  The bugs fixed by these changes
don't seem to be significant enough to justify fixing them further back.

Discussion: <21072.1472321324@sss.pgh.pa.us>
2016-08-27 17:50:38 -04:00
Tom Lane 77e2906821 Create an SP-GiST opclass for inet/cidr.
This seems to offer significantly better search performance than the
existing GiST opclass for inet/cidr, at least on data with a wide mix
of network mask lengths.  (That may suggest that the data splitting
heuristics in the GiST opclass could be improved.)

Emre Hasegeli, with mostly-cosmetic adjustments by me

Discussion: <CAE2gYzxtth9qatW_OAqdOjykS0bxq7AYHLuyAQLPgT7H9ZU0Cw@mail.gmail.com>
2016-08-23 15:16:30 -04:00
Robert Haas 86f31695f3 Add txid_current_ifassigned().
Add a variant of txid_current() that returns NULL if no transaction ID
is assigned.  This version can be used even on a standby server,
although it will always return NULL since no transaction IDs can be
assigned during recovery.

Craig Ringer, per suggestion from Jim Nasby.  Reviewed by Petr Jelinek
and by me.
2016-08-23 10:30:52 -04:00
Tom Lane 7b405b3e04 Refactor some network.c code to create cidr_set_masklen_internal().
Merge several copies of "copy an inet value and adjust the mask length"
code to create a single, conveniently C-callable function.  This function
is exported for future use by inet SPGiST support, but it's good cleanup
anyway since we had three slightly-different-for-no-good-reason copies.

(Extracted from a larger patch, to separate new code from refactoring
of old code)

Emre Hasegeli
2016-08-23 09:39:54 -04:00
Tom Lane 5697522d84 In plpgsql, don't try to convert int2vector or oidvector to expanded array.
These types are storage-compatible with real arrays, but they don't support
toasting, so of course they can't support expansion either.

Per bug #14289 from Michael Overmeyer.  Back-patch to 9.5 where expanded
arrays were introduced.

Report: <20160818174414.1529.37913@wrigleys.postgresql.org>
2016-08-18 14:49:08 -04:00
Tom Lane cf9b0fea5f Implement regexp_match(), a simplified alternative to regexp_matches().
regexp_match() is like regexp_matches(), but it disallows the 'g' flag
and in consequence does not need to return a set.  Instead, it returns
a simple text array value, or NULL if there's no match.  Previously people
usually got that behavior with a sub-select, but this way is considerably
more efficient.

Documentation adjusted so that regexp_match() is presented first and then
regexp_matches() is introduced as a more complicated version.  This is
a bit historically revisionist but seems pedagogically better.

Still TODO: extend contrib/citext to support this function.

Emre Hasegeli, reviewed by David Johnston

Discussion: <CAE2gYzy42sna2ME_e3y1KLQ-4UBrB-eVF0SWn8QG39sQSeVhEw@mail.gmail.com>
2016-08-17 18:33:01 -04:00
Tom Lane 0bb51aa967 Improve parsetree representation of special functions such as CURRENT_DATE.
We implement a dozen or so parameterless functions that the SQL standard
defines special syntax for.  Up to now, that was done by converting them
into more or less ad-hoc constructs such as "'now'::text::date".  That's
messy for multiple reasons: it exposes what should be implementation
details to users, and performance is worse than it needs to be in several
cases.  To improve matters, invent a new expression node type
SQLValueFunction that can represent any of these parameterless functions.

Bump catversion because this changes stored parsetrees for rules.

Discussion: <30058.1463091294@sss.pgh.pa.us>
2016-08-16 20:33:01 -04:00
Tom Lane 9389fbd038 Remove bogus dependencies on NUMERIC_MAX_PRECISION.
NUMERIC_MAX_PRECISION is a purely arbitrary constraint on the precision
and scale you can write in a numeric typmod.  It might once have had
something to do with the allowed range of a typmod-less numeric value,
but at least since 9.1 we've allowed, and documented that we allowed,
any value that would physically fit in the numeric storage format;
which is something over 100000 decimal digits, not 1000.

Hence, get rid of numeric_in()'s use of NUMERIC_MAX_PRECISION as a limit
on the allowed range of the exponent in scientific-format input.  That was
especially silly in view of the fact that you can enter larger numbers as
long as you don't use 'e' to do it.  Just constrain the value enough to
avoid localized overflow, and let make_result be the final arbiter of what
is too large.  Likewise adjust ecpg's equivalent of this code.

Also get rid of numeric_recv()'s use of NUMERIC_MAX_PRECISION to limit the
number of base-NBASE digits it would accept.  That created a dump/restore
hazard for binary COPY without doing anything useful; the wire-format
limit on number of digits (65535) is about as tight as we would want.

In HEAD, also get rid of pg_size_bytes()'s unnecessary intimacy with what
the numeric range limit is.  That code doesn't exist in the back branches.

Per gripe from Aravind Kumar.  Back-patch to all supported branches,
since they all contain the documentation claim about allowed range of
NUMERIC (cf commit cabf5d84b).

Discussion: <2895.1471195721@sss.pgh.pa.us>
2016-08-14 15:06:01 -04:00
Tom Lane ed0097e4f9 Add SQL-accessible functions for inspecting index AM properties.
Per discussion, we should provide such functions to replace the lost
ability to discover AM properties by inspecting pg_am (cf commit
65c5fcd35).  The added functionality is also meant to displace any code
that was looking directly at pg_index.indoption, since we'd rather not
believe that the bit meanings in that field are part of any client API
contract.

As future-proofing, define the SQL API to not assume that properties that
are currently AM-wide or index-wide will remain so unless they logically
must be; instead, expose them only when inquiring about a specific index
or even specific index column.  Also provide the ability for an index
AM to override the behavior.

In passing, document pg_am.amtype, overlooked in commit 473b93287.

Andrew Gierth, with kibitzing by me and others

Discussion: <87mvl5on7n.fsf@news-spur.riddles.org.uk>
2016-08-13 18:31:14 -04:00
Tom Lane 9ee1cf04ab Fix TOAST access failure in RETURNING queries.
Discussion of commit 3e2f3c2e4 exposed a problem that is of longer
standing: since we don't detoast data while sticking it into a portal's
holdStore for PORTAL_ONE_RETURNING and PORTAL_UTIL_SELECT queries, and we
release the query's snapshot as soon as we're done loading the holdStore,
later readout of the holdStore can do TOAST fetches against data that can
no longer be seen by any of the session's live snapshots.  This means that
a concurrent VACUUM could remove the TOAST data before we can fetch it.
Commit 3e2f3c2e4 exposed the problem by showing that sometimes we had *no*
live snapshots while fetching TOAST data, but we'd be at risk anyway.

I believe this code was all right when written, because our management of a
session's exposed xmin was such that the TOAST references were safe until
end of transaction.  But that's no longer true now that we can advance or
clear our PGXACT.xmin intra-transaction.

To fix, copy the query's snapshot during FillPortalStore() and save it in
the Portal; release it only when the portal is dropped.  This essentially
implements a policy that we must hold a relevant snapshot whenever we
access potentially-toasted data.  We had already come to that conclusion
in other places, cf commits 08e261cbc9 and ec543db77b.

I'd have liked to add a regression test case for this, but I didn't see
a way to make one that's not unreasonably bloated; it seems to require
returning a toasted value to the client, and those will be big.

In passing, improve PortalRunUtility() so that it positively verifies
that its ending PopActiveSnapshot() call will pop the expected snapshot,
removing a rather shaky assumption about which utility commands might
do their own PopActiveSnapshot().  There's no known bug here, but now
that we're actively referencing the snapshot it's almost free to make
this code a bit more bulletproof.

We might want to consider back-patching something like this into older
branches, but it would be prudent to let it prove itself more in HEAD
beforehand.

Discussion: <87vazemeda.fsf@credativ.de>
2016-08-07 17:46:08 -04:00
Tom Lane fc509cd824 Fix copy-and-pasteo in 81c766b3fd.
Report: <57A4E6DF.8070209@dunslane.net>
2016-08-05 16:21:38 -04:00
Robert Haas 81c766b3fd Change InitToastSnapshot to a macro.
tqual.h is included in some front-end compiles, and a static inline
breaks on buildfarm member castoroides.  Since the macro is never
referenced, it should dodge that problem, although this doesn't
seem like the cleanest way of hiding things from front-end compiles.

Report and review by Tom Lane; patch by me.
2016-08-05 11:58:03 -04:00
Robert Haas 3e2f3c2e42 Prevent "snapshot too old" from trying to return pruned TOAST tuples.
Previously, we tested for MVCC snapshots to see whether they were too
old, but not TOAST snapshots, which can lead to complaints about missing
TOAST chunks if those chunks are subject to early pruning.  Ideally,
the threshold lsn and timestamp for a TOAST snapshot would be that of
the corresponding MVCC snapshot, but since we have no way of deciding
which MVCC snapshot was used to fetch the TOAST pointer, use the oldest
active or registered snapshot instead.

Reported by Andres Freund, who also sketched out what the fix should
look like.  Patch by me, reviewed by Amit Kapila.
2016-08-03 16:50:01 -04:00
Tom Lane 45639a0525 Avoid invalidating all foreign-join cached plans when user mappings change.
We must not push down a foreign join when the foreign tables involved
should be accessed under different user mappings.  Previously we tried
to enforce that rule literally during planning, but that meant that the
resulting plans were dependent on the current contents of the
pg_user_mapping catalog, and we had to blow away all cached plans
containing any remote join when anything at all changed in pg_user_mapping.
This could have been improved somewhat, but the fact that a syscache inval
callback has very limited info about what changed made it hard to do better
within that design.  Instead, let's change the planner to not consider user
mappings per se, but to allow a foreign join if both RTEs have the same
checkAsUser value.  If they do, then they necessarily will use the same
user mapping at runtime, and we don't need to know specifically which one
that is.  Post-plan-time changes in pg_user_mapping no longer require any
plan invalidation.

This rule does give up some optimization ability, to wit where two foreign
table references come from views with different owners or one's from a view
and one's directly in the query, but nonetheless the same user mapping
would have applied.  We'll sacrifice the first case, but to not regress
more than we have to in the second case, allow a foreign join involving
both zero and nonzero checkAsUser values if the nonzero one is the same as
the prevailing effective userID.  In that case, mark the plan as only
runnable by that userID.

The plancache code already had a notion of plans being userID-specific,
in order to support RLS.  It was a little confused though, in particular
lacking clarity of thought as to whether it was the rewritten query or just
the finished plan that's dependent on the userID.  Rearrange that code so
that it's clearer what depends on which, and so that the same logic applies
to both RLS-injected role dependency and foreign-join-injected role
dependency.

Note that this patch doesn't remove the other issue mentioned in the
original complaint, which is that while we'll reliably stop using a foreign
join if it's disallowed in a new context, we might fail to start using a
foreign join if it's now allowed, but we previously created a generic
cached plan that didn't use one.  It was agreed that the chance of winning
that way was not high enough to justify the much larger number of plan
invalidations that would have to occur if we tried to cause it to happen.

In passing, clean up randomly-varying spelling of EXPLAIN commands in
postgres_fdw.sql, and fix a COSTS ON example that had been allowed to
leak into the committed tests.

This reverts most of commits fbe5a3fb7 and 5d4171d1c, which were the
previous attempt at ensuring we wouldn't push down foreign joins that
span permissions contexts.

Etsuro Fujita and Tom Lane

Discussion: <d49c1e5b-f059-20f4-c132-e9752ee0113e@lab.ntt.co.jp>
2016-07-15 17:23:02 -04:00
Tom Lane 1acf757255 Fix GiST index build for NaN values in geometric types.
GiST index build could go into an infinite loop when presented with boxes
(or points, circles or polygons) containing NaN component values.  This
happened essentially because the code assumed that x == x is true for any
"double" value x; but it's not true for NaNs.  The looping behavior was not
the only problem though: we also attempted to sort the items using simple
double comparisons.  Since NaNs violate the trichotomy law, qsort could
(in principle at least) get arbitrarily confused and mess up the sorting of
ordinary values as well as NaNs.  And we based splitting choices on box size
calculations that could produce NaNs, again resulting in undesirable
behavior.

To fix, replace all comparisons of doubles in this logic with
float8_cmp_internal, which is NaN-aware and is careful to sort NaNs
consistently, higher than any non-NaN.  Also rearrange the box size
calculation to not produce NaNs; instead it should produce an infinity
for a box with NaN on one side and not-NaN on the other.

I don't by any means claim that this solves all problems with NaNs in
geometric values, but it should at least make GiST index insertion work
reliably with such data.  It's likely that the index search side of things
still needs some work, and probably regular geometric operations too.
But with this patch we're laying down a convention for how such cases
ought to behave.

Per bug #14238 from Guang-Dih Lei.  Back-patch to 9.2; the code used before
commit 7f3bd86843 is quite different and doesn't lock up on my simple
test case, nor on the submitter's dataset.

Report: <20160708151747.1426.60150@wrigleys.postgresql.org>
Discussion: <28685.1468246504@sss.pgh.pa.us>
2016-07-14 18:45:59 -04:00
Tom Lane 100340e2dc Restore foreign-key-aware estimation of join relation sizes.
This patch provides a new implementation of the logic added by commit
137805f89 and later removed by 77ba61080.  It differs from the original
primarily in expending much less effort per joinrel in large queries,
which it accomplishes by doing most of the matching work once per query not
once per joinrel.  Hopefully, it's also less buggy and better commented.
The never-documented enable_fkey_estimates GUC remains gone.

There remains work to be done to make the selectivity estimates account
for nulls in FK referencing columns; but that was true of the original
patch as well.  We may be able to address this point later in beta.
In the meantime, any error should be in the direction of overestimating
rather than underestimating joinrel sizes, which seems like the direction
we want to err in.

Tomas Vondra and Tom Lane

Discussion: <31041.1465069446@sss.pgh.pa.us>
2016-06-18 15:22:34 -04:00
Kevin Grittner bf9a60ee33 Fix interaction between CREATE INDEX and "snapshot too old".
Since indexes are created without valid LSNs, an index created
while a snapshot older than old_snapshot_threshold existed could
cause queries to return incorrect results when those old snapshots
were used, if any relevant rows had been subject to early pruning
before the index was built.  Prevent usage of a newly created index
until all such snapshots are released, for relations where this can
happen.

Questions about the interaction of "snapshot too old" with index
creation were initially raised by Andres Freund.

Reviewed by Robert Haas.
2016-06-10 09:25:31 -05:00
Robert Haas 4bc424b968 pgindent run for 9.6 2016-06-09 18:02:36 -04:00
Tom Lane e4158319f3 Mop-up for parallel degree-ectomy.
Fix a couple of overlooked uses of "degree" terminology.  Make the parallel
worker count selection logic in create_plain_partial_paths more robust (in
particular, it failed with max_parallel_workers_per_gather set to zero).
2016-06-09 11:16:26 -04:00
Robert Haas c9ce4a1c61 Eliminate "parallel degree" terminology.
This terminology provoked widespread complaints.  So, instead, rename
the GUC max_parallel_degree to max_parallel_workers_per_gather
(leaving room for a possible future GUC max_parallel_workers that acts
as a system-wide limit), and rename the parallel_degree reloption to
parallel_workers.  Rename structure members to match.

These changes create a dump/restore hazard for users of PostgreSQL
9.6beta1 who have set the reloption (or applied the GUC using ALTER
USER or ALTER DATABASE).
2016-06-09 10:00:26 -04:00
Tom Lane 77ba610805 Revert "Use Foreign Key relationships to infer multi-column join selectivity".
This commit reverts 137805f89 as well as the associated commits 015e88942,
5306df283, and 68d704edb.  We found multiple bugs in this feature, and
there was concern about possible planner slowdown (though to be fair,
exhibiting a very large slowdown proved difficult).  The way forward
requires a considerable rewrite, which may or may not be possible to
accomplish in time for beta2.  In my judgment reviewing the rewrite will
be easier to accomplish starting from a clean slate, so let's temporarily
revert what's there now.  This also leaves us in a safe state if it turns
out to be necessary to postpone the rewrite to the next development cycle.

Discussion: <20160429102531.GA13701@huehner.biz>
2016-06-07 17:21:17 -04:00
Tom Lane d50183c578 Inline the easy cases in MakeExpandedObjectReadOnly().
This attempts to buy back some of whatever performance we lost from fixing
bug #14174 by inlining the initial checks in MakeExpandedObjectReadOnly()
into the callers.  We can do that in a macro without creating multiple-
evaluation hazards, so it's pretty much free notationally; and the amount
of code added to callers should be minimal as well.  (Testing a value can't
take many more instructions than passing it to a subroutine.)

Might as well inline DatumIsReadWriteExpandedObject() while we're at it.

This is an ABI break for callers, so it doesn't seem safe to put into 9.5,
but I see no reason not to do it in HEAD.
2016-06-03 18:34:05 -04:00
Tom Lane 9dd4178cec Be more predictable about reporting "lock timeout" vs "statement timeout".
If both timeout indicators are set when we arrive at ProcessInterrupts,
we've historically just reported "lock timeout".  However, some buildfarm
members have been observed to fail isolationtester's timeouts test by
reporting "lock timeout" when the statement timeout was expected to fire
first.  The cause seems to be that the process is allowed to sleep longer
than expected (probably due to heavy machine load) so that the lock
timeout happens before we reach the point of reporting the error, and
then this arbitrary tiebreak rule does the wrong thing.  We can improve
matters by comparing the scheduled timeout times to decide which error
to report.

I had originally proposed greatly reducing the 1-second window between
the two timeouts in the test cases.  On reflection that is a bad idea,
at least for the case where the lock timeout is expected to fire first,
because that would assume that it takes negligible time to get from
statement start to the beginning of the lock wait.  Thus, this patch
doesn't completely remove the risk of test failures on slow machines.
Empirically, however, the case this handles is the one we are seeing
in the buildfarm.  The explanation may be that the other case requires
the scheduler to take the CPU away from a busy process, whereas the
case fixed here only requires the scheduler to not give the CPU back
right away to a process that has been woken from a multi-second sleep
(and, perhaps, has been swapped out meanwhile).

Back-patch to 9.3 where the isolationtester timeouts test was added.

Discussion: <8693.1464314819@sss.pgh.pa.us>
2016-05-27 10:40:20 -04:00
Kevin Grittner 2cc41acd8f Fix hash index vs "snapshot too old" problemms
Hash indexes are not WAL-logged, and so do not maintain the LSN of
index pages.  Since the "snapshot too old" feature counts on
detecting error conditions using the LSN of a table and all indexes
on it, this makes it impossible to safely do early vacuuming on any
table with a hash index, so add this to the tests for whether the
xid used to vacuum a table can be adjusted based on
old_snapshot_threshold.

While at it, add a paragraph to the docs for old_snapshot_threshold
which specifically mentions this and other aspects of the feature
which may otherwise surprise users.

Problem reported and patch reviewed by Amit Kapila
2016-05-06 07:47:12 -05:00
Kevin Grittner 7c3e8039f4 Add a few entries to the tail of time mapping, to see old values.
Without a few entries beyond old_snapshot_threshold, the lookup
would often fail, resulting in the more aggressive pruning or
vacuum being skipped often enough to matter.  This was very clearly
shown by a python test script posted by Ants Aasma, and was likely
a factor in an earlier but somewhat less clear-cut test case posted
by Jeff Janes.

This patch makes no change to the logic, per se -- it just makes
the array of mapping entries big enough to make lookup misses based
on timing much less likely.  An occasional miss is still possible
if a thread stalls for more than 10 minutes, but that does not
create any problem with correctness of behavior.  Besides, if
things are so busy that a thread is stalling for more than 10
minutes, it is probably OK to skip the more aggressive cleanup at
that particular point in time.
2016-04-29 16:46:08 -05:00
Tom Lane 14216649f3 PGDLLIMPORT-ify old_snapshot_threshold.
Revert commit 7cb1db1d95, which represented
a misunderstanding of the problem (if snapmgr.h weren't already included
in bufmgr.h, things wouldn't compile anywhere).  Instead install what
I think is the real fix.
2016-04-21 14:33:34 -04:00
Andres Freund 80abbeba23 Make init_spin_delay() C89 compliant and change stuck spinlock reporting.
The current definition of init_spin_delay (introduced recently in
48354581a) wasn't C89 compliant. It's not legal to refer to refer to
non-constant expressions, and the ptr argument was one.  This, as
reported by Tom, lead to a failure on buildfarm animal pademelon.

The pointer, especially on system systems with ASLR, isn't super helpful
anyway, though. So instead of making init_spin_delay into an inline
function, make s_lock_stuck() report the function name in addition to
file:line and change init_spin_delay() accordingly. While not a direct
replacement, the function name is likely more useful anyway (line
numbers are often hard to interpret in third party reports).

This also fixes what file/line number is reported for waits via
s_lock().

As PG_FUNCNAME_MACRO is now used outside of elog.h, move it to c.h.

Reported-By: Tom Lane
Discussion: 4369.1460435533@sss.pgh.pa.us
2016-04-13 17:00:53 -07:00
Stephen Frost 293007898d Reserve the "pg_" namespace for roles
This will prevent users from creating roles which begin with "pg_" and
will check for those roles before allowing an upgrade using pg_upgrade.

This will allow for default roles to be provided at initdb time.

Reviews by José Luis Tallón and Robert Haas
2016-04-08 16:56:27 -04:00
Kevin Grittner 848ef42bb8 Add the "snapshot too old" feature
This feature is controlled by a new old_snapshot_threshold GUC.  A
value of -1 disables the feature, and that is the default.  The
value of 0 is just intended for testing.  Above that it is the
number of minutes a snapshot can reach before pruning and vacuum
are allowed to remove dead tuples which the snapshot would
otherwise protect.  The xmin associated with a transaction ID does
still protect dead tuples.  A connection which is using an "old"
snapshot does not get an error unless it accesses a page modified
recently enough that it might not be able to produce accurate
results.

This is similar to the Oracle feature, and we use the same SQLSTATE
and error message for compatibility.
2016-04-08 14:36:30 -05:00
Teodor Sigaev 8b99edefca Revert CREATE INDEX ... INCLUDING ...
It's not ready yet, revert two commits
690c543550 - unstable test output
386e3d7609 - patch itself
2016-04-08 21:52:13 +03:00
Robert Haas af025eed53 Add combine functions for various floating-point aggregates.
This allows parallel aggregation to use them.  It may seem surprising
that we use float8_combine for both float4_accum and float8_accum
transition functions, but that's because those functions differ only
in the type of the non-transition-state argument.

Haribabu Kommi, reviewed by David Rowley and Tomas Vondra
2016-04-08 13:47:06 -04:00
Teodor Sigaev 386e3d7609 CREATE INDEX ... INCLUDING (column[, ...])
Now indexes (but only B-tree for now) can contain "extra" column(s) which
doesn't participate in index structure, they are just stored in leaf
tuples. It allows to use index only scan by using single index instead
of two or more indexes.

Author: Anastasia Lubennikova with minor editorializing by me
Reviewers: David Rowley, Peter Geoghegan, Jeff Janes
2016-04-08 19:45:59 +03:00
Robert Haas 25fe8b5f1a Add a 'parallel_degree' reloption.
The code that estimates what parallel degree should be uesd for the
scan of a relation is currently rather stupid, so add a parallel_degree
reloption that can be used to override the planner's rather limited
judgement.

Julien Rouhaud, reviewed by David Rowley, James Sewell, Amit Kapila,
and me.  Some further hacking by me.
2016-04-08 11:14:56 -04:00
Simon Riggs 015e88942a Load FK defs into relcache for use by planner
Fastpath ignores this if no triggers defined.

Author: Tomas Vondra, with fastpath and comments added by me
Reviewers: David Rowley, Simon Riggs
2016-04-07 12:08:33 +01:00
Teodor Sigaev 0b62fd036e Add jsonb_insert
It inserts a new value into an jsonb array at arbitrary position or
a new key to jsonb object.

Author: Dmitry Dolgov
Reviewers: Petr Jelinek, Vitaly Burovoy, Andrew Dunstan
2016-04-06 19:25:00 +03:00
Robert Haas 11c8669c0c Add parallel query support functions for assorted aggregates.
This lets us use parallel aggregate for a variety of useful cases
that didn't work before, like sum(int8), sum(numeric), several
versions of avg(), and various other functions.

Add some regression tests, as well, testing the general sanity of
these and future catalog entries.

David Rowley, reviewed by Tomas Vondra, with a few further changes
by me.
2016-04-05 14:32:53 -04:00
Tom Lane 66229ac004 Introduce a LOG_SERVER_ONLY ereport level, which is never sent to client.
This elevel is useful for logging audit messages and similar information
that should not be passed to the client.  It's equivalent to LOG in terms
of decisions about logging priority in the postmaster log, but messages
with this elevel will never be sent to the client.

In the current implementation, it's just an alias for the longstanding
COMMERROR elevel (or more accurately, we've made COMMERROR an alias for
this).  At some point it might be interesting to allow a LOG_ONLY flag to
be attached to any elevel, but that would be considerably more complicated,
and it's not clear there's enough use-cases to justify the extra work.
For now, let's just take the easy 90% solution.

David Steele, reviewed by Fabien Coelho, Petr Jelínek, and myself
2016-04-04 12:32:42 -04:00
Tom Lane 50861cd683 Improve portability of I/O behavior for the geometric types.
Formerly, the geometric I/O routines such as box_in and point_out relied
directly on strtod() and sprintf() for conversion of the float8 component
values of their data types.  However, the behavior of those functions is
pretty platform-dependent, especially for edge-case values such as
infinities and NaNs.  This was exposed by commit acdf2a8b37, which
added test cases involving boxes with infinity endpoints, and immediately
failed on Windows and AIX buildfarm members.  We solved these problems
years ago in the main float8in and float8out functions, so let's fix it
by making the geometric types use that code instead of depending directly
on the platform-supplied functions.

To do this, refactor the float8in code so that it can be used to parse
just part of a string, and as a convenience make the guts of float8out
usable without going through DirectFunctionCall.

While at it, get rid of geo_ops.c's fairly shaky assumptions about the
maximum output string length for a double, by having it build results in
StringInfo buffers instead of fixed-length strings.

In passing, convert all the "invalid input syntax for type foo" messages
in this area of the code into "invalid input syntax for type %s" to reduce
the number of distinct translatable strings, per recent discussion.
We would have needed a fair number of the latter anyway for code-sharing
reasons, so we might as well just go whole hog.

Note: this patch is by no means intended to guarantee that the geometric
types uniformly behave sanely for infinity or NaN component values.
But any bugs we have in that line were there all along, they were just
harder to reach in a platform-independent way.
2016-03-30 17:25:03 -04:00
Teodor Sigaev acdf2a8b37 Introduce SP-GiST operator class over box.
Patch implements quad-tree over boxes, naive approach of 2D quad tree will not
work for any non-point objects because splitting space on node is not
efficient. The idea of pathc is treating 2D boxes as 4D points, so,
object will not overlap (in 4D space).

The performance tests reveal that this technique especially beneficial
with too much overlapping objects, so called "spaghetti data".

Author: Alexander Lebedev with editorization by Emre Hasegeli and me
2016-03-30 18:42:36 +03:00
Tom Lane e511d878f3 Allow to_timestamp(float8) to convert float infinity to timestamp infinity.
With the original SQL-function implementation, such cases failed because
we don't support infinite intervals.  Converting the function to C lets
us bypass the interval representation, which should be a bit faster as
well as more flexible.

Vitaly Burovoy, reviewed by Anastasia Lubennikova
2016-03-29 17:09:29 -04:00
Alvaro Herrera 473b932870 Support CREATE ACCESS METHOD
This enables external code to create access methods.  This is useful so
that extensions can add their own access methods which can be formally
tracked for dependencies, so that DROP operates correctly.  Also, having
explicit support makes pg_dump work correctly.

Currently only index AMs are supported, but we expect different types to
be added in the future.

Authors: Alexander Korotkov, Petr Jelínek
Reviewed-By: Teodor Sigaev, Petr Jelínek, Jim Nasby
Commitfest-URL: https://commitfest.postgresql.org/9/353/
Discussion: https://www.postgresql.org/message-id/CAPpHfdsXwZmojm6Dx+TJnpYk27kT4o7Ri6X_4OSWcByu1Rm+VA@mail.gmail.com
2016-03-23 23:01:35 -03:00
Teodor Sigaev 3187d6de0e Introduce parse_ident()
SQL-layer function to split qualified identifier into array parts.

Author: Pavel Stehule with minor editorization by me and Jim Nasby
2016-03-18 18:16:14 +03:00
Peter Eisentraut fc201dfd95 Add syslog_split_messages parameter
Reviewed-by: Andreas Karlsson <andreas@proxel.se>
2016-03-16 23:21:44 -04:00
Peter Eisentraut f4c454e9ba Add syslog_sequence_numbers parameter
Reviewed-by: Andreas Karlsson <andreas@proxel.se>
2016-03-16 23:21:44 -04:00
Robert Haas c6dda1f48e Add idle_in_transaction_session_timeout.
Vik Fearing, reviewed by Stéphane Schildknecht and me, and revised
slightly by me.
2016-03-16 11:30:45 -04:00
Tom Lane 0e9b89986b Cope if platform declares mbstowcs_l(), but not locale_t, in <xlocale.h>.
Previously, we included <xlocale.h> only if necessary to get the definition
of type locale_t.  According to notes in PGAC_TYPE_LOCALE_T, this is
important because on some versions of glibc that file supplies an
incompatible declaration of locale_t.  (This info may be obsolete, because
on my RHEL6 box that seems to be the *only* definition of locale_t; but
there may still be glibc's in the wild for which it's a live concern.)

It turns out though that on FreeBSD and maybe other BSDen, you can get
locale_t from stdlib.h or locale.h but mbstowcs_l() and friends only from
<xlocale.h>.  This was leaving us compiling calls to mbstowcs_l() and
friends with no visible prototype, which causes a warning and could
possibly cause actual trouble, since it's not declared to return int.

Hence, adjust the configure checks so that we'll include <xlocale.h>
either if it's necessary to get type locale_t or if it's necessary to
get a declaration of mbstowcs_l().

Report and patch by Aleksander Alekseev, somewhat whacked around by me.
Back-patch to all supported branches, since we have been using
mbstowcs_l() since 9.1.
2016-03-15 13:19:57 -04:00
Tom Lane 23a27b039d Widen query numbers-of-tuples-processed counters to uint64.
This patch widens SPI_processed, EState's es_processed field, PortalData's
portalPos field, FuncCallContext's call_cntr and max_calls fields,
ExecutorRun's count argument, PortalRunFetch's result, and the max number
of rows in a SPITupleTable to uint64, and deals with (I hope) all the
ensuing fallout.  Some of these values were declared uint32 before, and
others "long".

I also removed PortalData's posOverflow field, since that logic seems
pretty useless given that portalPos is now always 64 bits.

The user-visible results are that command tags for SELECT etc will
correctly report tuple counts larger than 4G, as will plpgsql's GET
GET DIAGNOSTICS ... ROW_COUNT command.  Queries processing more tuples
than that are still not exactly the norm, but they're becoming more
common.

Most values associated with FETCH/MOVE distances, such as PortalRun's count
argument and the count argument of most SPI functions that have one, remain
declared as "long".  It's not clear whether it would be worth promoting
those to int64; but it would definitely be a large dollop of additional
API churn on top of this, and it would only help 32-bit platforms which
seem relatively less likely to see any benefit.

Andreas Scherbaum, reviewed by Christian Ullrich, additional hacking by me
2016-03-12 16:05:29 -05:00