Commit Graph

895 Commits

Author SHA1 Message Date
Alvaro Herrera
8b08f7d482 Local partitioned indexes
When CREATE INDEX is run on a partitioned table, create catalog entries
for an index on the partitioned table (which is just a placeholder since
the table proper has no data of its own), and recurse to create actual
indexes on the existing partitions; create them in future partitions
also.

As a convenience gadget, if the new index definition matches some
existing index in partitions, these are picked up and used instead of
creating new ones.  Whichever way these indexes come about, they become
attached to the index on the parent table and are dropped alongside it,
and cannot be dropped on isolation unless they are detached first.

To support pg_dump'ing these indexes, add commands
    CREATE INDEX ON ONLY <table>
(which creates the index on the parent partitioned table, without
recursing) and
    ALTER INDEX ATTACH PARTITION
(which is used after the indexes have been created individually on each
partition, to attach them to the parent index).  These reconstruct prior
database state exactly.

Reviewed-by: (in alphabetical order) Peter Eisentraut, Robert Haas, Amit
	Langote, Jesper Pedersen, Simon Riggs, David Rowley
Discussion: https://postgr.es/m/20171113170646.gzweigyrgg6pwsg4@alvherre.pgsql
2018-01-19 11:49:22 -03:00
Peter Eisentraut
9e945f8626 Fix Latin spelling
"c.f." should be "cf.".
2018-01-11 08:32:01 -05:00
Alvaro Herrera
bab2969867 Fix typo
Author: Dagfinn Ilmari Mannsåker
Discussion: https://postgr.es/m/d8jefpk4jtd.fsf@dalvik.ping.uio.no
2018-01-03 19:12:06 -03:00
Bruce Momjian
9d4649ca49 Update copyright for 2018
Backpatch-through: certain files through 9.3
2018-01-02 23:30:12 -05:00
Alvaro Herrera
be2343221f Protect against hypothetical memory leaks in RelationGetPartitionKey
Also, fix a comment that commit 8a0596cb65 made obsolete.

Reported-by: Robert Haas
Discussion: http://postgr.es/m/CA+TgmoYbpuUUUp2GhYNwWm0qkah39spiU7uOiNXLz20ASfKYoA@mail.gmail.com
2017-12-27 18:06:14 -03:00
Alvaro Herrera
8a0596cb65 Get rid of copy_partition_key
That function currently exists to avoid leaking memory in
CacheMemoryContext in case of trouble while the partition key is being
built, but there's a better way: allocate everything in a memcxt that
goes away if the current (sub)transaction fails, and once the partition
key is built and no further errors can occur, make the memcxt permanent
by making it a child of CacheMemoryContext.

Reviewed-by: Tom Lane
Discussion: https://postgr.es/m/20171027172730.eh2domlkpn4ja62m@alvherre.pgsql
2017-12-21 14:21:39 -03:00
Tom Lane
9fa6f00b13 Rethink MemoryContext creation to improve performance.
This patch makes a number of interrelated changes to reduce the overhead
involved in creating/deleting memory contexts.  The key ideas are:

* Include the AllocSetContext header of an aset.c context in its first
malloc request, rather than allocating it separately in TopMemoryContext.
This means that we now always create an initial or "keeper" block in an
aset, even if it never receives any allocation requests.

* Create freelists in which we can save and recycle recently-destroyed
asets (this idea is due to Robert Haas).

* In the common case where the name of a context is a constant string,
just store a pointer to it in the context header, rather than copying
the string.

The first change eliminates a palloc/pfree cycle per context, and
also avoids bloat in TopMemoryContext, at the price that creating
a context now involves a malloc/free cycle even if the context never
receives any allocations.  That would be a loser for some common
usage patterns, but recycling short-lived contexts via the freelist
eliminates that pain.

Avoiding copying constant strings not only saves strlen() and strcpy()
overhead, but is an essential part of the freelist optimization because
it makes the context header size constant.  Currently we make no
attempt to use the freelist for contexts with non-constant names.
(Perhaps someday we'll need to think harder about that, but in current
usage, most contexts with custom names are long-lived anyway.)

The freelist management in this initial commit is pretty simplistic,
and we might want to refine it later --- but in common workloads that
will never matter because the freelists will never get full anyway.

To create a context with a non-constant name, one is now required to
call AllocSetContextCreateExtended and specify the MEMCONTEXT_COPY_NAME
option.  AllocSetContextCreate becomes a wrapper macro, and it includes
a test that will complain about non-string-literal context name
parameters on gcc and similar compilers.

An unfortunate side effect of making AllocSetContextCreate a macro is
that one is now *required* to use the size parameter abstraction macros
(ALLOCSET_DEFAULT_SIZES and friends) with it; the pre-9.6 habit of
writing out individual size parameters no longer works unless you
switch to AllocSetContextCreateExtended.

Internally to the memory-context-related modules, the context creation
APIs are simplified, removing the rather baroque original design whereby
a context-type module called mcxt.c which then called back into the
context-type module.  That saved a bit of code duplication, but not much,
and it prevented context-type modules from exercising control over the
allocation of context headers.

In passing, I converted the test-and-elog validation of aset size
parameters into Asserts to save a few more cycles.  The original thought
was that callers might compute size parameters on the fly, but in practice
nobody does that, so it's useless to expend cycles on checking those
numbers in production builds.

Also, mark the memory context method-pointer structs "const",
just for cleanliness.

Discussion: https://postgr.es/m/2264.1512870796@sss.pgh.pa.us
2017-12-13 13:55:16 -05:00
Tom Lane
2069e6faa0 Clean up assorted messiness around AllocateDir() usage.
This patch fixes a couple of low-probability bugs that could lead to
reporting an irrelevant errno value (and hence possibly a wrong SQLSTATE)
concerning directory-open or file-open failures.  It also fixes places
where we took shortcuts in reporting such errors, either by using elog
instead of ereport or by using ereport but forgetting to specify an
errcode.  And it eliminates a lot of just plain redundant error-handling
code.

In service of all this, export fd.c's formerly-static function
ReadDirExtended, so that external callers can make use of the coding
pattern

	dir = AllocateDir(path);
	while ((de = ReadDirExtended(dir, path, LOG)) != NULL)

if they'd like to treat directory-open failures as mere LOG conditions
rather than errors.  Also fix FreeDir to be a no-op if we reach it
with dir == NULL, as such a coding pattern would cause.

Then, remove code at many call sites that was throwing an error or log
message for AllocateDir failure, as ReadDir or ReadDirExtended can handle
that job just fine.  Aside from being a net code savings, this gets rid of
a lot of not-quite-up-to-snuff reports, as mentioned above.  (In some
places these changes result in replacing a custom error message such as
"could not open tablespace directory" with more generic wording "could not
open directory", but it was agreed that the custom wording buys little as
long as we report the directory name.)  In some other call sites where we
can't just remove code, change the error reports to be fully
project-style-compliant.

Also reorder code in restoreTwoPhaseData that was acquiring a lock
between AllocateDir and ReadDir; in the unlikely but surely not
impossible case that LWLockAcquire changes errno, AllocateDir failures
would be misreported.  There is no great value in opening the directory
before acquiring TwoPhaseStateLock, so just do it in the other order.

Also fix CheckXLogRemoved to guarantee that it preserves errno,
as quite a number of call sites are implicitly assuming.  (Again,
it's unlikely but I think not impossible that errno could change
during a SpinLockAcquire.  If so, this function was broken for its
own purposes as well as breaking callers.)

And change a few places that were using not-per-project-style messages,
such as "could not read directory" when "could not open directory" is
more correct.

Back-patch the exporting of ReadDirExtended, in case we have occasion
to back-patch some fix that makes use of it; it's not needed right now
but surely making it global is pretty harmless.  Also back-patch the
restoreTwoPhaseData and CheckXLogRemoved fixes.  The rest of this is
essentially cosmetic and need not get back-patched.

Michael Paquier, with a bit of additional work by me

Discussion: https://postgr.es/m/CAB7nPqRpOCxjiirHmebEFhXVTK7V5Jvw4bz82p7Oimtsm3TyZA@mail.gmail.com
2017-12-04 17:02:56 -05:00
Peter Eisentraut
e4128ee767 SQL procedures
This adds a new object type "procedure" that is similar to a function
but does not have a return type and is invoked by the new CALL statement
instead of SELECT or similar.  This implementation is aligned with the
SQL standard and compatible with or similar to other SQL implementations.

This commit adds new commands CALL, CREATE/ALTER/DROP PROCEDURE, as well
as ALTER/DROP ROUTINE that can refer to either a function or a
procedure (or an aggregate function, as an extension to SQL).  There is
also support for procedures in various utility commands such as COMMENT
and GRANT, as well as support in pg_dump and psql.  Support for defining
procedures is available in all the languages supplied by the core
distribution.

While this commit is mainly syntax sugar around existing functionality,
future features will rely on having procedures as a separate object
type.

Reviewed-by: Andrew Dunstan <andrew.dunstan@2ndquadrant.com>
2017-11-30 11:03:20 -05:00
Tom Lane
687f096ea9 Make PL/Python handle domain-type conversions correctly.
Fix PL/Python so that it can handle domains over composite, and so that
it enforces domain constraints correctly in other cases that were not
always done properly before.  Notably, it didn't do arrays of domains
right (oversight in commit c12d570fa), and it failed to enforce domain
constraints when returning a composite type containing a domain field,
and if a transform function is being used for a domain's base type then
it failed to enforce domain constraints on the result.  Also, in many
places it missed checking domain constraints on null values, because
the plpy_typeio code simply wasn't called for Py_None.

Rather than try to band-aid these problems, I made a significant
refactoring of the plpy_typeio logic.  The existing design of recursing
for array and composite members is extended to also treat domains as
containers requiring recursion, and the APIs for the module are cleaned
up and simplified.

The patch also modifies plpy_typeio to rely on the typcache more than
it did before (which was pretty much not at all).  This reduces the
need for repetitive lookups, and lets us get rid of an ad-hoc scheme
for detecting changes in composite types.  I added a couple of small
features to typcache to help with that.

Although some of this is fixing bugs that long predate v11, I don't
think we should risk a back-patch: it's a significant amount of code
churn, and there've been no complaints from the field about the bugs.

Tom Lane, reviewed by Anthony Bykov

Discussion: https://postgr.es/m/24449.1509393613@sss.pgh.pa.us
2017-11-16 16:23:04 -05:00
Robert Haas
1aba8e651a Add hash partitioning.
Hash partitioning is useful when you want to partition a growing data
set evenly.  This can be useful to keep table sizes reasonable, which
makes maintenance operations such as VACUUM faster, or to enable
partition-wise join.

At present, we still depend on constraint exclusion for partitioning
pruning, and the shape of the partition constraints for hash
partitioning is such that that doesn't work.  Work is underway to fix
that, which should both improve performance and make partitioning
pruning work with hash partitioning.

Amul Sul, reviewed and tested by Dilip Kumar, Ashutosh Bapat, Yugo
Nagata, Rajkumar Raghuwanshi, Jesper Pedersen, and by me.  A few
final tweaks also by me.

Discussion: http://postgr.es/m/CAAJ_b96fhpJAP=ALbETmeLk1Uni_GFZD938zgenhF49qgDTjaQ@mail.gmail.com
2017-11-09 18:07:44 -05:00
Peter Eisentraut
2eb4a831e5 Change TRUE/FALSE to true/false
The lower case spellings are C and C++ standard and are used in most
parts of the PostgreSQL sources.  The upper case spellings are only used
in some files/modules.  So standardize on the standard spellings.

The APIs for ICU, Perl, and Windows define their own TRUE and FALSE, so
those are left as is when using those APIs.

In code comments, we use the lower-case spelling for the C concepts and
keep the upper-case spelling for the SQL concepts.

Reviewed-by: Michael Paquier <michael.paquier@gmail.com>
2017-11-08 11:37:28 -05:00
Simon Riggs
98267ee83e Exclude pg_internal.init from BASE_BACKUP
Add docs to explain this for other backup mechanisms

Author: David Steele <david@pgmasters.net>
Reviewed-by: Petr Jelinek <petr.jelinek@2ndQuadrant.com> et al
2017-11-07 12:28:35 -05:00
Tom Lane
37a795a60b Support domains over composite types.
This is the last major omission in our domains feature: you can now
make a domain over anything that's not a pseudotype.

The major complication from an implementation standpoint is that places
that might be creating tuples of a domain type now need to be prepared
to apply domain_check().  It seems better that unprepared code fail
with an error like "<type> is not composite" than that it silently fail
to apply domain constraints.  Therefore, relevant infrastructure like
get_func_result_type() and lookup_rowtype_tupdesc() has been adjusted
to treat domain-over-composite as a distinct case that unprepared code
won't recognize, rather than just transparently treating it the same
as plain composite.  This isn't a 100% solution to the possibility of
overlooked domain checks, but it catches most places.

In passing, improve typcache.c's support for domains (it can now cache
the identity of a domain's base type), and rewrite the argument handling
logic in jsonfuncs.c's populate_record[set]_worker to reduce duplicative
per-call lookups.

I believe this is code-complete so far as the core and contrib code go.
The PLs need varying amounts of work, which will be tackled in followup
patches.

Discussion: https://postgr.es/m/4206.1499798337@sss.pgh.pa.us
2017-10-26 13:47:45 -04:00
Tom Lane
36ea99c84d Fix typcache's failure to treat ranges as container types.
Like the similar logic for arrays and records, it's necessary to examine
the range's subtype to decide whether the range type can support hashing.
We can omit checking the subtype for btree-defined operations, though,
since range subtypes are required to have those operations.  (Possibly
that simplification for btree cases led us to overlook that it does
not apply for hash cases.)

This is only an issue if the subtype lacks hash support, which is not
true of any built-in range type, but it's easy to demonstrate a problem
with a range type over, eg, money: you can get a "could not identify
a hash function" failure when the planner is misled into thinking that
hash join or aggregation would work.

This was born broken, so back-patch to all supported branches.
2017-10-20 17:12:27 -04:00
Tom Lane
a8f1efc8ac Fix misimplementation of typcache logic for extended hashing.
The previous coding would report that an array type supports extended
hashing if its element type supports regular hashing.  This bug is
only latent at the moment, since AFAICS there is not yet any code
that depends on checking presence of extended-hashing support to make
any decisions.  (And in any case it wouldn't matter unless the element
type has only regular hashing, which isn't true of any core data type.)
But that doesn't make it less broken.  Extend the
cache_array_element_properties infrastructure to check this properly.
2017-10-20 16:08:17 -04:00
Andres Freund
141fd1b66c Improve sys/catcache performance.
The following are the individual improvements:
1) Avoidance of FunctionCallInfo based function calls, replaced by
   more efficient functions with a native C argument interface.
2) Don't extract columns from a cache entry's tuple whenever matching
   entries - instead store them as a Datum array. This also allows to
   get rid of having to build dummy tuples for negative & list
   entries, and of a hack for dealing with cstring vs. text weirdness.
3) Reorder members of catcache.h struct, so imortant entries are more
   likely to be on one cacheline.
4) Allowing the compiler to specialize critical SearchCatCache for a
   specific number of attributes allows to unroll loops and avoid
   other nkeys dependant initialization.
5) Only initializing the ScanKey when necessary, i.e. catcache misses,
   greatly reduces cache unnecessary cpu cache misses.
6) Split of the cache-miss case from the hash lookup, reducing stack
   allocations etc in the common case.
7) CatCTup and their corresponding heaptuple are allocated in one
   piece.

This results in making cache lookups themselves roughly three times as
fast - full-system benchmarks obviously improve less than that.

I've also evaluated further techniques:
- replace open coded hash with simplehash - the list walk right now
  shows up in profiles. Unfortunately it's not easy to do so safely as
  an entry's memory location can change at various times, which
  doesn't work well with the refcounting and cache invalidation.
- Cacheline-aligning CatCTup entries - helps some with performance,
  but the win isn't big and the code for it is ugly, because the
  tuples have to be freed as well.
- add more proper functions, rather than macros for
  SearchSysCacheCopyN etc., but right now they don't show up in
  profiles.

The reason the macro wrapper for syscache.c/h have to be changed,
rather than just catcache, is that doing otherwise would require
exposing the SysCache array to the outside.  That might be a good idea
anyway, but it's for another day.

Author: Andres Freund
Reviewed-By: Robert Haas
Discussion: https://postgr.es/m/20170914061207.zxotvyopetm7lrrp@alap3.anarazel.de
2017-10-13 14:22:41 -07:00
Peter Eisentraut
0c5803b450 Refactor new file permission handling
The file handling functions from fd.c were called with a diverse mix of
notations for the file permissions when they were opening new files.
Almost all files created by the server should have the same permissions
set.  So change the API so that e.g. OpenTransientFile() automatically
uses the standard permissions set, and OpenTransientFilePerm() is a new
function that takes an explicit permissions set for the few cases where
it is needed.  This also saves an unnecessary argument for call sites
that are just opening an existing file.

While we're reviewing these APIs, get rid of the FileName typedef and
use the standard const char * for the file name and mode_t for the file
mode.  This makes these functions match other file handling functions
and removes an unnecessary layer of mysteriousness.  We can also get rid
of a few casts that way.

Author: David Steele <david@pgmasters.net>
2017-09-23 10:16:18 -04:00
Tom Lane
cad22075bc Fix bogus size calculation introduced by commit cc5f81366.
The elements of RecordCacheArray are TupleDesc, not TupleDesc *.
Those are actually the same size, so that this error is harmless,
but it's still wrong --- and it might bite us someday, if TupleDesc
ever became a struct, say.

Per Coverity.
2017-09-17 11:35:27 -04:00
Tom Lane
71aa4801a8 Get rid of shared_record_typmod_registry_worker_detach; it doesn't work.
This code is unsafe, as proven by buildfarm failures, because it tries
to access shared memory that might already be gone.  It's also unnecessary,
because we're about to exit the process anyway and so the record type cache
should never be accessed again.  The idea was to lay some foundations for
someday recycling workers --- which would require attaching to a different
shared tupdesc registry --- but that will require considerably more
thought.  In the meantime let's save some bytes by just removing the
nonfunctional code.

Problem identification, and proposal to fix by removing functionality
from the detach function, by Thomas Munro.  I went a bit further by
removing the function altogether.

Discussion: https://postgr.es/m/E1dsguX-00056N-9x@gemulon.postgresql.org
2017-09-15 10:52:30 -04:00
Tom Lane
eaa4070543 Don't use anonymous unions.
Commit cc5f81366c introduced a language
feature that is not acceptable to strict C89 compilers.

Thomas Munro

Per buildfarm.
2017-09-15 00:57:38 -04:00
Andres Freund
cc5f81366c Add support for coordinating record typmods among parallel workers.
Tuples can have type RECORDOID and a typmod number that identifies a blessed
TupleDesc in a backend-private cache.  To support the sharing of such tuples
through shared memory and temporary files, provide a typmod registry in
shared memory.

To achieve that, introduce per-session DSM segments, created on demand when a
backend first runs a parallel query.  The per-session DSM segment has a
table-of-contents just like the per-query DSM segment, and initially the
contents are a shared record typmod registry and a DSA area to provide the
space it needs to grow.

State relating to the current session is accessed via a Session object
reached through global variable CurrentSession that may require significant
redesign further down the road as we figure out what else needs to be shared
or remodelled.

Author: Thomas Munro
Reviewed-By: Andres Freund
Discussion: https://postgr.es/m/CAEepm=0ZtQ-SpsgCyzzYpsXS6e=kZWqk3g5Ygn3MDV7A8dabUA@mail.gmail.com
2017-09-14 19:59:21 -07:00
Peter Eisentraut
1356f78ea9 Reduce excessive dereferencing of function pointers
It is equivalent in ANSI C to write (*funcptr) () and funcptr().  These
two styles have been applied inconsistently.  After discussion, we'll
use the more verbose style for plain function pointer variables, to make
it clear that it's a variable, and the shorter style when the function
pointer is in a struct (s.func() or s->func()), because then it's clear
that it's not a plain function name, and otherwise the excessive
punctuation makes some of those invocations hard to read.

Discussion: https://www.postgresql.org/message-id/f52c16db-14ed-757d-4b48-7ef360b1631d@2ndquadrant.com
2017-09-07 13:56:09 -04:00
Simon Riggs
5b6d13eec7 Allow SET STATISTICS on expression indexes
Index columns are referenced by ordinal number rather than name, e.g.
CREATE INDEX coord_idx ON measured (x, y, (z + t));
ALTER INDEX coord_idx ALTER COLUMN 3 SET STATISTICS 1000;

Incompatibility note for release notes:
\d+ for indexes now also displays Stats Target

Authors: Alexander Korotkov, with contribution by Adrien NAYRAT
Review: Adrien NAYRAT, Simon Riggs
Wordsmith: Simon Riggs
2017-09-06 13:46:01 -07:00
Robert Haas
81c5e46c49 Introduce 64-bit hash functions with a 64-bit seed.
This will be useful for hash partitioning, which needs a way to seed
the hash functions to avoid problems such as a hash index on a hash
partitioned table clumping all values into a small portion of the
bucket space; it's also useful for anything that wants a 64-bit hash
value rather than a 32-bit hash value.

Just in case somebody wants a 64-bit hash value that is compatible
with the existing 32-bit hash values, make the low 32-bits of the
64-bit hash value match the 32-bit hash value when the seed is 0.

Robert Haas and Amul Sul

Discussion: http://postgr.es/m/CA+Tgmoafx2yoJuhCQQOL5CocEi-w_uG4S2xT0EtgiJnPGcHW3g@mail.gmail.com
2017-08-31 22:21:21 -04:00
Andres Freund
35ea75632a Refactor typcache.c's record typmod hash table.
Previously, tuple descriptors were stored in chains keyed by a fixed size
array of OIDs.  That meant there were effectively two levels of collision
chain -- one inside and one outside the hash table.  Instead, let dynahash.c
look after conflicts for us by supplying a proper hash and equal function
pair.

This is a nice cleanup on its own, but also simplifies followup
changes allowing blessed TupleDescs to be shared between backends
participating in parallel query.

Author: Thomas Munro
Reviewed-By: Andres Freund
Discussion: https://postgr.es/m/CAEepm%3D34GVhOL%2BarUx56yx7OPk7%3DqpGsv3CpO54feqjAwQKm5g%40mail.gmail.com
2017-08-22 16:11:54 -07:00
Andres Freund
2cd7084524 Change tupledesc->attrs[n] to TupleDescAttr(tupledesc, n).
This is a mechanical change in preparation for a later commit that
will change the layout of TupleDesc.  Introducing a macro to abstract
the details of where attributes are stored will allow us to change
that in separate step and revise it in future.

Author: Thomas Munro, editorialized by Andres Freund
Reviewed-By: Andres Freund
Discussion: https://postgr.es/m/CAEepm=0ZtQ-SpsgCyzzYpsXS6e=kZWqk3g5Ygn3MDV7A8dabUA@mail.gmail.com
2017-08-20 11:19:07 -07:00
Robert Haas
e139f1953f Assorted preparatory refactoring for partition-wise join.
Instead of duplicating the logic to search for a matching
ParamPathInfo in multiple places, factor it out into a separate
function.

Pass only the relevant bits of the PartitionKey to
partition_bounds_equal instead of the whole thing, because
partition-wise join will want to call this without having a
PartitionKey available.

Adjust allow_star_schema_join and calc_nestloop_required_outer
to take relevant Relids rather than the entire Path, because
partition-wise join will want to call it with the top-parent
relids to determine whether a child join is allowable.

Ashutosh Bapat.  Review and testing of the larger patch set of which
this is a part by Amit Langote, Rajkumar Raghuwanshi, Rafia Sabih,
Thomas Munro, Dilip Kumar, and me.

Discussion: http://postgr.es/m/CA+TgmobQK80vtXjAsPZWWXd7c8u13G86gmuLupN+uUJjA+i4nA@mail.gmail.com
2017-08-15 12:30:38 -04:00
Tom Lane
004a9702e0 Remove AtEOXact_CatCache().
The sole useful effect of this function, to check that no catcache
entries have positive refcounts at transaction end, has really been
obsolete since we introduced ResourceOwners in PG 8.1.  We reduced the
checks to assertions years ago, so that the function was a complete
no-op in production builds.  There have been previous discussions about
removing it entirely, but consensus up to now was that it had some small
value as a cross-check for bugs in the ResourceOwner logic.

However, it now emerges that it's possible to trigger these assertions
if you hit an assert-enabled backend with SIGTERM during a call to
SearchCatCacheList, because that function temporarily increases the
refcounts of entries it's intending to add to a catcache list construct.
In a normal ERROR scenario, the extra refcounts are cleaned up by
SearchCatCacheList's PG_CATCH block; but in a FATAL exit we do a
transaction abort and exit without ever executing PG_CATCH handlers.

There's a case to be made that this is a generic hazard and we should
consider restructuring elog(FATAL) handling so that pending PG_CATCH
handlers do get run.  That's pretty scary though: it could easily create
more problems than it solves.  Preliminary stress testing by Andreas
Seltenreich suggests that there are not many live problems of this ilk,
so we rejected that idea.

There are more-localized ways to fix the problem; the most principled
one would be to use PG_ENSURE_ERROR_CLEANUP instead of plain PG_TRY.
But adding cycles to SearchCatCacheList isn't very appealing.  We could
also weaken the assertions in AtEOXact_CatCache in some more or less
ad-hoc way, but that just makes its raison d'etre even less compelling.
In the end, the most reasonable solution seems to be to just remove
AtEOXact_CatCache altogether, on the grounds that it's not worth trying
to fix it.  It hasn't found any bugs for us in many years.

Per report from Jeevan Chalke.  Back-patch to all supported branches.

Discussion: https://postgr.es/m/CAM2+6=VEE30YtRQCZX7_sCFsEpoUkFBV1gZazL70fqLn8rcvBA@mail.gmail.com
2017-08-13 16:15:14 -04:00
Tom Lane
278cb43411 Be more consistent about errors for opfamily member lookup failures.
Add error checks in some places that were calling get_opfamily_member
or get_opfamily_proc and just assuming that the call could never fail.
Also, standardize the wording for such errors in some other places.

None of these errors are expected in normal use, hence they're just
elog not ereport.  But they may be handy for diagnosing omissions in
custom opclasses.

Rushabh Lathia found the oversight in RelationBuildPartitionKey();
I found the others by grepping for all callers of these functions.

Discussion: https://postgr.es/m/CAGPqQf2R9Nk8htpv0FFi+FP776EwMyGuORpc9zYkZKC8sFQE3g@mail.gmail.com
2017-07-24 11:23:27 -04:00
Tom Lane
382ceffdf7 Phase 3 of pgindent updates.
Don't move parenthesized lines to the left, even if that means they
flow past the right margin.

By default, BSD indent lines up statement continuation lines that are
within parentheses so that they start just to the right of the preceding
left parenthesis.  However, traditionally, if that resulted in the
continuation line extending to the right of the desired right margin,
then indent would push it left just far enough to not overrun the margin,
if it could do so without making the continuation line start to the left of
the current statement indent.  That makes for a weird mix of indentations
unless one has been completely rigid about never violating the 80-column
limit.

This behavior has been pretty universally panned by Postgres developers.
Hence, disable it with indent's new -lpl switch, so that parenthesized
lines are always lined up with the preceding left paren.

This patch is much less interesting than the first round of indent
changes, but also bulkier, so I thought it best to separate the effects.

Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org
Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
2017-06-21 15:35:54 -04:00
Tom Lane
c7b8998ebb Phase 2 of pgindent updates.
Change pg_bsd_indent to follow upstream rules for placement of comments
to the right of code, and remove pgindent hack that caused comments
following #endif to not obey the general rule.

Commit e3860ffa4d wasn't actually using
the published version of pg_bsd_indent, but a hacked-up version that
tried to minimize the amount of movement of comments to the right of
code.  The situation of interest is where such a comment has to be
moved to the right of its default placement at column 33 because there's
code there.  BSD indent has always moved right in units of tab stops
in such cases --- but in the previous incarnation, indent was working
in 8-space tab stops, while now it knows we use 4-space tabs.  So the
net result is that in about half the cases, such comments are placed
one tab stop left of before.  This is better all around: it leaves
more room on the line for comment text, and it means that in such
cases the comment uniformly starts at the next 4-space tab stop after
the code, rather than sometimes one and sometimes two tabs after.

Also, ensure that comments following #endif are indented the same
as comments following other preprocessor commands such as #else.
That inconsistency turns out to have been self-inflicted damage
from a poorly-thought-through post-indent "fixup" in pgindent.

This patch is much less interesting than the first round of indent
changes, but also bulkier, so I thought it best to separate the effects.

Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org
Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
2017-06-21 15:19:25 -04:00
Tom Lane
e3860ffa4d Initial pgindent run with pg_bsd_indent version 2.0.
The new indent version includes numerous fixes thanks to Piotr Stefaniak.
The main changes visible in this commit are:

* Nicer formatting of function-pointer declarations.
* No longer unexpectedly removes spaces in expressions using casts,
  sizeof, or offsetof.
* No longer wants to add a space in "struct structname *varname", as
  well as some similar cases for const- or volatile-qualified pointers.
* Declarations using PG_USED_FOR_ASSERTS_ONLY are formatted more nicely.
* Fixes bug where comments following declarations were sometimes placed
  with no space separating them from the code.
* Fixes some odd decisions for comments following case labels.
* Fixes some cases where comments following code were indented to less
  than the expected column 33.

On the less good side, it now tends to put more whitespace around typedef
names that are not listed in typedefs.list.  This might encourage us to
put more effort into typedef name collection; it's not really a bug in
indent itself.

There are more changes coming after this round, having to do with comment
indentation and alignment of lines appearing within parentheses.  I wanted
to limit the size of the diffs to something that could be reviewed without
one's eyes completely glazing over, so it seemed better to split up the
changes as much as practical.

Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org
Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
2017-06-21 14:39:04 -04:00
Robert Haas
3106829513 Use NIL rather than NULL to represent an empty list.
Just to be tidy.

Amit Langote

Discussion: http://postgr.es/m/9297f80f-e4ab-7dda-33d4-8580bab6d634@lab.ntt.co.jp
2017-06-06 11:21:22 -04:00
Magnus Hagander
acbd8375e9 Fix copy/paste mistake in comment
Amit Langote
2017-06-02 11:18:24 +02:00
Tom Lane
54e839fe29 Sort syscache identifiers into alphabetical order.
Not much point in having a convention about this if we don't enforce it.

Mark Dilger

Discussion: https://postgr.es/m/7F67FBEF-C3B3-404E-8EC6-E02ACB15D894@gmail.com
2017-05-30 18:47:13 -04:00
Bruce Momjian
a6fd7b7a5f Post-PG 10 beta1 pgindent run
perltidy run not included.
2017-05-17 16:31:56 -04:00
Tom Lane
f04c9a6146 Standardize terminology for pg_statistic_ext entries.
Consistently refer to such an entry as a "statistics object", not just
"statistics" or "extended statistics".  Previously we had a mismash of
terms, accompanied by utter confusion as to whether the term was
singular or plural.  That's not only grating (at least to the ear of
a native English speaker) but could be outright misleading, eg in error
messages that seemed to be referring to multiple objects where only one
could be meant.

This commit fixes the code and a lot of comments (though I may have
missed a few).  I also renamed two new SQL functions,
pg_get_statisticsextdef -> pg_get_statisticsobjdef
pg_statistic_ext_is_visible -> pg_statistics_obj_is_visible
to conform better with this terminology.

I have not touched the SGML docs other than fixing those function
names; the docs certainly need work but it seems like a separable task.

Discussion: https://postgr.es/m/22676.1494557205@sss.pgh.pa.us
2017-05-14 10:55:01 -04:00
Tom Lane
9aab83fc50 Redesign get_attstatsslot()/free_attstatsslot() for more safety and speed.
The mess cleaned up in commit da0759600 is clear evidence that it's a
bug hazard to expect the caller of get_attstatsslot()/free_attstatsslot()
to provide the correct type OID for the array elements in the slot.
Moreover, we weren't even getting any performance benefit from that,
since get_attstatsslot() was extracting the real type OID from the array
anyway.  So we ought to get rid of that requirement; indeed, it would
make more sense for get_attstatsslot() to pass back the type OID it found,
in case the caller isn't sure what to expect, which is likely in binary-
compatible-operator cases.

Another problem with the current implementation is that if the stats array
element type is pass-by-reference, we incur a palloc/memcpy/pfree cycle
for each element.  That seemed acceptable when the code was written because
we were targeting O(10) array sizes --- but these days, stats arrays are
almost always bigger than that, sometimes much bigger.  We can save a
significant number of cycles by doing one palloc/memcpy/pfree of the whole
array.  Indeed, in the now-probably-common case where the array is toasted,
that happens anyway so this method is basically free.  (Note: although the
catcache code will inline any out-of-line toasted values, it doesn't
decompress them.  At the other end of the size range, it doesn't expand
short-header datums either.  In either case, DatumGetArrayTypeP would have
to make a copy.  We do end up using an extra array copy step if the element
type is pass-by-value and the array length is neither small enough for a
short header nor large enough to have suffered compression.  But that
seems like a very acceptable price for winning in pass-by-ref cases.)

Hence, redesign to take these insights into account.  While at it,
convert to an API in which we fill a struct rather than passing a bunch
of pointers to individual output arguments.  That will make it less
painful if we ever want further expansion of what get_attstatsslot can
pass back.

It's certainly arguable that this is new development and not something to
push post-feature-freeze.  However, I view it as primarily bug-proofing
and therefore something that's better to have sooner not later.  Since
we aren't quite at beta phase yet, let's put it in.

Discussion: https://postgr.es/m/16364.1494520862@sss.pgh.pa.us
2017-05-13 15:14:39 -04:00
Alvaro Herrera
d99d58cdc8 Complete tab completion for DROP STATISTICS
Tab-completing DROP STATISTICS would only work if you started writing
the schema name containing the statistics object, because the visibility
clause was missing.  To add it, we need to add SQL-callable support for
testing visibility of a statistics object, like all other object types
already have.

Discussion: https://postgr.es/m/22676.1494557205@sss.pgh.pa.us
2017-05-13 01:05:48 -03:00
Tom Lane
2df5d46555 Avoid searching for callback functions in CallSyscacheCallbacks().
We have now grown enough registerable syscache-invalidation callback
functions that the original assumption that there would be few of them
is causing performance problems.  In particular, let's fix things so that
CallSyscacheCallbacks doesn't have to search the whole array to find
which callback(s) to invoke for a given cache ID.  Preserve the original
behavior that callbacks are called in order of registration, just in
case there's someplace that depends on that (which I doubt).

In support of this, export the number of syscaches from syscache.h.
People could have found that out anyway from the enum, but adding a
#define makes that much safer.

This provides a useful additional speedup in Mathieu Fenniak's
logical-decoding test case, although we're reaching the point of
diminishing returns there.  I think any further improvement will have
to come from reducing the number of cache invalidations that are
triggered in the first place.  Still, we can hope that this change
gives some incremental benefit for all invalidation scenarios.

Back-patch to 9.4 where logical decoding was introduced.

Discussion: https://postgr.es/m/CAHoiPjzea6N0zuCi=+f9v_j94nfsy6y8SU7-=bp4=7qw6_i=Rg@mail.gmail.com
2017-05-12 19:05:27 -04:00
Tom Lane
8085a4f751 Reduce initial size of RelfilenodeMapHash.
A test case provided by Mathieu Fenniak shows that hash_seq_search'ing
this hashtable can consume a very significant amount of overhead during
logical decoding, which triggers frequent cache invalidation.  Testing
suggests that the actual population of the hashtable is often no more
than a few dozen entries, so we can cut the overhead just by dropping
the initial number of buckets down from 1024 --- I chose to cut it to 64.
(In situations where we do have a significant number of entries, we
shouldn't get any real penalty from doing this, as the dynahash.c code
will resize the hashtable automatically.)

This gives a further factor-of-two savings in Mathieu's test case.
That may be overly optimistic for real-world benefit, as real cases
may have larger average table populations, but it's hard to see it
turning into a net negative for any workload.

Back-patch to 9.4 where relfilenodemap.c was introduced.

Discussion: https://postgr.es/m/CAHoiPjzea6N0zuCi=+f9v_j94nfsy6y8SU7-=bp4=7qw6_i=Rg@mail.gmail.com
2017-05-12 18:30:17 -04:00
Tom Lane
50ee1c7462 Avoid searching for the target catcache in CatalogCacheIdInvalidate.
A test case provided by Mathieu Fenniak shows that the initial search for
the target catcache in CatalogCacheIdInvalidate consumes a very significant
amount of overhead in cases where cache invalidation is triggered but has
little useful work to do.  There is no good reason for that search to exist
at all, as the index array maintained by syscache.c allows direct lookup of
the catcache from its ID.  We just need a frontend function in syscache.c,
matching the division of labor for most other cache-accessing operations.

While there's more that can be done in this area, this patch alone reduces
the runtime of Mathieu's example by 2X.  We can hope that it offers some
useful benefit in other cases too, although usually cache invalidation
overhead is not such a striking fraction of the total runtime.

Back-patch to 9.4 where logical decoding was introduced.  It might be
worth going further back, but presently the only case we know of where
cache invalidation is really a significant burden is in logical decoding.
Also, older branches have fewer catcaches, reducing the possible benefit.

(Note: although this nominally changes catcache's API, we have always
documented CatalogCacheIdInvalidate as a private function, so I would
have little sympathy for an external module calling it directly.  So
backpatching should be fine.)

Discussion: https://postgr.es/m/CAHoiPjzea6N0zuCi=+f9v_j94nfsy6y8SU7-=bp4=7qw6_i=Rg@mail.gmail.com
2017-05-12 18:17:29 -04:00
Tom Lane
596a7c8df7 Increase MAX_SYSCACHE_CALLBACKS to provide more room for extensions.
Increase from the historical value of 32 to 64.  We are up to 31 callers
of CacheRegisterSyscacheCallback() in HEAD, so if they were all to be
exercised in one process that would leave only one slot for add-on modules.
It's probably not possible for that to happen, but still we clearly need
more daylight here.  (At some point it might be worth making the array
dynamically resizable; but since we've never heard a complaint of "out of
syscache_callback_list slots" happening in the field, I doubt it's worth
it yet.)

Back-patch as far as 9.4, which is where we increased the companion limit
MAX_RELCACHE_CALLBACKS (cf commit f01d1ae3a).  It's not as urgent in
released branches, which have only a couple dozen call sites in core, but
it still seems that somebody might hit the limit before these branches die.

Discussion: https://postgr.es/m/12184.1494450131@sss.pgh.pa.us
2017-05-11 14:51:21 -04:00
Robert Haas
622c82279d Avoid theoretical infinite loop loading relcache partition key.
Amit Langote, per report from 甄明洋

Discussion: http://postgr.es/m/57bd1e1.1886.15bd7b79cee.Coremail.18612389267@yeah.net
2017-05-09 23:53:35 -04:00
Tom Lane
da07596006 Further patch rangetypes_selfuncs.c's statistics slot management.
Values in a STATISTIC_KIND_RANGE_LENGTH_HISTOGRAM slot are float8,
not of the type of the column the statistics are for.

This bug is at least partly the fault of sloppy specification comments
for get_attstatsslot()/free_attstatsslot(): the type OID they want is that
of the stavalues entries, not of the underlying column.  (I double-checked
other callers and they seem to get this right.)  Adjust the comments to be
more correct.

Per buildfarm.

Security: CVE-2017-7484
2017-05-08 15:03:14 -04:00
Robert Haas
5e1ccd4844 In load_relcache_init_file, initialize rd_pdcxt.
Oversight noted by Gao Zeng Qi.

Discussion: http://postgr.es/m/CAFmBtr1N3-SbepJbnGpaYp=jw-FvWMnYY7-bTtRgvjvbyB8YJA@mail.gmail.com
2017-04-28 14:05:13 -04:00
Alvaro Herrera
ee6922112e Rename columns in new pg_statistic_ext catalog
The new catalog reused a column prefix "sta" from pg_statistic, but this
is undesirable, so change the catalog to use prefix "stx" instead.
Also, rename the column that lists enabled statistic kinds as "stxkind"
rather than "enabled".

Discussion: https://postgr.es/m/CAKJS1f_2t5jhSN7huYRFH3w3rrHfG2QU7hiUHsu-Vdjd1rYT3w@mail.gmail.com
2017-04-17 18:34:29 -03:00
Peter Eisentraut
6275f5d28a Fix new warnings from GCC 7
This addresses the new warning types -Wformat-truncation
-Wformat-overflow that are part of -Wall, via -Wformat, in GCC 7.
2017-04-17 13:59:46 -04:00
Simon Riggs
2c2ecddcff Mention pg_index changes also cause relcache invalidation
Amit Langote, additional line by me
2017-04-13 10:07:21 +01:00