Commit Graph

440 Commits

Author SHA1 Message Date
Bruce Momjian 8a94332478 Update typedefs list in prep. for post-PG10 beta1 pgindent run 2017-05-17 15:52:16 -04:00
Bruce Momjian df238b43d7 Add download URL for perltidy version v20090616 2017-05-17 15:29:37 -04:00
Bruce Momjian c4c493fd35 pgindent: use HTTP instead of FTP to retrieve pg_bsd_indent src
FTP support will be removed from ftp.postgresql.org in the coming months, but http
still works.  Typedefs were already fetched via http.
2017-05-09 09:28:44 -04:00
Peter Eisentraut facde2a98f Clean up Perl code according to perlcritic
Fix all perlcritic warnings of severity level 5, except in
src/backend/utils/Gen_dummy_probes.pl, which is automatically generated.

Reviewed-by: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
2017-03-27 08:18:22 -04:00
Andres Freund b8d7f053c5 Faster expression evaluation and targetlist projection.
This replaces the old, recursive tree-walk based evaluation with
non-recursive, opcode-dispatch based expression evaluation.
Projection is now implemented as part of expression evaluation.

This both leads to significant performance improvements, and makes
future just-in-time compilation of expressions easier.

The speed gains primarily come from:
- non-recursive implementation reduces stack usage / overhead
- simple sub-expressions are implemented with a single jump, without
  function calls
- sharing some state between different sub-expressions
- reduced amount of indirect/hard to predict memory accesses by laying
  out operation metadata sequentially; including the avoidance of
  nearly all of the previously used linked lists
- more code has been moved to expression initialization, avoiding
  constant re-checks at evaluation time

Future just-in-time compilation (JIT) has become easier, as
demonstrated by released patches intended to be merged in a later
release, for primarily two reasons: Firstly, due to a stricter split
between expression initialization and evaluation, less code has to be
handled by the JIT. Secondly, due to the non-recursive nature of the
generated "instructions", less performance-critical code-paths can
easily be shared between interpreted and compiled evaluation.
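
To picture what "opcode dispatch" means here, a minimal, self-contained C
sketch of the general technique (the opcode names and step layout below are
invented for illustration, not the ones the patch actually uses):

    typedef enum { OP_CONST, OP_VAR, OP_ADD, OP_DONE } OpCode;

    typedef struct Step
    {
        OpCode  op;         /* what this step does */
        int     value;      /* constant value, or variable index */
    } Step;

    /* Steps live in a flat array and are executed by a loop with a switch,
     * rather than by recursing over an expression tree. */
    static int
    eval_sketch(const Step *steps, const int *vars)
    {
        int         result = 0;
        const Step *s;

        for (s = steps;; s++)
        {
            switch (s->op)
            {
                case OP_CONST:  result = s->value; break;
                case OP_VAR:    result = vars[s->value]; break;
                case OP_ADD:    result += vars[s->value]; break;
                case OP_DONE:   return result;
            }
        }
    }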

The new framework allows for significant future optimizations. E.g.:
- basic infrastructure to later reduce the per-executor-startup
  overhead of expression evaluation, by caching state in prepared
  statements.  That'd be helpful in OLTP-ish scenarios where
  initialization overhead is measurable.
- optimizing the generated "code". A number of proposals for potential
  work have already been made.
- optimizing the interpreter. Similarly, a number of proposals have
  been made here too.

The move of logic into the expression initialization step leads to some
backward-incompatible changes:
- Function permission checks are now done during expression
  initialization, whereas previously they were done during
  execution. In edge cases this can lead to errors being raised that
  previously wouldn't have been, e.g. a NULL array being coerced to a
  different array type previously didn't perform checks.
- The set of domain constraints to be checked is now evaluated once
  during expression initialization; previously it was re-built
  every time a domain check was evaluated. For normal queries this
  doesn't change much, but e.g. for plpgsql functions, which cache
  ExprStates, the old set could stick around longer.  The behavior
  in this area might still change.

Author: Andres Freund, with significant changes by Tom Lane,
	changes by Heikki Linnakangas
Reviewed-By: Tom Lane, Heikki Linnakangas
Discussion: https://postgr.es/m/20161206034955.bh33paeralxbtluv@alap3.anarazel.de
2017-03-25 14:52:06 -07:00
Andres Freund 3717dc149e Add amcheck extension to contrib.
This is the beginning of a collection of SQL-callable functions to
verify the integrity of data files.  For now it only contains code to
verify B-Tree indexes.

This adds two SQL-callable functions that validate B-Tree consistency to
varying degrees.  Check the extensive docs for details.

The goal is to later extend the coverage of the module to further
access methods, possibly including the heap.  Once checks for
additional access methods exist, we'll likely add some "dispatch"
functions that cover multiple access methods.

Author: Peter Geoghegan, editorialized by Andres Freund
Reviewed-By: Andres Freund, Tomas Vondra, Thomas Munro,
   Anastasia Lubennikova, Robert Haas, Amit Langote
Discussion: CAM3SWZQzLMhMwmBqjzK+pRKXrNUZ4w90wYMUWfkeV8mZ3Debvw@mail.gmail.com
2017-03-09 16:33:02 -08:00
Robert Haas 355d3993c5 Add a Gather Merge executor node.
Like Gather, we spawn multiple workers and run the same plan in each
one; however, Gather Merge is used when each worker produces the same
output ordering and we want to preserve that output ordering while
merging together the streams of tuples from various workers.  (In a
way, Gather Merge is like a hybrid of Gather and MergeAppend.)

This works out to a win if it saves us from having to perform an
expensive Sort.  In cases where only a small amount of data would need
to be sorted, it may actually be faster to use a regular Gather node
and then sort the results afterward, because Gather Merge sometimes
needs to wait synchronously for tuples whereas a pure Gather generally
doesn't.  But if this avoids an expensive sort then it's a win.

Rushabh Lathia, reviewed and tested by Amit Kapila, Thomas Munro,
and Neha Sharma, and reviewed and revised by me.

Discussion: http://postgr.es/m/CAGPqQf09oPX-cQRpBKS0Gq49Z+m6KBxgxd_p9gX8CKk_d75HoQ@mail.gmail.com
2017-03-09 07:49:29 -05:00
Peter Eisentraut 550214a4ef Add operator_with_argtypes grammar rule
This makes the handling of operators similar to that of functions and
aggregates.

Rename node FuncWithArgs to ObjectWithArgs, to reflect the expanded use.

Reviewed-by: Jim Nasby <Jim.Nasby@BlueTreble.com>
Reviewed-by: Michael Paquier <michael.paquier@gmail.com>
2017-03-06 13:31:47 -05:00
Andres Freund 7e3aa03b41 Reduce size of common allocation header.
The new slab allocator needs different per-allocation information than
the classical aset.c.  The definition in 58b25e981 wasn't sufficiently
careful on 32 bit platforms with 8 byte alignment, leading to buildfarm
failures.  That's not entirely easy to fix by just adjusting the
definition.

As slab.c doesn't actually need the size part(s) of the common header
(all chunks are equally sized after all), it seems better to instead
reduce the header to the part needed by all allocators, namely which
context an allocation belongs to. That has the advantage of reducing
the overhead of slab allocations, and also allows for more flexibility
in future allocators.

To avoid spreading the logic about accessing a chunk's context around,
centralize it in GetMemoryChunkContext(), which allows deleting a
good number of lines.
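
The core idea can be sketched like this (a guess at the shape, under the
assumption that the owning context is stored in the word immediately
preceding each chunk; the real layout lives in the memory-context headers):

    #include "postgres.h"
    #include "utils/memutils.h"

    /* Hypothetical sketch: every chunk is preceded by a pointer to its
     * owning context, so pfree()/repalloc() can route the request to the
     * right allocator without any allocator-specific size fields. */
    static inline MemoryContext
    GetMemoryChunkContextSketch(void *pointer)
    {
        return *(MemoryContext *) ((char *) pointer - sizeof(void *));
    }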

A followup commit will revise the mmgr/README portion about
StandardChunkHeader, and more.

Author: Andres Freund
Discussion: https://postgr.es/m/20170228074420.aazv4iw6k562mnxg@alap3.anarazel.de
2017-02-28 19:42:44 -08:00
Andres Freund 58b25e9810 Add "Slab" MemoryContext implementation for efficient equal-sized allocations.
The default general purpose aset.c style memory context is not a great
choice for allocations that are all going to be evenly sized,
especially when those objects aren't small and have varying
lifetimes.  There tends to be a lot of fragmentation, and larger
allocations always go directly to libc rather than having their cost
amortized over several pallocs.

These problems lead to the introduction of ad-hoc slab allocators in
reorderbuffer.c. But it turns out that the simplistic implementation
leads to problems when a lot of objects are allocated and freed, as
aset.c is still the underlying implementation. Especially freeing can
easily run into O(n^2) behavior in aset.c.

While the O(n^2) behavior in aset.c can, and probably will, be
addressed, custom allocators for this behavior are more efficient
both in space and time.

This allocator is for evenly sized allocations, and supports both
cheap allocations and freeing, without fragmenting significantly.  It
does so by allocating evenly sized blocks via malloc(), and carves
them into chunks that can be used for allocations.  In order to
release blocks to the OS as early as possible, chunks are allocated
from the fullest block that still has free objects, increasing the
likelihood of a block being entirely unused.
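
A rough data-structure sketch of that scheme (names and sizes below are made
up for illustration; slab.c's real structures differ):

    #include <stddef.h>

    #define CHUNKS_PER_BLOCK 64     /* made-up block geometry */

    /* A block is one malloc()ed region carved into equal-sized chunks. */
    typedef struct SlabBlockSketch
    {
        struct SlabBlockSketch *nextblock;  /* freelist link */
        int     nfree;                      /* chunks still unused */
        void   *freechunk;                  /* head of this block's chunk list */
    } SlabBlockSketch;

    typedef struct SlabSketch
    {
        size_t  chunk_size;                 /* all allocations are this size */
        /* freelist[i] holds blocks with exactly i free chunks; allocating
         * from the fullest non-full block empties blocks sooner, so whole
         * blocks can be returned to the OS earlier. */
        SlabBlockSketch *freelist[CHUNKS_PER_BLOCK + 1];
    } SlabSketch;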

A subsequent commit uses this in reorderbuffer.c, but a further
allocator is needed to resolve the performance problems triggering
this work.

There likely are further potential uses of this allocator besides
reorderbuffer.c.

There are potential further optimizations of the new slab.c, in
particular the array of freelists could be replaced by a more
intelligent structure - but for now this looks more than good enough.

Author: Tomas Vondra, editorialized by Andres Freund
Reviewed-By: Andres Freund, Petr Jelinek, Robert Haas, Jim Nasby
Discussion: https://postgr.es/m/d15dff83-0b37-28ed-0809-95a5cc7292ad@2ndquadrant.com
2017-02-27 03:41:44 -08:00
Robert Haas 569174f1be btree: Support parallel index scans.
This isn't exposed to the optimizer or the executor yet; we'll add
support for those things in a separate patch.  But this puts the
basic mechanism in place: several processes can attach to a parallel
btree index scan, and each one will get a subset of the tuples that
would have been produced by a non-parallel scan.  Each index page
becomes the responsibility of a single worker, which then returns
all of the TIDs on that page.

Rahila Syed, Amit Kapila, Robert Haas, reviewed and tested by
Anastasia Lubennikova, Tushar Ahuja, and Haribabu Kommi.
2017-02-15 07:41:14 -05:00
Robert Haas 7b4ac19982 Extend index AM API for parallel index scans.
This patch doesn't actually make any index AM parallel-aware, but it
provides the necessary functions at the AM layer to do so.

Rahila Syed, Amit Kapila, Robert Haas
2017-01-24 16:42:58 -05:00
Robert Haas acddbe221b Update typedefs.list
So developers can more easily run pgindent locally
2016-12-13 10:51:32 -05:00
Robert Haas f0e44751d7 Implement table partitioning.
Table partitioning is like table inheritance and reuses much of the
existing infrastructure, but there are some important differences.
The parent is called a partitioned table and is always empty; it may
not have indexes or non-inherited constraints, since those make no
sense for a relation with no data of its own.  The children are called
partitions and contain all of the actual data.  Each partition has an
implicit partitioning constraint.  Multiple inheritance is not
allowed, and partitioning and inheritance can't be mixed.  Partitions
can't have extra columns and may not allow nulls unless the parent
does.  Tuples inserted into the parent are automatically routed to the
correct partition, so tuple-routing ON INSERT triggers are not needed.
Tuple routing isn't yet supported for partitions which are foreign
tables, and it doesn't handle updates that cross partition boundaries.

Currently, tables can be range-partitioned or list-partitioned.  List
partitioning is limited to a single column, but range partitioning can
involve multiple columns.  A partitioning "column" can be an
expression.

Because table partitioning is less general than table inheritance, it
is hoped that it will be easier to reason about properties of
partitions, and therefore that this will serve as a better foundation
for a variety of possible optimizations, including query planner
optimizations.  The tuple routing which this patch does based on
the implicit partitioning constraints is an example of this, but it
seems likely that many other useful optimizations are also possible.

Amit Langote, reviewed and tested by Robert Haas, Ashutosh Bapat,
Amit Kapila, Rajkumar Raghuwanshi, Corey Huinker, Jaime Casanova,
Rushabh Lathia, Erik Rijkers, among others.  Minor revisions by me.
2016-12-07 13:17:55 -05:00
Robert Haas 13df76a537 Introduce dynamic shared memory areas.
Programmers discovered decades ago that it was useful to have a simple
interface for allocating and freeing memory, which is why malloc() and
free() were invented.  Unfortunately, those handy tools don't work
with dynamic shared memory segments because those are specific to
PostgreSQL and are not necessarily mapped at the same address in every
cooperating process.  So invent our own allocator instead.  This makes
it possible for processes cooperating as part of parallel query
execution to allocate and free chunks of memory without having to
reserve them prior to the start of execution.  It could also be used
for longer lived objects; for example, we could consider storing data
for pg_stat_statements or the stats collector in shared memory using
these interfaces, rather than writing them to files.  Basically,
anything that needs shared memory but can't predict in advance how
much it's going to need might find this useful.
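
A sketch of how a backend might use the new area API (argument details
simplified and partly from memory; the tranche id below is a placeholder):

    #include "postgres.h"
    #include "utils/dsa.h"

    static void
    dsa_usage_sketch(void)
    {
        /* create an area backed by dynamic shared memory segments */
        dsa_area   *area = dsa_create(42);      /* 42: placeholder tranche id */

        /* dsa_pointer values stay valid in every process attached to the
         * area, even though the mapped addresses differ */
        dsa_pointer dp = dsa_allocate(area, 1024);
        char       *p = dsa_get_address(area, dp);

        memset(p, 0, 1024);

        dsa_free(area, dp);
        dsa_detach(area);
    }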

Thomas Munro and Robert Haas.  The original code (of mine) on which
Thomas based his work was actually designed to be a new backend-local
memory allocator for PostgreSQL, but that hasn't gone anywhere - or
not yet, anyway.  Thomas took that work and performed major
refactoring and extensive modifications to make it work with dynamic
shared memory, including the addition of appropriate locking.

Discussion: CA+TgmobkeWptGwiNa+SGFWsTLzTzD-CeLz0KcE-y6LFgoUus4A@mail.gmail.com
Discussion: CAEepm=1z5WLuNoJ80PaCvz6EtG9dN0j-KuHcHtU6QEfcPP5-qA@mail.gmail.com
2016-12-02 12:34:36 -05:00
Robert Haas 13e14a78ea Management of free memory pages.
This is intended as infrastructure for a full-fledged allocator for
dynamic shared memory.  The interface looks a bit like a real
allocator, but only supports allocating and freeing memory in
multiples of the 4kB page size.  Further, to free memory, you must
know the size of the span you wish to free, in pages.  While these
limitations make it unsuitable as an allocator in and of itself, it
still serves as very useful scaffolding for a full-fledged allocator.

Robert Haas and Thomas Munro.  This code is mostly the same as my 2014
submission, but Thomas fixed quite a few bugs and made some changes to
the interface.

Discussion: CA+TgmobkeWptGwiNa+SGFWsTLzTzD-CeLz0KcE-y6LFgoUus4A@mail.gmail.com
Discussion: CAEepm=1z5WLuNoJ80PaCvz6EtG9dN0j-KuHcHtU6QEfcPP5-qA@mail.gmail.com
2016-12-02 12:03:30 -05:00
Andres Freund 5dfc198146 Use more efficient hashtable for execGrouping.c to speed up hash aggregation.
The more efficient hashtable speeds up hash-aggregations with more than
a few hundred groups significantly. Improvements of over 120% have been
measured.

Due to the different hash table, queries that are not fully
determined (e.g. GROUP BY without ORDER BY) may change their result
order.

The conversion is largely straight-forward, except that, due to the
static element types of simplehash.h type hashes, the additional data
some users store in elements (e.g. the per-group working data for hash
aggregaters) is now stored in TupleHashEntryData->additional.  The
meaning of BuildTupleHashTable's entrysize (renamed to additionalsize)
has been changed to only be about the additionally stored size.  That
size is only used for the initial sizing of the hash-table.

Reviewed-By: Tomas Vondra
Discussion: <20160727004333.r3e2k2y6fvk2ntup@alap3.anarazel.de>
2016-10-14 17:22:51 -07:00
Andres Freund b30d3ea824 Add a macro templatized hashtable.
dynahash.c hash tables aren't quite fast enough for some
use-cases. There are several reasons for the lack of performance:
- the use of chaining for collision handling makes them cache
  inefficient, that's especially an issue when the tables get bigger.
- as the element sizes for dynahash are only determined at runtime,
  offset computations are somewhat expensive
- hash and element comparisons are indirect function calls, causing
  unnecessary pipeline stalls
- its two-level structure has some benefits (somewhat natural
  partitioning), but increases the number of indirections
To fix several of these, the hash tables have to be adjusted to the
individual use-case at compile-time. C unfortunately doesn't provide a
good way to do compile-time code generation (like e.g. C++'s templates
do, for all their weaknesses).  Thus the somewhat ugly approach taken
here is to allow for code generation using a macro-templatized header
file, which generates functions and types based on a prefix and other
parameters.
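
The resulting usage pattern looks roughly like the following (the SH_*
parameter names are recalled from simplehash.h and may not match every
version exactly; the element struct and hash function are invented for
illustration):

    #include "postgres.h"

    /* Caller-defined element; simplehash expects the key and a status byte. */
    typedef struct DemoEntry
    {
        uint32  key;        /* hash key */
        int     count;      /* caller payload */
        char    status;     /* slot state, managed by simplehash */
    } DemoEntry;

    /* trivial stand-in hash function, just for the sketch */
    static inline uint32
    demo_hash(uint32 k)
    {
        k *= 0x9E3779B1;
        return k ^ (k >> 16);
    }

    #define SH_PREFIX        demo
    #define SH_ELEMENT_TYPE  DemoEntry
    #define SH_KEY_TYPE      uint32
    #define SH_KEY           key
    #define SH_HASH_KEY(tb, key)  demo_hash(key)
    #define SH_EQUAL(tb, a, b)    ((a) == (b))
    #define SH_SCOPE         static inline
    #define SH_DECLARE
    #define SH_DEFINE
    #include "lib/simplehash.h"
    /* This generates demo_create(), demo_insert(), demo_lookup(), etc.,
     * all specialized for DemoEntry at compile time. */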

Later patches use this infrastructure to use such hash tables for
tidbitmap.c (bitmap scans) and execGrouping.c (hash aggregation,
...). In queries where these use up a large fraction of the time, this
has been measured to lead to performance improvements of over 100%.

There are other cases where this could be useful (e.g. catcache.c).

The hash table design chosen is a variant of linear open-addressing. The
biggest disadvantages of simple linear addressing schemes are highly
variable lookup times due to clustering, and deletions leaving a lot of
tombstones around.  To address these issues a variant of "robin hood"
hashing is employed.  Robin hood hashing optimizes chaining lengths by
moving elements close to their optimal bucket ("rich" elements), out of
the way if a to-be-inserted element is further away from its optimal
position (i.e. it's "poor").  While that can make insertions slower, the
average lookup performance is a lot better, and higher fill factors can
be used in a still performant manner.  To avoid tombstones - which
normally solve the issue that a deleted node's presence is relevant to
determine whether a lookup needs to continue looking or is done -
buckets following a deleted element are shifted backwards, unless
they're empty or already at their optimal position.

There are further possible improvements that can be made to this
implementation. Amongst others:
- Use distance as a termination criterion during searches. This is
  generally a good idea, but I've been able to see the overhead of
  distance calculations in some cases.
- Consider combining the 'empty' status into the hashvalue, and enforce
  storing the hashvalue. That could, in some cases, increase memory
  density and remove a few instructions.
- Experiment further with the very conservatively chosen fillfactor.
- Make maximum size of hashtable configurable, to allow storing very
  very large tables. That'd require 64bit hash values to be more common
  than now, though.
- some smaller memcpy calls could be optimized to copy larger chunks
But since the new implementation is already considerably faster than
dynahash it seems sensible to start using it.

Reviewed-By: Tomas Vondra
Discussion: <20160727004333.r3e2k2y6fvk2ntup@alap3.anarazel.de>
2016-10-14 16:07:38 -07:00
Robert Haas b25b6c9701 Once again allow LWLocks to be used within DSM segments.
Prior to commit 7882c3b0b9, it was
possible to use LWLocks within DSM segments, but that commit broke
this use case by switching from a doubly linked list to a circular
linked list.  Switch back, using a new bit of general infrastructure
for maintaining lists of PGPROCs.

Thomas Munro, reviewed by me.
2016-08-15 18:09:55 -04:00
Tom Lane b5bce6c1ec Final pgindent + perltidy run for 9.6. 2016-08-15 13:42:51 -04:00
Tom Lane 05d8dec690 Simplify the process of perltidy'ing our Perl files.
Wrap the perltidy invocation into a shell script to reduce the risk of
copy-and-paste errors.  Include removal of *.bak files in the script,
so they don't accidentally get committed.  Improve the directions in
the README file.
2016-08-15 11:32:09 -04:00
Robert Haas e472ce9624 Add integrity-checking functions to pg_visibility.
The new pg_check_visible() and pg_check_frozen() functions can be used to
verify that the visibility map bits for a relation's data pages match the
actual state of the tuples on those pages.

Amit Kapila and Robert Haas, reviewed (in earlier versions) by Andres
Freund.  Additional testing help by Thomas Munro.
2016-06-15 14:33:58 -04:00
Noah Misch b098abf905 Document the authoritative version of perltidy.
Every whole-tree perltidy run has used this version, firmly establishing
it as the de facto standard.
2016-06-12 04:19:44 -04:00
Robert Haas e7bcd983f5 Yet again update typedefs.list file in preparation for pgindent run
Because the run was delayed, the file had a chance to get out of date.
2016-06-09 12:16:17 -04:00
Robert Haas f2f5e7e78e Again update typedefs.list file in preparation for pgindent run
This time, use the buildfarm-supplied contents for this file, instead
of trying to update it by eyeballing the pgindent output.

Per discussion with Tom and Bruce.
2016-05-02 09:23:55 -04:00
Robert Haas acb51bd71d Update typedefs.list file in preparation for pgindent run
In addition to adding new typedefs, I also re-sorted the file so that
various entries added piecemeal, mostly or entirely by me, were alphabetized
the same way as other entries in the file.
2016-04-27 11:50:34 -04:00
Kevin Grittner 80647bf65a Make oldSnapshotControl a pointer to a volatile structure
It was incorrectly declared as a volatile pointer to a non-volatile
structure.  Eliminate the OldSnapshotControl struct definition; it
is really not needed.  Pointed out by Tom Lane.

While at it, add OldSnapshotControlData to pgindent's list of
structures.
2016-04-11 15:43:52 -05:00
Andres Freund 48354581a4 Allow Pin/UnpinBuffer to operate in a lockfree manner.
Pinning/Unpinning a buffer is a very frequent operation, especially in
read-mostly cache resident workloads. Benchmarking shows that in various
scenarios the spinlock protecting a buffer header's state becomes a
significant bottleneck. The problem can be reproduced with pgbench -S on
larger machines, but can be considerably worse for queries which touch
the same buffers over and over at a high frequency (e.g. nested loops
over a small inner table).

To allow atomic operations to be used, cram BufferDesc's flags,
usage_count, buf_hdr_lock, refcount into a single 32bit atomic variable;
that allows manipulating them together using 32bit compare-and-swap
operations. This requires reducing MAX_BACKENDS to 2^18-1 (which could
be lifted by using a 64bit field, but it's not a realistic configuration
atm).
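
As an illustration of the compare-and-swap pattern (the bit layout and
constants here are made up; the real masks live in the buffer headers):

    #include "postgres.h"
    #include "port/atomics.h"

    #define SKETCH_LOCKED_FLAG  0x80000000u     /* made-up "header locked" bit */

    /* Bump the refcount packed into the low bits of 'state', retrying if
     * another backend changed the word or holds the lock bit. */
    static void
    pin_sketch(pg_atomic_uint32 *state)
    {
        uint32  oldval = pg_atomic_read_u32(state);

        for (;;)
        {
            uint32  newval;

            while (oldval & SKETCH_LOCKED_FLAG)     /* wait for the lock bit */
                oldval = pg_atomic_read_u32(state);

            newval = oldval + 1;                    /* refcount in the low bits */

            /* on failure, 'oldval' is updated to the current value; retry */
            if (pg_atomic_compare_exchange_u32(state, &oldval, newval))
                break;
        }
    }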

As not all operations can easily be implemented in a lockfree manner,
implement the previous buf_hdr_lock via a flag bit in the atomic
variable. That way we can continue to lock the header in places where
it's needed, but can get away without acquiring it in the more frequent
hot-paths.  There's some additional operations which can be done without
the lock, but aren't in this patch; but the most important places are
covered.

As bufmgr.c now essentially re-implements spinlocks, abstract the delay
logic from s_lock.c into something more generic. It already has two
users, and more are coming up; there's a follow-up patch for lwlock.c at
least.

This patch is based on a proof-of-concept written by me, which Alexander
Korotkov made into a fully working patch; the committed version is again
revised by me.  Benchmarking and testing has, amongst others, been
provided by Dilip Kumar, Alexander Korotkov, Robert Haas.

On a large x86 system, improvements of a factor of 8 have been observed
for readonly pgbench with a high client count.

Author: Alexander Korotkov and Andres Freund
Discussion: 2400449.GjM57CE0Yg@dinodell
2016-04-10 20:12:32 -07:00
Alvaro Herrera f2fcad27d5 Support ALTER THING .. DEPENDS ON EXTENSION
This introduces a new dependency type which marks an object as depending
on an extension, such that if the extension is dropped, the object
automatically goes away; and also, if the database is dumped, the object
is included in the dump output.  Currently the grammar supports this for
indexes, triggers, materialized views and functions only, although the
utility code is generic so adding support for more object types is a
matter of touching the parser rules only.

Author: Abhijit Menon-Sen
Reviewed-by: Alexander Korotkov, Álvaro Herrera
Discussion: http://www.postgresql.org/message-id/20160115062649.GA5068@toroid.org
2016-04-05 18:38:54 -03:00
Andres Freund 98a64d0bd7 Introduce WaitEventSet API.
Commit ac1d794 ("Make idle backends exit if the postmaster dies.")
introduced a regression on, at least, large linux systems. Constantly
adding the same postmaster_alive_fds to the OS's internal datastructures
for implementing poll/select can cause significant contention, leading
to a performance regression of nearly 3x in one example.

This can be avoided by using e.g. linux' epoll, which avoids having to
add/remove file descriptors to the wait datastructures at a high rate.
Unfortunately the current latch interface makes it hard to allocate any
persistent per-backend resources.

Replace, with a backward compatibility layer, WaitLatchOrSocket with a
new WaitEventSet API. Users can allocate such a Set across multiple
calls, and add more than one file-descriptor to wait on. The latter has
been added because there are upcoming postgres features where that will be
helpful.
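
A sketch of how a long-lived set might be used (the function signatures are
recalled from memory and have changed across releases, so treat the argument
lists as approximate):

    #include "postgres.h"
    #include "miscadmin.h"
    #include "storage/latch.h"

    static void
    wait_loop_sketch(pgsocket client_sock)
    {
        /* build the set once, reuse it for every wait */
        WaitEventSet *set = CreateWaitEventSet(CurrentMemoryContext, 2);
        WaitEvent     event;

        AddWaitEventToSet(set, WL_LATCH_SET, PGINVALID_SOCKET, MyLatch, NULL);
        AddWaitEventToSet(set, WL_SOCKET_READABLE, client_sock, NULL, NULL);

        for (;;)
        {
            /* block until the latch is set or the socket becomes readable */
            WaitEventSetWait(set, -1L, &event, 1);

            if (event.events & WL_LATCH_SET)
                ResetLatch(MyLatch);
            if (event.events & WL_SOCKET_READABLE)
                break;                      /* go read from the socket */
        }

        FreeWaitEventSet(set);
    }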

In addition to the previously existing poll(2), select(2), and
WaitForMultipleObjects() implementations, also provide an epoll_wait(2)
based implementation to address the aforementioned performance
problem. Epoll is only available on linux, but that is the most likely
OS for machines large enough (four sockets) to reproduce the problem.

To actually address the aforementioned regression, create and use a
long-lived WaitEventSet for FE/BE communication.  There are additional
places that would benefit from a long-lived set, but that's a task for
another day.

Thanks to Amit Kapila, who helped make the windows code I blindly wrote
actually work.

Reported-By: Dmitry Vasilyev
Discussion: CAB-SwXZh44_2ybvS5Z67p_CDz=XFn4hNAD=CnMEF+QqkXwFrGg@mail.gmail.com
    20160114143931.GG10941@awork2.anarazel.de
2016-03-21 12:22:54 +01:00
Andres Freund 9cd00c457e Checkpoint sorting and balancing.
Up to now a checkpoint's buffers were written in the order they're in the
BufferDescriptors. That's nearly random in a lot of cases, which
performs badly on rotating media, but even on SSDs it causes slowdowns.

To avoid that, sort the buffers before writing them out. We currently
sort by tablespace, relfilenode, fork and block number.
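
Conceptually the sort key boils down to a comparator like this (struct and
field names invented for illustration, not the checkpointer's actual ones):

    typedef struct CkptSortKeySketch
    {
        unsigned    tablespace;
        unsigned    relfilenode;
        int         forknum;
        unsigned    blocknum;
    } CkptSortKeySketch;

    /* qsort()-style comparator: order by tablespace, then relation, fork and
     * block, so writes to the same file end up adjacent and mostly sequential. */
    static int
    ckpt_cmp_sketch(const void *pa, const void *pb)
    {
        const CkptSortKeySketch *a = pa;
        const CkptSortKeySketch *b = pb;

        if (a->tablespace != b->tablespace)
            return (a->tablespace < b->tablespace) ? -1 : 1;
        if (a->relfilenode != b->relfilenode)
            return (a->relfilenode < b->relfilenode) ? -1 : 1;
        if (a->forknum != b->forknum)
            return (a->forknum < b->forknum) ? -1 : 1;
        if (a->blocknum != b->blocknum)
            return (a->blocknum < b->blocknum) ? -1 : 1;
        return 0;
    }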

One of the major reasons that previously wasn't done was fear of
imbalance between tablespaces. To address that, balance writes between
tablespaces.

The other prime concern was that the relatively large allocation to sort
the buffers in might fail, preventing checkpoints from happening. Thus
pre-allocate the required memory in shared memory, at server startup.

This particularly makes it more efficient to have checkpoint flushing
enabled, because that'll often result in a lot of writes that can be
coalesced into one flush.

Discussion: alpine.DEB.2.10.1506011320000.28433@sto
Author: Fabien Coelho and Andres Freund
2016-03-10 17:05:09 -08:00
Andres Freund 428b1d6b29 Allow to trigger kernel writeback after a configurable number of writes.
Currently writes to the main data files of postgres all go through the
OS page cache. This means that some operating systems can end up
collecting a large number of dirty buffers in their respective page
caches.  When these dirty buffers are flushed to storage rapidly, be it
because of fsync(), timeouts, or dirty ratios, latency for other reads
and writes can increase massively.  This is the primary reason for
regular massive stalls observed in real world scenarios and artificial
benchmarks; on rotating disks stalls on the order of hundreds of seconds
have been observed.

On linux it is possible to control this by reducing the global dirty
limits significantly, reducing the above problem. But global
configuration is rather problematic because it'll affect other
applications; also PostgreSQL itself doesn't always want this
behavior, e.g. for temporary files it's undesirable.

Several operating systems allow some control over the kernel page
cache. Linux has sync_file_range(2), several posix systems have msync(2)
and posix_fadvise(2). sync_file_range(2) is preferable because it
requires no special setup, whereas msync() requires the to-be-flushed
range to be mmap'ed. For the purpose of flushing dirty data
posix_fadvise(2) is the worst alternative, as flushing dirty data is
just a side-effect of POSIX_FADV_DONTNEED, which also removes the pages
from the page cache.  Thus the feature is enabled by default only on
linux, but can be enabled on all systems that have any of the above
APIs.
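
For reference, a standalone sketch of the Linux primitive involved (this
shows the raw system call only, not PostgreSQL's portability wrapper around
it):

    #define _GNU_SOURCE
    #include <fcntl.h>

    /* Ask the kernel to start writeback of a byte range now, without waiting
     * for completion and without evicting the pages from the page cache
     * (unlike posix_fadvise(POSIX_FADV_DONTNEED)). */
    static void
    hint_writeback(int fd, off_t offset, off_t nbytes)
    {
        (void) sync_file_range(fd, offset, nbytes, SYNC_FILE_RANGE_WRITE);
    }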

While desirable and likely possible this patch does not contain an
implementation for windows.

With the infrastructure added, writes made via checkpointer, bgwriter
and normal user backends can be flushed after a configurable number of
writes. Each of these sources of writes is controlled by a separate GUC,
checkpointer_flush_after, bgwriter_flush_after and backend_flush_after
respectively; they're separate because the number of writes after which
flushing is beneficial differs for each, and because the performance
considerations of controlled flushing for each of these are different.

A later patch will add checkpoint sorting - after that flushes from the
checkpoint will almost always be desirable. Bgwriter flushes are most of
the time going to be random, which are slow on lots of storage hardware.
Flushing in backends works well if the storage and bgwriter can keep up,
but if not it can have negative consequences.  This patch is likely to
have negative performance consequences without checkpoint sorting, but
unfortunately so has sorting without flush control.

Discussion: alpine.DEB.2.10.1506011320000.28433@sto
Author: Fabien Coelho and Andres Freund
2016-03-10 17:04:34 -08:00
Robert Haas c1772ad922 Change the way that LWLocks for extensions are allocated.
The previous RequestAddinLWLocks() method had several disadvantages.
First, the locks would be in the main tranche; we've recently decided
that it's useful for LWLocks used for separate purposes to have
separate tranche IDs.  Second, there wasn't any correlation between
what code called RequestAddinLWLocks() and what code called
LWLockAssign(); when multiple modules are in use, it could become
quite difficult to troubleshoot problems where LWLockAssign() ran out
of locks.  To fix, create a concept of named LWLock tranches which
can be used either by extension or by core code.
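
For an extension author, the new pattern looks roughly like this (a hedged
sketch; consult the extension/LWLock documentation for the authoritative
sequence):

    #include "postgres.h"
    #include "storage/lwlock.h"

    /* In _PG_init(), while shared memory is still being sized: */
    static void
    request_locks_sketch(void)
    {
        RequestNamedLWLockTranche("my_extension", 1);
    }

    /* Later, e.g. from a shmem startup hook, fetch the allocated locks: */
    static LWLock *
    get_lock_sketch(void)
    {
        LWLockPadded *locks = GetNamedLWLockTranche("my_extension");

        return &locks[0].lock;
    }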

Amit Kapila and Robert Haas
2016-02-04 16:43:04 -05:00
Robert Haas 6150a1b08a Move buffer I/O and content LWLocks out of the main tranche.
Move the content lock directly into the BufferDesc, so that locking and
pinning a buffer touches only one cache line rather than two.  Adjust
the definition of BufferDesc slightly so that this doesn't make the
BufferDesc any larger than one cache line (at least on platforms where
a spinlock is only 1 or 2 bytes).

We can't fit the I/O locks into the BufferDesc and stay within one
cache line, so move those to a completely separate tranche.  This
leaves a relatively limited number of LWLocks in the main tranche, so
increase the padding of those remaining locks to a full cache line,
rather than allowing adjacent locks to share a cache line, hopefully
reducing false sharing.

Performance testing shows that these changes make little difference
on laptop-class machines, but help significantly on larger servers,
especially those with more than 2 sockets.

Andres Freund, originally based on an earlier patch by Simon Riggs.
Review and cosmetic adjustments (including heavy rewriting of the
comments) by me.
2015-12-15 13:32:54 -05:00
Robert Haas 6e71dd7ce9 Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend.  Transferring such tuples without modification between two
cooperating backends does not work.  This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves.  The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender.  That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.

Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues.  This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was.  typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
Robert Haas fd5eaad715 Correct pg_indent to pgindent in various comments.
David Christensen
2015-10-08 12:27:54 -04:00
Robert Haas 3bd909b220 Add a Gather executor node.
A Gather executor node runs any number of copies of a plan in an equal
number of workers and merges all of the results into a single tuple
stream.  It can also run the plan itself, if the workers are
unavailable or haven't started up yet.  It is intended to work with
the Partial Seq Scan node which will be added in future commits.

It could also be used to implement parallel query of a different sort
by itself, without help from Partial Seq Scan, if the single_copy mode
is used.  In that mode, a worker executes the plan, and the parallel
leader does not, merely collecting the worker's results.  So, a Gather
node could be inserted into a plan to split the execution of that plan
across two processes.  Nested Gather nodes aren't currently supported,
but we might want to add support for that in the future.

There's nothing in the planner to actually generate Gather nodes yet,
so it's not quite time to break out the champagne.  But we're getting
close.

Amit Kapila.  Some designs suggestions were provided by me, and I also
reviewed the patch.  Single-copy mode, documentation, and other minor
changes also by me.
2015-09-30 19:23:36 -04:00
Andres Freund 4f627f8973 Rework the way multixact truncations work.
The fact that multixact truncations are not WAL logged has caused a fair
share of problems. Amongst others it requires doing computations during
recovery while the database is not in a consistent state, delaying
truncations till checkpoints, and handling members being truncated, but
offset not.

We tried to put bandaids on lots of these issues over the last years,
but it seems time to change course. Thus this patch introduces WAL
logging for multixact truncations.

This allows:
1) to perform the truncation directly during VACUUM, instead of delaying it
   to the checkpoint.
2) to avoid looking at the offsets SLRU for truncation during recovery,
   we can just use the master's values.
3) simplify a fair amount of logic to keep in-memory limits straight;
   this has gotten much easier

During the course of fixing this a bunch of additional bugs had to be
fixed:
1) Data was not purged from memory in the member's SLRU before deleting
   segments. This happened to be hard or impossible to hit due to the
   interlock between checkpoints and truncation.
2) find_multixact_start() relied on SimpleLruDoesPhysicalPageExist - but
   that doesn't work for offsets that haven't yet been flushed to
   disk. Add code to flush the SLRUs to fix. Not pretty, but it feels
   slightly safer to only make decisions based on actual on-disk state.
3) find_multixact_start() could be called concurrently with a truncation
   and thus fail. Via SetOffsetVacuumLimit() that could lead to a round
   of emergency vacuuming. The problem remains in
   pg_get_multixact_members(), but that's quite harmless.

For now this is going to only get applied to 9.5+, leaving the issues in
the older branches in place. It is quite possible that we need to
backpatch at a later point though.

For the case this gets backpatched we need to handle that an updated
standby may be replaying WAL from a not-yet upgraded primary. We have to
recognize that situation and use "old style" truncation (i.e. looking at
the SLRUs) during WAL replay. In contrast to before, this now happens in
the startup process, when replaying a checkpoint record, instead of the
checkpointer. Doing truncation in the restartpoint is incorrect;
restartpoints can happen much later than the original checkpoint,
thereby leading to
wraparound.  To avoid "multixact_redo: unknown op code 48" errors
standbys would have to be upgraded before primaries.

A later patch will bump the WAL page magic, and remove the legacy
truncation codepaths. Legacy truncation support is just included to make
a possible future backpatch easier.

Discussion: 20150621192409.GA4797@alap3.anarazel.de
Reviewed-By: Robert Haas, Alvaro Herrera, Thomas Munro
Backpatch: 9.5 for now
2015-09-26 19:04:25 +02:00
Robert Haas 4a4e6893aa Glue layer to connect the executor to the shm_mq mechanism.
The shm_mq mechanism was built to send error (and notice) messages and
tuples between backends.  However, shm_mq itself only deals in raw
bytes.  Since commit 2bd9e412f9, we have
had infrastructure for one backend to redirect protocol messages to a
queue and for another backend to parse them and do useful things with
them.  This commit introduces a somewhat analogous facility for tuples
by adding a new type of DestReceiver, DestTupleQueue, which writes
each tuple generated by a query into a shm_mq, and a new
TupleQueueFunnel facility which reads raw tuples out of the queue and
reconstructs the HeapTuple format expected by the executor.

The TupleQueueFunnel abstraction supports reading from multiple tuple
streams at the same time, but only in round-robin fashion.  Someone
could imaginably want other policies, but this should be good enough
to meet our short-term needs related to parallel query, and we can
always extend it later.

This also makes one minor addition to the shm_mq API that didn't
seem worth breaking out as a separate patch.

Extracted from Amit Kapila's parallel sequential scan patch.  This
code was originally written by me, and then it was revised by Amit,
and then it was revised some more by me.
2015-09-18 21:56:58 -04:00
Bruce Momjian ab959cc0ea pgindent: add typedef blog URL 2015-06-01 11:27:30 -04:00
Bruce Momjian 3503003eb7 pgindent: document location of "all" typedef lists 2015-05-25 16:53:51 -04:00
Bruce Momjian 8339e70da6 pgindent: fix typo
Report by Michael Paquier
2015-05-25 08:08:41 -04:00
Bruce Momjian 266b6984cd pgindent: more doc updates for skipping __asm__ files 2015-05-24 21:51:42 -04:00
Bruce Momjian befa3e648c Revert 9.5 pgindent changes to atomics directory files
This is because there are many __asm__ blocks there that pgindent messes
up.  Also configure pgindent to skip that directory in the future.
2015-05-24 21:45:01 -04:00
Bruce Momjian 225892552b Update typedef file in preparation for pgindent run 2015-05-23 21:20:37 -04:00
Bruce Momjian 58affdfb88 Improve pgindent instructions regarding Perl backup files 2015-05-23 21:09:00 -04:00
Heikki Linnakangas fa60fb63e5 Fix more typos in comments.
Patch by CharSyam, plus a few more I spotted with grep.
2015-05-20 19:45:43 +03:00
Fujii Masao 40bede5477 Move pg_lzcompress.c to src/common.
The meta data of PGLZ symbolized by PGLZ_Header is removed, to make
the compression and decompression code independent of the backend-only
varlena facility. PGLZ_Header was being used to store some meta data
related to the data being compressed like the raw length of the uncompressed
record or some varlena-related data, making it unpluggable once PGLZ is
stored in src/common as it contains some backend-only code paths with
the management of varlena structures. The APIs of PGLZ are reworked
at the same time to do only compression and decompression of buffers
without the meta-data layer, simplifying its use for more general purposes.

On-disk format is preserved as well, so there is no incompatibility with
previous major versions of PostgreSQL for TOAST entries.

Exposing compression and decompression APIs of pglz makes possible its
use by extensions and contrib modules. In particular, this commit is required
for the upcoming WAL compression feature, so that the WAL reader facility can
decompress the WAL data by using pglz_decompress.

Michael Paquier, reviewed by me.
2015-02-09 15:15:24 +09:00
Stephen Frost 3c4cf08087 Rework 'MOVE ALL' to 'ALTER .. ALL IN TABLESPACE'
As 'ALTER TABLESPACE .. MOVE ALL' really didn't change the tablespace
but instead changed objects inside tablespaces, it made sense to
rework the syntax and supporting functions to operate under the
'ALTER (TABLE|INDEX|MATERIALIZED VIEW)' syntax and to be in
tablecmds.c.

Pointed out by Alvaro, who also suggested the new syntax.

Back-patch to 9.4.
2014-08-21 19:06:17 -04:00
Alvaro Herrera 346d7be184 Move view reloptions into their own varlena struct
Per discussion after a gripe from me in
http://www.postgresql.org/message-id/20140611194633.GH18688@eldon.alvh.no-ip.org

Jaime Casanova
2014-07-14 17:24:40 -04:00
Bruce Momjian 9516668e48 Remove pgindent ecpg exclusion pattern
Report by Tom Lane
2014-05-06 20:09:31 -04:00
Bruce Momjian 7c7b1f4ae5 Improve pgindent test instructions 2014-05-06 15:33:38 -04:00
Bruce Momjian 0a78320057 pgindent run for 9.4
This includes removing tabs after periods in C comments, which was
applied to back branches, so this change should not affect backpatching.
2014-05-06 12:12:18 -04:00
Bruce Momjian fb85cd4320 Adjust pgindent to remove tabs after periods in C comments. 2014-05-06 10:57:15 -04:00
Bruce Momjian 284c464b9f Update typedef list in preparation for pgindent run 2014-05-06 09:08:14 -04:00
Robert Haas 5a991ef869 Allow logical decoding via the walsender interface.
In order for this to work, walsenders need the optional ability to
connect to a database, so the "replication" keyword now allows true
or false, for backward-compatibility, and the new value "database"
(which causes the "dbname" parameter to be respected).

walsender needs to loop not only when idle but also when sending
decoded data to the user and when waiting for more xlog data to decode.
This means that there are now three separate loops inside walsender.c;
although some refactoring has been done here, this is still a bit ugly.

Andres Freund, with contributions from Álvaro Herrera, and further
review by me.
2014-03-10 13:50:28 -04:00
Robert Haas b89e151054 Introduce logical decoding.
This feature, building on previous commits, allows the write-ahead log
stream to be decoded into a series of logical changes; that is,
inserts, updates, and deletes and the transactions which contain them.
It is capable of handling decoding even across changes to the schema
of the affected tables.  The output format is controlled by a
so-called "output plugin"; an example is included.  To make use of
this in a real replication system, the output plugin will need to be
modified to produce output in the format appropriate to that system,
and to perform filtering.

Currently, information can be extracted from the logical decoding
system only via SQL; future commits will add the ability to stream
changes via walsender.

Andres Freund, with review and other contributions from many other
people, including Álvaro Herrera, Abhijit Menon-Sen, Peter Geoghegan,
Kevin Grittner, Robert Haas, Heikki Linnakangas, Fujii Masao,
Michael Paquier, Simon Riggs, Craig Ringer, and Steve
Singer.
2014-03-03 16:32:18 -05:00
Bruce Momjian 2fc80e8e83 Rename 'gmake' to 'make' in docs and recommended commands
This simplifies the docs and makes it easier to cut/paste command lines.
2014-02-12 17:29:19 -05:00
Robert Haas 858ec11858 Introduce replication slots.
Replication slots are a crash-safe data structure which can be created
on either a master or a standby to prevent premature removal of
write-ahead log segments needed by a standby, as well as (with
hot_standby_feedback=on) pruning of tuples whose removal would cause
replication conflicts.  Slots have some advantages over existing
techniques, as explained in the documentation.

In a few places, we refer to the type of replication slots introduced
by this patch as "physical" slots, because forthcoming patches for
logical decoding will also have slots, but with somewhat different
properties.

Andres Freund and Robert Haas
2014-01-31 22:45:36 -05:00
Bruce Momjian 290d2cb500 pgindent: add Perl comment 2014-01-31 14:46:00 -05:00
Bruce Momjian cad1e022b2 pgindent: add --list-of-typedefs option
Allows typedefs to be specified on the command line, per request from
Andrew.
2014-01-31 13:35:50 -05:00
Bruce Momjian db98b31329 pgindent: preserve blank lines around #else/#endif
This requires a new version of pg_bsd_indent, version 1.3, to be
downloaded.
2014-01-30 22:40:05 -05:00
Robert Haas ea9df812d8 Relax the requirement that all lwlocks be stored in a single array.
This makes it possible to store lwlocks as part of some other data
structure in the main shared memory segment, or in a dynamic shared
memory segment.  There is still a main LWLock array and this patch does
not move anything out of it, but it provides necessary infrastructure
for doing that in the future.

This change is likely to increase the size of LWLockPadded on some
platforms, especially 32-bit platforms where it was previously only
16 bytes.

Patch by me.  Review by Andres Freund and KaiGai Kohei.
2014-01-27 11:07:44 -05:00
Stephen Frost 76e91b38ba Add ALTER TABLESPACE ... MOVE command
This adds a 'MOVE' sub-command to ALTER TABLESPACE which allows moving sets of
objects from one tablespace to another.  This can be extremely handy and avoids
a lot of error-prone scripting.  ALTER TABLESPACE ... MOVE will only move
objects the user owns, will notify the user if no objects were found, and can
be used to move ALL objects or specific types of objects (TABLES, INDEXES, or
MATERIALIZED VIEWS).
2014-01-18 18:56:40 -05:00
Robert Haas c32afe53c2 pg_prewarm, a contrib module for prewarming relation data.
Patch by me.  Review by Álvaro Herrera, Amit Kapila, Jeff Janes,
Gurjeet Singh, and others.
2013-12-20 08:14:13 -05:00
Robert Haas e55704d8b2 Add new wal_level, logical, sufficient for logical decoding.
When wal_level=logical, we'll log columns from the old tuple as
configured by the REPLICA IDENTITY facility added in commit
07cacba983.  This makes it possible for
a properly-configured logical replication solution to correctly
follow table updates even if they change the chosen key columns,
or, with REPLICA IDENTITY FULL, even if the table has no key at
all.  Note that updates which do not modify the replica identity
column won't log anything extra, making the choice of a good key
(i.e. one that will rarely be changed) important to performance
when wal_level=logical is configured.

Each insert, update, or delete to a catalog table will also log
the CMIN and/or CMAX values stamped by the current transaction.
This is necessary because logical decoding will require access to
historical snapshots of the catalog in order to decode some data
types, and the CMIN/CMAX values that we may need in order to judge
row visibility may have been overwritten by the time we need them.

Andres Freund, reviewed in various versions by myself, Heikki
Linnakangas, KONDO Mitsumasa, and many others.
2013-12-10 19:01:40 -05:00
Peter Eisentraut 001e114b8d Fix whitespace issues found by git diff --check, add gitattributes
Set per file type attributes in .gitattributes to fine-tune whitespace
checks.  With the associated cleanups, the tree is now clean for git
2013-11-10 14:48:29 -05:00
Kevin Grittner 277607d600 Eliminate pg_rewrite.ev_attr column and related dead code.
Commit 95ef6a3448 removed the
ability to create rules on an individual column as of 7.3, but
left some residual code which has since been useless.  This cleans
up that dead code without any change in behavior other than
dropping the useless column from the catalog.
2013-09-05 14:03:43 -05:00
Stephen Frost c9fc28a7f1 Minor spelling fixes
Fix a few spelling mistakes.

Per bug report #8193 from Lajos Veres.
2013-06-01 10:18:59 -04:00
Peter Eisentraut 8b5a3998a1 Remove whitespace from end of lines 2013-05-30 21:05:07 -04:00
Bruce Momjian 9af4159fce pgindent run for release 9.3
This is the first run of the Perl-based pgindent script.  Also update
pgindent instructions.
2013-05-29 16:58:43 -04:00
Bruce Momjian d61dddba37 pgindent: add newline to die() so script line number is not reported on failure. 2013-04-16 10:30:35 -04:00
Bruce Momjian 5003f94f66 pgindent: improve error messages
per suggestion from Gurjeet Singh
2013-04-12 15:25:33 -04:00
Bruce Momjian 8daa4e960e pgindent: fix downloading of BSD indent binary
Also fix accessing pgentab binary and tar.

Gurjeet Singh
2013-04-12 11:42:27 -04:00
Peter Eisentraut 8e6c8da16a pgindent: Fix order in instructions
The previous order of steps didn't literally work, because git clean
-fdx would delete the downloaded typedefs.list.  Also, pgindent needs to
be called with a path when one is at the top of the build tree.
2013-02-14 21:40:05 -05:00
Andrew Dunstan 74570db99c Fix a logic bug in pgindent. 2013-01-07 12:26:27 -05:00
Bruce Momjian e40bddb0f3 Have pgindent require pg_bsd_indent version 1.2 now that a new version
has been created by adding #include <stdlib.h> to parse.c.

Per request from Kevin Grittner.
2012-08-27 09:31:56 -04:00
Tom Lane 5078be4804 Tweak new Perl pgindent for compatibility with middle-aged Perls.
We seem to have a rough policy that our Perl scripts should work with
Perl 5.8, so make this one do so.  Main change is to not use the newfangled
\h character class in regexes; "[ \t]" is a serviceable replacement.
2012-08-07 17:52:53 -04:00
Bruce Momjian 149ac7d455 Replace pgindent shell script with Perl script. Update perltidy
instructions to perltidy Perl files that lack Perl file extensions.

pgindent Perl coding by Andrew Dunstan, restructured by me.
2012-08-04 12:41:21 -04:00
Bruce Momjian 76720bdf1a Remove 'x =- 1' check for pgindent, not needed, per report from Andrew
Dunstan.
2012-07-12 14:37:47 -04:00
Bruce Momjian 47463a8098 Remove 'for' loop perltidy argument, and move args to perltidyrc file.
Backpatch to 9.2.

Per suggestion from Noah Misch
2012-06-16 10:12:50 -04:00
Bruce Momjian 0acd978259 In pgindent, suppress reading the perltidy RC file using --noprofile. 2012-06-15 22:50:02 -04:00
Bruce Momjian d6e0207437 Update pgindent Perl indentation instructions based on feedback from
Álvaro and Noah Misch.

Backpatch to 9.2.
2012-06-15 22:43:23 -04:00
Bruce Momjian 60801944fa Update pgindent install instructions and update typedef list. 2012-06-10 15:15:31 -04:00
Peter Eisentraut 621eb156f1 Add installing entab to pgindent instructions
And minor other pgindent documentation tweaks.
2012-03-21 23:33:10 +02:00
Robert Haas dc3f33f6be Fix pathname in pgindent README.
Kevin Grittner
2012-01-09 13:31:58 -05:00
Bruce Momjian 7260a0d00a Document that perl needs to be indented during the pgindent run. 2011-11-28 21:56:58 -05:00
Bruce Momjian 1a2586c1d0 Rerun pgindent with updated typedef list. 2011-11-14 12:12:23 -05:00
Bruce Momjian 360429e1d1 Fix pg_bsd_indent bug where newlines were not being trimmed from typedef
lines.  Update pg_bsd_indent required version to 1.1 (and update ftp
site).

Problem reported by Magnus.
2011-10-26 17:24:19 -04:00
Bruce Momjian 6e22ba03a9 Modify pgindent to use a renamed pg_bsd_indent binary. New features
include the ability to supply a typedef file, rather than list them on
the command line.  Also improve the README.
2011-10-12 15:51:27 -04:00
Alvaro Herrera d69149ed71 Add comment about pg_ctl stop 2011-06-10 15:27:38 -04:00
Bruce Momjian bb8f0c4b48 Mention "pg_ctl stop" in pgindent README instructions. 2011-06-09 20:51:44 -04:00
Bruce Momjian adf43b2b36 Update typedef list for upcoming pgindent run. 2011-06-09 14:01:49 -04:00
Andrew Dunstan fe1438da8a Latest consolidated typedef list from buildfarm. 2011-04-08 23:11:37 -04:00
Alvaro Herrera a5dfc94c9a Use $INDENT instead of `which` to find the indent binary
Per discussion after my commit of yesterday.
2011-02-18 12:49:16 -03:00
Alvaro Herrera c4d124365b Use $INDENT rather than indent throughout the pgindent code
This allows the user to change the path to be used more easily.
Also, change URL in README.
2011-02-17 22:20:19 -03:00
Heikki Linnakangas dafaa3efb7 Implement genuine serializable isolation level.
Until now, our Serializable mode has in fact been what's called Snapshot
Isolation, which allows some anomalies that could not occur in any
serialized ordering of the transactions. This patch fixes that using a
method called Serializable Snapshot Isolation, based on research papers by
Michael J. Cahill (see README-SSI for full references). In Serializable
Snapshot Isolation, transactions run like they do in Snapshot Isolation,
but a predicate lock manager observes the reads and writes performed and
aborts transactions if it detects that an anomaly might occur. This method
produces some false positives, ie. it sometimes aborts transactions even
though there is no anomaly.

To track reads we implement predicate locking, see storage/lmgr/predicate.c.
Whenever a tuple is read, a predicate lock is acquired on the tuple. Shared
memory is finite, so when a transaction takes many tuple-level locks on a
page, the locks are promoted to a single page-level lock, and further to a
single relation level lock if necessary. To lock key values with no matching
tuple, a sequential scan always takes a relation-level lock, and an index
scan acquires a page-level lock that covers the search key, whether or not
there are any matching keys at the moment.

A predicate lock doesn't conflict with any regular locks or with other
predicate locks in the normal sense. They're only used by the predicate lock
manager to detect the danger of anomalies. Only serializable transactions
participate in predicate locking, so there should be no extra overhead
for other transactions.

Predicate locks can't be released at commit, but must be remembered until
all the transactions that overlapped with it have completed. That means that
we need to remember an unbounded amount of predicate locks, so we apply a
lossy but conservative method of tracking locks for committed transactions.
If we run short of shared memory, we overflow to a new "pg_serial" SLRU
pool.

We don't currently allow Serializable transactions in Hot Standby mode.
That would be hard, because even read-only transactions can cause anomalies
that wouldn't otherwise occur.

Serializable isolation mode now means the new fully serializable level.
Repeatable Read gives you the old Snapshot Isolation level that we have
always had.

Kevin Grittner and Dan Ports, reviewed by Jeff Davis, Heikki Linnakangas and
Anssi Kääriäinen
2011-02-08 00:09:08 +02:00
Bruce Momjian 97116ca417 Rename macro DECIMAL to DECIMAL_T to help pgindent; this is already
done for a few other macros in that file, for other reasons.  I also
remove pgindent/README mention of the file.
2011-02-06 10:48:17 -05:00
Peter Eisentraut fc946c39ae Remove useless whitespace at end of lines 2010-11-23 22:34:55 +02:00
Magnus Hagander 9f2e211386 Remove cvs keywords from all files. 2010-09-20 22:08:53 +02:00
Peter Eisentraut 3f11971916 Remove extra newlines at end and beginning of files, add missing newlines
at end of files.
2010-08-19 05:57:36 +00:00
Bruce Momjian 9e15b476de Mention why one C file fails pgindent. 2010-07-06 19:26:28 +00:00
Bruce Momjian 52783b212c Update pgindent testing instructions. 2010-07-06 19:18:19 +00:00
Andrew Dunstan 7004434a46 Exclude unwanted typedef symbols in pgindent, including FD_SET which is found on some Windows platforms. Also, silence unnecessary messages and make awk happier about literal '*' on some platforms. 2010-04-05 03:09:09 +00:00
Andrew Dunstan 799c0d3f65 Use a file of patterns of filenames to exclude from pgindent runs, instead of using multiple invocations of egrep. Add perl ppport.h to the current list. 2010-04-01 14:44:39 +00:00
Bruce Momjian 4b0f822c77 Suggest gmake installcheck-world for pgindent testing. 2010-02-26 18:00:15 +00:00
Bruce Momjian 2cc6ff45f8 Revert pgindent changes to ecpg include files that are part of ecpg
regression test output, and update pgindent script to avoid them in the
future.
2010-02-26 17:07:55 +00:00
Bruce Momjian 98c356c8ad Wording improvements to README. 2010-02-26 15:57:34 +00:00
Bruce Momjian 55d1402f61 Update pgindent docs to use maintainer-clean. 2010-02-26 15:42:36 +00:00
Bruce Momjian e0d4b9c66f Document why pgindent wants a fresh CVS checkout. 2010-02-26 13:50:34 +00:00
Bruce Momjian 637611585b Call output file typedefs.list; update README. 2010-02-26 02:58:49 +00:00
Bruce Momjian 4f96ddd1d3 Update pgindent instructions. 2010-02-26 02:11:52 +00:00
Bruce Momjian 16040575a0 Add pgindent typedefs file to CVS. 2010-02-26 01:55:35 +00:00
Bruce Momjian a8307560e0 Update pgindent instructions to avoid changes to flex output files. 2010-02-26 01:40:15 +00:00
Bruce Momjian 6e8d957d35 Document struct/union problem with pgindent. 2009-06-11 22:21:44 +00:00
Bruce Momjian 6c4b3f5f8c Update pgindent instructions. 2009-06-10 01:51:44 +00:00
Bruce Momjian 78f3c3906e Document new location for typedef list. 2009-06-10 01:47:59 +00:00
Bruce Momjian 06c22d7f51 Small shell syntax improvement. 2008-11-03 15:56:47 +00:00
Bruce Momjian d18f5c3eb0 Ignore blank lines in typedef file. 2008-04-16 21:03:08 +00:00
Bruce Momjian fca9fff41b More README src cleanups. 2008-03-21 13:23:29 +00:00
Peter Eisentraut 79a323ab49 Change /contrib to contrib for consistency. 2008-01-24 06:23:33 +00:00
Bruce Momjian bfde21a1a8 Improve usage message for pgindent. 2008-01-16 20:13:44 +00:00
Bruce Momjian 7b009a2a9d Modify pgindent to use an external typedefs file rather than an included
list.

Remove pgjindent.
2007-12-21 14:20:36 +00:00
Bruce Momjian 812bf6984b Mention using all configure options when getting pgindent typedefs. 2007-12-17 02:02:48 +00:00
Bruce Momjian 55cfdd4400 Mention installing /contrib libraries for pgindent. 2007-12-17 01:56:43 +00:00
Bruce Momjian d6fda1b0bb Better guard token used by pgindent. 2007-11-16 01:25:15 +00:00
Bruce Momjian 0c2c061eb0 Cleanup for new else/comment handling. 2007-11-16 01:11:04 +00:00
Bruce Momjian 7d4c99b414 Fix pgindent to properly handle 'else' and single-line comments on the
same line;  previous fix was only partial.  Re-run pgindent on files
that need it.
2007-11-15 23:23:44 +00:00
Bruce Momjian da0b2cdff8 Beef up README instructions, again. 2007-11-15 22:15:46 +00:00
Bruce Momjian 6c8f69cd58 Update README to suggest 'gmake distclean'. Add library typedefs. 2007-11-15 22:12:09 +00:00
Bruce Momjian 2a754d70d7 Update pgtools README to be clearer about typedefs. 2007-11-15 22:09:07 +00:00
Bruce Momjian ab895f3b40 Update pgindent with current typedefs. 2007-11-15 22:06:07 +00:00
Bruce Momjian 1f735c32b2 Add blank lines to pgindent. 2007-11-15 21:52:39 +00:00
Tom Lane 5c681ab1cb Exclude snowball/libstemmer/ files from the set processed by pgindent.
There's not much point in prettifying machine-generated code, and it
seems best to keep these files exactly like upstream anyway.  Also add
some notes about why various files are excluded.
2007-08-21 16:08:23 +00:00
Bruce Momjian 7accb29478 Clean up pgindent handling of comments after 'else' by only moving
multi-line comments to the next line.
2006-12-27 23:03:52 +00:00
Bruce Momjian abcf7603c0 Exclude pgindent from affecting the ecpg regression directory. 2006-10-04 20:42:19 +00:00
Bruce Momjian 451e419e98 Update typedefs for pgindent. 2006-10-04 00:02:10 +00:00
Bruce Momjian eff77a759a Update typedef list for 8.2 pgindent run. 2006-10-03 22:09:42 +00:00
Bruce Momjian f3d99d160d Add CVS tag lines to files that were lacking them. 2006-03-11 04:38:42 +00:00
Bruce Momjian aac96b8994 Fix pgindent of libpq-fe.h by hacking pgindent script.
Remove pgbench comment that was causing problems.
2005-11-23 04:23:30 +00:00
Bruce Momjian 62fb1d6028 Prevent certain symbols that are used for both typedefs and variable
names from being added to pgindent's typedef list.  Their existence
caused weird formatting in the date/type files, and in keywords.c.

Backpatch to 8.1.X.
2005-11-15 14:45:10 +00:00
Bruce Momjian 02c43ffbec Fix recent problems with BSD indent, including indenting past 80
columns, shifting comments to the right when more than 150 'else if'
clauses were used, and update typedefs for 8.1.X.

NetBSD patch updated, with documentation.
2005-11-15 00:43:01 +00:00
Bruce Momjian 19cb457146 Revert pgindent length back to 79 because we are going to fix the BSD
indent bug.
2005-11-13 02:38:49 +00:00
Bruce Momjian 6521ea008e Lower pgindent length to 77, document BSD indent bug. 2005-11-07 23:50:20 +00:00
Bruce Momjian aaf8cb0c72 Change maximum pgindent length from 79 to 78, per Tom. 2005-11-07 22:52:41 +00:00
Bruce Momjian 790c01d280 Update pgindent typedef list. 2005-10-15 02:14:22 +00:00
Alvaro Herrera a84429a1aa Remove an unused typedef. 2005-10-07 14:55:36 +00:00
Bruce Momjian 505b925276 Fix #elif spacing too. 2005-07-13 15:59:32 +00:00
Bruce Momjian 5d0a43c585 Fix pgindent to not have blank line before #else in variable definition
section of a function.
2005-07-13 04:44:42 +00:00
Bruce Momjian 7690b41328 Add backslashes to parentheses in awk regex; otherwise they are
treated as regex groups.
2005-07-13 04:00:28 +00:00
Bruce Momjian b3d06e5a95 Update typedefs for pgindent. 2005-06-28 23:55:30 +00:00
Bruce Momjian 249802725d Change awk ~ pattern from "" to //.
Remove extra backslash in pattern.  Luke Lonergan
2005-06-28 23:16:33 +00:00
Bruce Momjian cdc84adbdb Indent comments pushed to a new line by 'else' so they are indented by BSD
indent.
2004-10-07 14:15:50 +00:00
Bruce Momjian 8a28f50f8a Improve pgindent processing of comment after 'else'.
Improve comment of pg_dump Win32 link workaround.
2004-10-07 13:45:51 +00:00
Bruce Momjian 4e28b08e53 Improve comment after 'else' handling of pgindent. 2004-10-07 02:32:06 +00:00
Bruce Momjian b3f2b19218 Update length from 75 to 79. 2004-10-02 01:10:58 +00:00
Bruce Momjian 409de6be6c Re-add brace removal code but comment it out so we know why we removed
it and have it in case we need it for some special case.
2004-09-12 22:21:30 +00:00
Bruce Momjian 47402a9b00 Remove code that deletes braces around single statements. 2004-09-12 22:11:27 +00:00
Bruce Momjian 15d3f9f6b7 Another pgindent run with lib typedefs added. 2004-08-30 02:54:42 +00:00
Bruce Momjian ee66401f31 Update typedefs with /lib info. 2004-08-29 17:31:42 +00:00
Bruce Momjian 90cb9c3051 Update with new typedefs. Remove java and c++ parts of readme. 2004-08-29 04:49:45 +00:00
Bruce Momjian e3107b2844 Mention grabbing typedefs from pgsql/lib too. 2004-01-04 00:11:29 +00:00
Bruce Momjian a994d5984f Adjust pgindent for newer awks.
Nigel J. Andrews
2003-09-28 00:25:22 +00:00
Bruce Momjian b4ca39b956 Allow pgindent to work with newer BSD indents. 2003-09-28 00:22:58 +00:00
Bruce Momjian 16e4adc38f Update bsd indent patch. 2003-09-27 21:26:09 +00:00
Bruce Momjian ee84100cc1 Cleanup pgindent patch. 2003-09-27 21:19:47 +00:00
Bruce Momjian 5a288903b9 Guard against pgindent changing =- to = -. 2003-08-30 14:59:34 +00:00
Bruce Momjian 0e2b12bd96 pgindent fix for new typedefs. 2003-08-08 21:25:06 +00:00
Bruce Momjian c7fda55cc6 Update pgindent readme. 2003-08-07 15:02:43 +00:00
Bruce Momjian 78154363f9 Update typedef names for pgindent 7.4. 2003-08-07 05:18:14 +00:00
Bruce Momjian 4490c7b14b Update symbols for 7.3. 2002-09-04 19:11:06 +00:00
Bruce Momjian 2355482e28 Update for 7.3 typedefs. 2002-09-04 19:00:01 +00:00
Bruce Momjian af3cf2cfa8 Update to reflect Tom's suggestions. 2002-09-04 18:45:52 +00:00
Bruce Momjian d54ae2aff2 Add C++ indent tool. 2002-06-15 19:13:04 +00:00
Bruce Momjian 09634eafe1 Indent jdbc case labels using pgjindent. 2001-11-19 23:16:46 +00:00
Bruce Momjian 46d50783bf Update pgindent README so it gets *.java.in files. 2001-11-19 22:36:11 +00:00
Bruce Momjian 876c7009fb Make extern C handling more flexible. 2001-11-08 17:03:23 +00:00
Bruce Momjian c6e25ed1af Fix replacement of extern C string. 2001-11-07 22:10:02 +00:00
Bruce Momjian 1233d4fd6c Fix typo. 2001-11-07 21:29:04 +00:00
Bruce Momjian e644fc25c7 Prevent indenting of 'extern "C"' blocks. 2001-11-07 21:24:28 +00:00
Bruce Momjian ea08e6cd55 New pgindent run with fixes suggested by Tom. Patch manually reviewed,
initdb/regression tests pass.
2001-11-05 17:46:40 +00:00
Bruce Momjian 0f450dae8b More cleanup for stuff after closing brace in first column. 2001-11-05 06:37:51 +00:00
Bruce Momjian d447dbf392 Handle tabs after closing brace in first column with less indenting. 2001-11-05 05:47:50 +00:00
Bruce Momjian 158129be72 Improve readability of script. 2001-11-05 05:18:43 +00:00
Bruce Momjian 3bb110ebb3 Pull in variables defined in structs; had too many tabs. 2001-11-04 21:27:41 +00:00
Bruce Momjian 8ee7c19e3c Require a closing paren on the line above the brace to identify a function
definition, just as a formatting workaround, per Tom's discovery.
2001-11-03 22:34:13 +00:00
Bruce Momjian f008976bcd More updates for GNU indent. 2001-11-03 12:34:15 +00:00
Bruce Momjian ffba91cd1e Make pgindent use GNU Indent version 2.X better. 2001-11-03 01:49:22 +00:00
Bruce Momjian 04550d3c90 Add check for 'extern "C"' for pgindent. 2001-11-02 23:43:24 +00:00
Bruce Momjian c41b6b1b9c Fix small problem Tom Lane found with pgindent run. 2001-10-30 05:38:56 +00:00
Bruce Momjian 6783b2372e Another pgindent run. Fixes enum indenting, and improves #endif
spacing.  Also adds space for one-line comments.
2001-10-28 06:26:15 +00:00
Bruce Momjian c29797deeb Add code to trim trailing newlines in a file. 2001-10-27 13:54:45 +00:00
Bruce Momjian 5ef74fe593 Correct fix for indenting. 2001-10-27 03:31:36 +00:00
Bruce Momjian b93939a6a7 Adjust NR tests. More accurate. 2001-10-26 17:54:45 +00:00
Bruce Momjian 8c1f4e574b Add code to not indent enum, per Tom Lane. 2001-10-26 16:21:13 +00:00
Bruce Momjian 99a9f2f6f4 Add ODBC typedefs. 2001-10-26 15:42:54 +00:00
Bruce Momjian 80b9a00439 Add blank line before #endif to #endif's at the end of the file. 2001-10-25 19:57:03 +00:00
Bruce Momjian 3231341eed Add slash for comment spacing, for Tom. 2001-10-25 19:22:05 +00:00
Bruce Momjian 81d9a9674e Add comment spaces for trailing ) and comment. 2001-10-25 18:44:42 +00:00
Bruce Momjian cae059ba5e Add spacing for single-line comments with trailing semicolon _and_
comma, per Tom.
2001-10-25 18:25:23 +00:00
Bruce Momjian 05584c9660 Code cleanup. 2001-10-25 06:27:56 +00:00
Bruce Momjian 59da2105d8 Update to prevent CATALOG() from wrapping. 2001-10-25 05:07:56 +00:00
Bruce Momjian bbc7491de1 Add current typedef symbols to pgindent. 2001-10-25 03:56:35 +00:00
Bruce Momjian 3fb3678409 Create pgjindent for java. 2001-09-07 21:25:44 +00:00
Bruce Momjian 5840db21fb Add back incremental patch for BSD indent. 2001-09-04 03:34:42 +00:00
Bruce Momjian e5390263ed Add patch for 0LL for BSD indent/pgindent. 2001-09-03 23:11:20 +00:00
Bruce Momjian 398b41a23f pgindent fix for asterisk indented too much in return type, for Tom. 2001-06-06 20:51:31 +00:00
Bruce Momjian a62c19e4ec Fix for comments at top of functions. 2001-05-22 17:24:58 +00:00
Bruce Momjian e7f47ed5b4 Pgindent fixes for Tom, mostly indenting problems. 2001-05-22 01:28:16 +00:00
Bruce Momjian 8266e8a84b OK, now pgindent has blank lines before comment blocks, except when
there is a brace on the line above.
2001-05-17 16:11:08 +00:00
Bruce Momjian 2d7795ebb4 Prevent forced blank line before comment block in pgindent. 2001-05-17 15:55:24 +00:00
Bruce Momjian 1e7b79cebc Remove unused tables pg_variable, pg_inheritproc, pg_ipl tables. Initdb
forced.
2001-05-14 20:30:21 +00:00
Bruce Momjian 281b7d84fc Add // -> /* */ mapping to pgindent. 2001-02-12 18:30:53 +00:00
Bruce Momjian 3152ef63a6 Source alignment cleanups. 2001-02-11 05:58:41 +00:00
Bruce Momjian a952c79b23 More updates. 2001-02-11 05:15:25 +00:00
Bruce Momjian 26dc50141b More cleanup. 2001-02-11 05:13:52 +00:00
Bruce Momjian 755a87332a Run pgindent over ODBC source. We couldn't do this years ago because we
weren't the master source.  We are now, and it really needs it.
2001-02-10 07:01:19 +00:00
Bruce Momjian 398bb1fcb6 Update pgindent 2000-04-12 01:01:49 +00:00
Bruce Momjian 83a57694d1 Update pgindent 2000-04-11 22:15:08 +00:00
Bruce Momjian 862d677682 Update pgindent for 7.0 release 2000-04-11 19:09:04 +00:00
Bruce Momjian 7acc237744 This patch implements ORACLE's COMMENT SQL command.
From the ORACLE 7 SQL Language Reference Manual:
-----------------------------------------------------
COMMENT

Purpose:

To add a comment about a table, view, snapshot, or
column into the data dictionary.

Prerequisites:

The table, view, or snapshot must be in your own schema,
or you must have COMMENT ANY TABLE system privilege.

Syntax:

COMMENT ON [ TABLE table ] |
           [ COLUMN table.column] IS 'text'

You can effectively drop a comment from the database
by setting it to the empty string ''.
-----------------------------------------------------

Example:

COMMENT ON TABLE workorders IS
   'Maintains base records for workorder information';

COMMENT ON COLUMN workorders.hours IS
   'Number of hours the engineer worked on the task';

to drop a comment:

COMMENT ON COLUMN workorders.hours IS '';

The current patch will simply perform the insert into
pg_description, as per the TODO. And, of course, when
the table is dropped, any comments relating to it
or any of its attributes are also dropped. I haven't
looked at the ODBC source yet, but I do know from
an ODBC client standpoint that the standard does
support the notion of table and column comments.
Hopefully the ODBC driver is already fetching these
values from pg_description, but if not, it should be
trivial.

Hope this makes the grade,

Mike Mascari
(mascarim@yahoo.com)
1999-10-15 01:49:49 +00:00
Bruce Momjian e7cad7b0cb Add TRUNCATE command, with psql help and sgml additions. 1999-09-23 17:03:39 +00:00
Bruce Momjian c1d5e88b41 Make pgindent gnu test better. 1999-09-09 19:39:06 +00:00
Bruce Momjian 9c56b408c4 Add fix for 0x7fU constants to pgindent 1999-05-26 15:20:04 +00:00
Bruce Momjian fcff1cdf4e Another pgindent run. Sorry folks. 1999-05-25 22:43:53 +00:00
Bruce Momjian 07842084fe pgindent run over code. 1999-05-25 16:15:34 +00:00
Bruce Momjian 8849655d24 I agree. I think, though, that the best argument presented in the
debate was from Paul Vixie, who wanted INET to be the name covering
both IPV4 and IPV6.  The following kit makes the needed changes:

Tom Ivar Helbekkmo
1998-10-08 00:19:47 +00:00
Bruce Momjian 2d69fd90b9 Integrate new IP type from Tom Ivar Helbekkmo. 1998-10-03 05:41:01 +00:00
Bruce Momjian f1ab71ec5f The attached patches fix the following problems:
1.  The UnixWare tas macro was reformatted (by indent or the like?), which caused
    it to break.  The asm macro construct is very particular about the %mem
    construct -- it has to start in column 1.

2.  When compiling libpq++, g++ was used even if configure found the C++
    compiler to be CC.

3.  When compiling libpq++, '-Wno-error' was added to CXXFLAGS, even if the
    compiler wasn't g++.

Billy G. Allie
1998-09-11 16:56:24 +00:00
Bruce Momjian fa1a8d6a97 OK, folks, here is the pgindent output. 1998-09-01 04:40:42 +00:00
Bruce Momjian 7971539020 heap_fetch requires buffer pointer, must be released; heap_getnext
no longer returns buffer pointer, it can be gotten from the scan
descriptor; bootstrap can create multi-key indexes;
pg_procname index now is multi-key index; oidint2, oidint4, oidname
are gone (must be removed from regression tests); use System Cache
rather than sequential scan in many places; heap_modifytuple no
longer takes buffer parameter; remove unused buffer parameter in
a few other functions; oid8 is not index-able; remove some use of
single-character variable names; cleanup Buffer variables usage
and scan descriptor looping; cleaned up allocation and freeing of
tuples; 18k lines of diff;
1998-08-19 02:04:17 +00:00
Bruce Momjian addddea313 Update pgindent. 1998-08-09 17:57:31 +00:00
Bruce Momjian a08dc16c47 New pgindent. 1998-08-09 04:59:10 +00:00
Bruce Momjian 56bdbe1f4c Add remove extra braces code to pgindent. 1998-06-15 20:45:57 +00:00
Bruce Momjian 0d203b745d Re-apply Darren's char2-16 removal code. 1998-04-26 04:12:15 +00:00
Bruce Momjian db21523314 Back out char2-char16 removal. Add later. 1998-04-07 18:14:38 +00:00
Bruce Momjian 57b5966405 The following uuencoded, gzip'd file will ...
1. Remove the char2, char4, char8 and char16 types from postgresql
2. Change references of char16 to name in the regression tests.
3. Rename the char16.sql regression test to name.sql.
4. Modify the regression test scripts and outputs to match up.

Might require new regression.{SYSTEM} files...

Darren King
1998-03-30 17:28:21 +00:00
Bruce Momjian 748fab8d5d Prevent pgindent from being run on odbc in the future. 1998-03-28 02:24:49 +00:00
Bruce Momjian d067f83b27 pgindent changes for Thomas' proc/lock cleanup 1998-02-25 00:31:23 +00:00
Marc G. Fournier ba0b03de2e Let's hope this fixes the "bug" that was introduced 1997-09-13 16:27:13 +00:00