Commit Graph

640 Commits

Author SHA1 Message Date
Alvaro Herrera af38498d4c Move hash_any prototype from access/hash.h to utils/hashutils.h
... as well as its implementation from backend/access/hash/hashfunc.c to
backend/utils/hash/hashfn.c.

access/hash is the place for the hash index AM, not really appropriate
for generic facilities, which is what hash_any is; having things the old
way meant that anything using hash_any had to include the AM's include
file, pointlessly polluting its namespace with unrelated, unnecessary
cruft.

Also move the HTEqual strategy number to access/stratnum.h from
access/hash.h.

To avoid breaking third-party extension code, add an #include
"utils/hashutils.h" to access/hash.h.  (An easily removed line by
committers who enjoy their asbestos suits to protect them from angry
extension authors.)

Discussion: https://postgr.es/m/201901251935.ser5e4h6djt2@alvherre.pgsql
2019-03-11 13:17:50 -03:00
Tom Lane caf626b2cd Convert [autovacuum_]vacuum_cost_delay into floating-point GUCs.
This change makes it possible to specify sub-millisecond delays,
which work well on most modern platforms, though that was not true
when the cost-delay feature was designed.

To support this without breaking existing configuration entries,
improve guc.c to allow floating-point GUCs to have units.  Also,
allow "us" (microseconds) as an input/output unit for time-unit GUCs.
(It's not allowed as a base unit, at least not yet.)

Likewise change the autovacuum_vacuum_cost_delay reloption to be
floating-point; this forces a catversion bump because the layout of
StdRdOptions changes.

This patch doesn't in itself change the default values or allowed
ranges for these parameters, and it should not affect the behavior
for any already-allowed setting for them.

Discussion: https://postgr.es/m/1798.1552165479@sss.pgh.pa.us
2019-03-10 15:01:39 -04:00
Tom Lane 80b9e9c466 Improve performance of index-only scans with many index columns.
StoreIndexTuple was a loop over index_getattr, which is O(N^2)
if the index columns are variable-width, and the performance
impact is already quite visible at ten columns.  The obvious
move is to replace that with a call to index_deform_tuple ...
but that's *also* a loop over index_getattr.  Improve it to
be essentially a clone of heap_deform_tuple.

(There are a few other places that loop over all index columns
with index_getattr, and perhaps should be changed likewise,
but most of them don't seem performance-critical.  Anyway, the
rest would mostly only be interested in the index key columns,
which there aren't likely to be so many of.  Wide index tuples
are a new thing with INCLUDE.)

Konstantin Knizhnik

Discussion: https://postgr.es/m/e06b2d27-04fc-5c0e-bb8c-ecd72aa24959@postgrespro.ru
2019-03-03 16:57:14 -05:00
Tom Lane c94fb8e8ac Standardize some more loops that chase down parallel lists.
We have forboth() and forthree() macros that simplify iterating
through several parallel lists, but not everyplace that could
reasonably use those was doing so.  Also invent forfour() and
forfive() macros to do the same for four or five parallel lists,
and use those where applicable.

The immediate motivation for doing this is to reduce the number
of ad-hoc lnext() calls, to reduce the footprint of a WIP patch.
However, it seems like good cleanup and error-proofing anyway;
the places that were combining forthree() with a manually iterated
loop seem particularly illegible and bug-prone.

There was some speculation about restructuring related parsetree
representations to reduce the need for parallel list chasing of
this sort.  Perhaps that's a win, or perhaps not, but in any case
it would be considerably more invasive than this patch; and it's
not particularly related to my immediate goal of improving the
List infrastructure.  So I'll leave that question for another day.

Patch by me; thanks to David Rowley for review.

Discussion: https://postgr.es/m/11587.1550975080@sss.pgh.pa.us
2019-02-28 14:25:01 -05:00
Andres Freund 171e0418b0 Fix heap_getattr() handling of fast defaults.
Previously heap_getattr() returned NULL for attributes with a fast
default value (c.f. 16828d5c02), as it had no handling whatsoever
for that case.

A previous fix, 7636e5c60f, attempted to fix issues caused by this
oversight, but just expanding OLD tuples for triggers doesn't actually
solve the underlying issue.

One known consequence of this bug is that the check for HOT updates
can return the wrong result, when a previously fast-default'ed column
is set to NULL. Which in turn means that an index over a column with
fast default'ed columns might be corrupt if the underlying column(s)
allow NULLs.

Fix by handling fast default columns in heap_getattr(), remove now
superfluous expansion in GetTupleForTrigger().

Author: Andres Freund
Discussion: https://postgr.es/m/20190201162404.onngi77f26baem4g@alap3.anarazel.de
Backpatch: 11, where fast defaults were introduced
2019-02-06 01:09:32 -08:00
Michael Paquier c9b75c5838 Simplify restriction handling of two-phase commit for temporary objects
There were two flags used to track the access to temporary tables and
to the temporary namespace of a session which are used to restrict
PREPARE TRANSACTION, however the first control flag is a concept
included in the second.  This removes the flag for temporary table
tracking, keeping around only the one at namespace level.

Author: Michael Paquier
Reviewed-by: Álvaro Herrera
Discussion: https://postgr.es/m/20190118053126.GH1883@paquier.xyz
2019-01-26 10:45:23 +09:00
Andres Freund 4b21acf522 Introduce access/{table.h, relation.h}, for generic functions from heapam.h.
access/heapam contains functions that are very storage specific (say
heap_insert() and a lot of lower level functions), and fairly generic
infrastructure like relation_open(), heap_open() etc.  In the upcoming
pluggable storage work we're introducing a layer between table
accesses in general and heapam, to allow for different storage
methods. For a bit cleaner separation it thus seems advantageous to
move generic functions like the aforementioned to their own headers.

access/relation.h will contain relation_open() etc, and access/table.h
will contain table_open() (formerly known as heap_open()). I've decided
for table.h not to include relation.h, but we might change that at a
later stage.

relation.h already exists in another directory, but the other
plausible name (rel.h) also conflicts. It'd be nice if there were a
non-conflicting name, but nobody came up with a suggestion. It's
possible that the appropriate way to address the naming conflict would
be to rename nodes/relation.h, which isn't particularly well named.

To avoid breaking a lot of extensions that just use heap_open() etc,
table.h has macros mapping the old names to the new ones, and heapam.h
includes relation, table.h.  That also allows to keep the
bulk renaming of existing callers in a separate commit.

Author: Andres Freund
Discussion: https://postgr.es/m/20190111000539.xbv7s6w7ilcvm7dp@alap3.anarazel.de
2019-01-21 10:51:36 -08:00
Tom Lane 1c53c4dec3 Finish reverting "recheck_on_update" patch.
This reverts commit c203d6cf8 and some follow-on fixes, completing the
task begun in commit 5d28c9bd7.  If that feature is ever resurrected,
the code will look quite a bit different from this, so it seems best
to start from a clean slate.

The v11 branch is not touched; in that branch, the recheck_on_update
storage option remains present, but nonfunctional and undocumented.

Discussion: https://postgr.es/m/20190114223409.3tcvejfhlvbucrv5@alap3.anarazel.de
2019-01-15 12:07:10 -05:00
Andres Freund 774a975c9a Make naming of tupdesc related structs more consistent with the rest of PG.
We usually don't change the name of structs between the struct name
itself and the name of the typedef. Additionally, structs that are
usually used via a typedef that hides being a pointer, are commonly
suffixed Data.  Change tupdesc code to follow those convention.

This is triggered by a future patch that intends to forward declare
TupleDescData in another header - keeping with the naming scheme makes
that easier to understand.

Author: Andres Freund
Discussion: https://postgr.es/m/20190114000701.y4ttcb74jpskkcfb@alap3.anarazel.de
2019-01-14 16:25:50 -08:00
Andres Freund 4c850ecec6 Don't include heapam.h from others headers.
heapam.h previously was included in a number of widely used
headers (e.g. execnodes.h, indirectly in executor.h, ...). That's
problematic on its own, as heapam.h contains a lot of low-level
details that don't need to be exposed that widely, but becomes more
problematic with the upcoming introduction of pluggable table storage
- it seems inappropriate for heapam.h to be included that widely
afterwards.

heapam.h was largely only included in other headers to get the
HeapScanDesc typedef (which was defined in heapam.h, even though
HeapScanDescData is defined in relscan.h). The better solution here
seems to be to just use the underlying struct (forward declared where
necessary). Similar for BulkInsertState.

Another problem was that LockTupleMode was used in executor.h - parts
of the file tried to cope without heapam.h, but due to the fact that
it indirectly included it, several subsequent violations of that goal
were not not noticed. We could just reuse the approach of declaring
parameters as int, but it seems nicer to move LockTupleMode to
lockoptions.h - that's not a perfect location, but also doesn't seem
bad.

As a number of files relied on implicitly included heapam.h, a
significant number of files grew an explicit include. It's quite
probably that a few external projects will need to do the same.

Author: Andres Freund
Reviewed-By: Alvaro Herrera
Discussion: https://postgr.es/m/20190114000701.y4ttcb74jpskkcfb@alap3.anarazel.de
2019-01-14 16:24:41 -08:00
Bruce Momjian 97c39498e5 Update copyright for 2019
Backpatch-through: certain files through 9.4
2019-01-02 12:44:25 -05:00
Tom Lane 6b0faf7236 Make collation-aware system catalog columns use "C" collation.
Up to now we allowed text columns in system catalogs to use collation
"default", but that isn't really safe because it might mean something
different in template0 than it means in a database cloned from template0.
In particular, this could mean that cloned pg_statistic entries for such
columns weren't entirely valid, possibly leading to bogus planner
estimates, though (probably) not any outright failures.

In the wake of commit 5e0928005, a better solution is available: if we
label such columns with "C" collation, then their pg_statistic entries
will also use that collation and hence will be valid independently of
the database collation.

This also provides a cleaner solution for indexes on such columns than
the hack added by commit 0b28ea79c: the indexes will naturally inherit
"C" collation and don't have to be forced to use text_pattern_ops.

Also, with the planned improvement of type "name" to be collation-aware,
this policy will apply cleanly to both text and name columns.

Because of the pg_statistic angle, we should also apply this policy
to the tables in information_schema.  This patch does that by adjusting
information_schema's textual domain types to specify "C" collation.
That has the user-visible effect that order-sensitive comparisons to
textual information_schema view columns will now use "C" collation
by default.  The SQL standard says that the collation of those view
columns is implementation-defined, so I think this is legal per spec.
At some point this might allow for translation of such comparisons
into indexable conditions on the underlying "name" columns, although
additional work will be needed before that can happen.

Discussion: https://postgr.es/m/19346.1544895309@sss.pgh.pa.us
2018-12-18 12:48:15 -05:00
Tom Lane b7a29695f7 Make TupleDescInitBuiltinEntry throw error for unsupported types.
Previously, it would just pass back a partially-uninitialized tupdesc,
which doesn't seem like a safe or useful behavior.

Backpatch to v10 where this code came in.

Discussion: https://postgr.es/m/30830.1544384975@sss.pgh.pa.us
2018-12-10 10:38:48 -05:00
Andres Freund 578b229718 Remove WITH OIDS support, change oid catalog column visibility.
Previously tables declared WITH OIDS, including a significant fraction
of the catalog tables, stored the oid column not as a normal column,
but as part of the tuple header.

This special column was not shown by default, which was somewhat odd,
as it's often (consider e.g. pg_class.oid) one of the more important
parts of a row.  Neither pg_dump nor COPY included the contents of the
oid column by default.

The fact that the oid column was not an ordinary column necessitated a
significant amount of special case code to support oid columns. That
already was painful for the existing, but upcoming work aiming to make
table storage pluggable, would have required expanding and duplicating
that "specialness" significantly.

WITH OIDS has been deprecated since 2005 (commit ff02d0a05280e0).
Remove it.

Removing includes:
- CREATE TABLE and ALTER TABLE syntax for declaring the table to be
  WITH OIDS has been removed (WITH (oids[ = true]) will error out)
- pg_dump does not support dumping tables declared WITH OIDS and will
  issue a warning when dumping one (and ignore the oid column).
- restoring an pg_dump archive with pg_restore will warn when
  restoring a table with oid contents (and ignore the oid column)
- COPY will refuse to load binary dump that includes oids.
- pg_upgrade will error out when encountering tables declared WITH
  OIDS, they have to be altered to remove the oid column first.
- Functionality to access the oid of the last inserted row (like
  plpgsql's RESULT_OID, spi's SPI_lastoid, ...) has been removed.

The syntax for declaring a table WITHOUT OIDS (or WITH (oids = false)
for CREATE TABLE) is still supported. While that requires a bit of
support code, it seems unnecessary to break applications / dumps that
do not use oids, and are explicit about not using them.

The biggest user of WITH OID columns was postgres' catalog. This
commit changes all 'magic' oid columns to be columns that are normally
declared and stored. To reduce unnecessary query breakage all the
newly added columns are still named 'oid', even if a table's column
naming scheme would indicate 'reloid' or such.  This obviously
requires adapting a lot code, mostly replacing oid access via
HeapTupleGetOid() with access to the underlying Form_pg_*->oid column.

The bootstrap process now assigns oids for all oid columns in
genbki.pl that do not have an explicit value (starting at the largest
oid previously used), only oids assigned later by oids will be above
FirstBootstrapObjectId. As the oid column now is a normal column the
special bootstrap syntax for oids has been removed.

Oids are not automatically assigned during insertion anymore, all
backend code explicitly assigns oids with GetNewOidWithIndex(). For
the rare case that insertions into the catalog via SQL are called for
the new pg_nextoid() function can be used (which only works on catalog
tables).

The fact that oid columns on system tables are now normal columns
means that they will be included in the set of columns expanded
by * (i.e. SELECT * FROM pg_class will now include the table's oid,
previously it did not). It'd not technically be hard to hide oid
column by default, but that'd mean confusing behavior would either
have to be carried forward forever, or it'd cause breakage down the
line.

While it's not unlikely that further adjustments are needed, the
scope/invasiveness of the patch makes it worthwhile to get merge this
now. It's painful to maintain externally, too complicated to commit
after the code code freeze, and a dependency of a number of other
patches.

Catversion bump, for obvious reasons.

Author: Andres Freund, with contributions by John Naylor
Discussion: https://postgr.es/m/20180930034810.ywp2c7awz7opzcfr@alap3.anarazel.de
2018-11-20 16:00:17 -08:00
Andres Freund 4da597edf1 Make TupleTableSlots extensible, finish split of existing slot type.
This commit completes the work prepared in 1a0586de36, splitting the
old TupleTableSlot implementation (which could store buffer, heap,
minimal and virtual slots) into four different slot types.  As
described in the aforementioned commit, this is done with the goal of
making tuple table slots extensible, to allow for pluggable table
access methods.

To achieve runtime extensibility for TupleTableSlots, operations on
slots that can differ between types of slots are performed using the
TupleTableSlotOps struct provided at slot creation time.  That
includes information from the size of TupleTableSlot struct to be
allocated, initialization, deforming etc.  See the struct's definition
for more detailed information about callbacks TupleTableSlotOps.

I decided to rename TTSOpsBufferTuple to TTSOpsBufferHeapTuple and
ExecCopySlotTuple to ExecCopySlotHeapTuple, as that seems more
consistent with other naming introduced in recent patches.

There's plenty optimization potential in the slot implementation, but
according to benchmarking the state after this commit has similar
performance characteristics to before this set of changes, which seems
sufficient.

There's a few changes in execReplication.c that currently need to poke
through the slot abstraction, that'll be repaired once the pluggable
storage patchset provides the necessary infrastructure.

Author: Andres Freund and  Ashutosh Bapat, with changes by Amit Khandekar
Discussion: https://postgr.es/m/20181105210039.hh4vvi4vwoq5ba2q@alap3.anarazel.de
2018-11-16 16:35:15 -08:00
Andrew Dunstan 040a1df614 Correctly set t_self for heap tuples in expand_tuple
Commit 16828d5c0 incorrectly set an invalid pointer for t_self for heap
tuples. This patch correctly copies it from the source tuple, and
includes a regression test that relies on it being set correctly.

Backpatch to release 11.

Fixes bug #15448 reported by Tillmann Schulz

Diagnosis and test case by Amit Langote
2018-10-24 10:56:27 -04:00
Andres Freund c5257345ef Move TupleTableSlots boolean member into one flag variable.
There's several reasons for this change:
1) It reduces the total size of TupleTableSlot / reduces alignment
   padding, making the commonly accessed members fit into a single
   cacheline (but we currently do not force proper alignment, so
   that's not yet guaranteed to be helpful)
2) Combining the booleans into a flag allows to combine read/writes
   from memory.
3) With the upcoming slot abstraction changes, it allows to have core
   and extended flags, in a memory efficient way.

Author: Ashutosh Bapat and Andres Freund
Discussion: https://postgr.es/m/20180220224318.gw4oe5jadhpmcdnm@alap3.anarazel.de
2018-10-15 18:23:25 -07:00
Andres Freund 9d906f1119 Move generic slot support functions from heaptuple.c into execTuples.c.
heaptuple.c was never a particular good fit for slot_getattr(),
slot_getsomeattrs() and slot_getmissingattrs(), but in upcoming
changes slots will be made more abstract (allowing slots that contain
different types of tuples), making it clearly the wrong place.

Note that slot_deform_tuple() remains in it's current place, as it
clearly deals with a HeapTuple.  getmissingattrs() also remains, but
it's less clear that that's correct - but execTuples.c wouldn't be the
right place.

Author: Ashutosh Bapat.
Discussion: https://postgr.es/m/20180220224318.gw4oe5jadhpmcdnm@alap3.anarazel.de
2018-10-15 15:17:04 -07:00
Andres Freund cc2905e963 Use slots more widely in tuple mapping code and make naming more consistent.
It's inefficient to use a single slot for mapping between tuple
descriptors for multiple tuples, as previously done when using
ConvertPartitionTupleSlot(), as that means the slot's tuple descriptors
change for every tuple.

Previously we also, via ConvertPartitionTupleSlot(), built new tuples
after the mapping even in cases where we, immediately afterwards,
access individual columns again.

Refactor the code so one slot, on demand, is used for each
partition. That avoids having to change the descriptor (and allows to
use the more efficient "fixed" tuple slots). Then use slot->slot
mapping, to avoid unnecessarily forming a tuple.

As the naming between the tuple and slot mapping functions wasn't
consistent, rename them to execute_attr_map_{tuple,slot}.  It's likely
that we'll also rename convert_tuples_by_* to denote that these
functions "only" build a map, but that's left for later.

Author: Amit Khandekar and Amit Langote, editorialized by me
Reviewed-By: Amit Langote, Amit Khandekar, Andres Freund
Discussion:
    https://postgr.es/m/CAJ3gD9fR0wRNeAE8VqffNTyONS_UfFPRpqxhnD9Q42vZB+Jvpg@mail.gmail.com
    https://postgr.es/m/e4f9d743-cd4b-efb0-7574-da21d86a7f36%40lab.ntt.co.jp
Backpatch: -
2018-10-02 11:14:26 -07:00
Andrew Dunstan 7636e5c60f Fast default trigger and expand_tuple fixes
Ensure that triggers get properly filled in tuples for the OLD value.
Also fix the logic of detecting missing null values. The previous logic
failed to detect a missing null column before the first missing column
with a default. Fixing this has simplified the logic a bit.

Regression tests are added to test changes. This should ensure better
coverage of expand_tuple().

Original bug reports, and some code and test scripts from Tomas Vondra

Backpatch to release 11.
2018-09-24 16:11:24 -04:00
Andres Freund 88ebd62fcc Deduplicate code between slot_getallattrs() and slot_getsomeattrs().
Code in slot_getallattrs() is the same as if slot_getsomeattrs() is
called with number of attributes specified in the tuple
descriptor. Implement it that way instead of duplicating the code
between those two functions.

This is part of a patchseries abstracting TupleTableSlots so they can
store arbitrary forms of tuples, but is a nice enough cleanup on its
own.

Author: Ashutosh Bapat
Reviewed-By: Andres Freund
Discussion: https://postgr.es/m/20180220224318.gw4oe5jadhpmcdnm@alap3.anarazel.de
2018-08-23 16:58:53 -07:00
Heikki Linnakangas 42f70cd9c3 Improve performance of tuple conversion map generation
Previously convert_tuples_by_name_map naively performed a search of each
outdesc column starting at the first column in indesc and searched each
indesc column until a match was found.  When partitioned tables had many
columns this could result in slow generation of the tuple conversion maps.
For INSERT and UPDATE statements that touched few rows, this could mean a
very large overhead indeed.

We can do a bit better with this loop.  It's quite likely that the columns
in partitioned tables and their partitions are in the same order, so it
makes sense to start searching for each column outer column at the inner
column position 1 after where the previous match was found (per idea from
Alexander Kuzmenkov). This makes the best case search O(N) instead of
O(N^2).  The worst case is still O(N^2), but it seems unlikely that would
happen.

Likewise, in the planner, make_inh_translation_list's search for the
matching column could often end up falling back on an O(N^2) type search.
This commit also improves that by first checking the column that follows
the previous match, instead of the column with the same attnum.  If we
fail to match here we fallback on the syscache's hashtable lookup.

Author: David Rowley
Reviewed-by: Alexander Kuzmenkov
Discussion: https://www.postgresql.org/message-id/CAKJS1f9-wijVgMdRp6_qDMEQDJJ%2BA_n%3DxzZuTmLx5Fz6cwf%2B8A%40mail.gmail.com
2018-07-13 19:54:05 +03:00
Alexander Korotkov edf59c40dd Fix more wrong paths in header comments
It appears that there are more files, whose header comment paths are
wrong.  So, fix those paths.  No backpatching per proposal of Tom Lane.

Discussion: https://postgr.es/m/CAPpHfdsJyYbOj59MOQL%2B4XxdcomLSLfLqBtAvwR%2BpsCqj3ELdQ%40mail.gmail.com
2018-07-11 17:57:04 +03:00
Amit Kapila 8121ab88e7 Cosmetic improvements for faster column addition.
Changed the name of few structure members for the sake of clarity and
removed spurious whitespace.

Reported-by: Amit Kapila
Author: Amit Kapila, based on suggestion by Andrew Dunstan
Reviewed-by: Alvaro Herrera
Discussion: https://postgr.es/m/CAA4eK1K2znsFpC+NQ9A4vxT4uDxADN4RmvHX0L6Y=aHVo9gB4Q@mail.gmail.com
2018-06-27 08:16:13 +05:30
Alexander Korotkov 4d54543efa Fix upper limit for vacuum_cleanup_index_scale_factor
6ca33a88 sets upper limit for vacuum_cleanup_index_scale_factor to
DBL_MAX.  DBL_MAX appears to be platform-dependent. That causes
many buildfarm animals to fail, because we check boundaries of
vacuum_cleanup_index_scale_factor in regression tests.

This commit changes upper limit from DBL_MAX to just "large enough"
limit, which was arbitrary selected as 1e10.

Author: Alexander Korotkov
Reported-by: Tom Lane, Darafei Praliaskouski
Discussion: https://postgr.es/m/CAPpHfdvewmr4PcpRjrkstoNn1n2_6dL-iHRB21CCfZ0efZdBTg%40mail.gmail.com
Discussion: https://postgr.es/m/CAC8Q8tLYFOpKNaPS_E7V8KtPdE%3D_TnAn16t%3DA3LuL%3DXjfOO-BQ%40mail.gmail.com
2018-06-26 21:55:59 +03:00
Alexander Korotkov 6ca33a885b Increase upper limit for vacuum_cleanup_index_scale_factor
Upper limits for vacuum_cleanup_index_scale_factor GUC and reloption
were initially set to 100.0 in 857f9c36.  However, after further
discussion, it appears that some users like to disable B-tree cleanup
index scan completely (assuming there are no deleted pages).

vacuum_cleanup_index_scale_factor is used barely to protect against
stalled index statistics.  And after detailed consideration it appears
that risk of stalled index statistics is low.  And it would be nice to
allow advanced users setting higher values of
vacuum_cleanup_index_scale_factor.  So, set upper limit for these
GUC and reloption to DBL_MAX.

Author: Alexander Korotkov
Reviewed-by: Masahiko Sawada
Discussion: https://postgr.es/m/CAC8Q8tJCb%3DgxhzcV7T6ctx7PY-Ux1oA-AsTJc6cAVNsQiYcCzA%40mail.gmail.com
2018-06-26 15:00:51 +03:00
Tom Lane bdf46af748 Post-feature-freeze pgindent run.
Discussion: https://postgr.es/m/15719.1523984266@sss.pgh.pa.us
2018-04-26 14:47:16 -04:00
Teodor Sigaev 075aade436 Adjust INCLUDE index truncation comments and code.
Add several assertions that ensure that we're dealing with a pivot tuple
without non-key attributes where that's expected.  Also, remove the
assertion within _bt_isequal(), restoring the v10 function signature.  A
similar check will be performed for the page highkey within
_bt_moveright() in most cases.  Also avoid dropping all objects within
regression tests, to increase pg_dump test coverage for INCLUDE indexes.

Rather than using infrastructure that's generally intended to be used
with reference counted heap tuple descriptors during truncation, use the
same function that was introduced to store flat TupleDescs in shared
memory (we use a temp palloc'd buffer).  This isn't strictly necessary,
but seems more future-proof than the old approach.  It also lets us
avoid including rel.h within indextuple.c, which was arguably a
modularity violation.  Also, we now call index_deform_tuple() with the
truncated TupleDesc, not the source TupleDesc, since that's more robust,
and saves a few cycles.

In passing, fix a memory leak by pfree'ing truncated pivot tuple memory
during CREATE INDEX.  Also pfree during a page split, just to be
consistent.

Refactor _bt_check_natts() to be more readable.

Author: Peter Geoghegan with some editorization by me
Reviewed by: Alexander Korotkov, Teodor Sigaev
Discussion: https://www.postgresql.org/message-id/CAH2-Wz%3DkCWuXeMrBCopC-tFs3FbiVxQNjjgNKdG2sHxZ5k2y3w%40mail.gmail.com
2018-04-19 08:45:58 +03:00
Andrew Dunstan 7c44c46deb Prevent segfault in expand_tuple with no missing values
Commit 16828d5c forgot to check that it had a set of missing values
before trying to retrieve a value from it.

An additional query to add coverage for this code is added to the
regression test.

Per bug report from Andreas Seltenreich.
2018-04-13 16:43:33 -04:00
Teodor Sigaev 34602b0a1d Remove unused variable in non-assert-enabled build
Use field of structure in Assert directly

Jeff Janes
2018-04-08 19:30:38 +03:00
Teodor Sigaev 8224de4f42 Indexes with INCLUDE columns and their support in B-tree
This patch introduces INCLUDE clause to index definition.  This clause
specifies a list of columns which will be included as a non-key part in
the index.  The INCLUDE columns exist solely to allow more queries to
benefit from index-only scans.  Also, such columns don't need to have
appropriate operator classes.  Expressions are not supported as INCLUDE
columns since they cannot be used in index-only scans.

Index access methods supporting INCLUDE are indicated by amcaninclude flag
in IndexAmRoutine.  For now, only B-tree indexes support INCLUDE clause.

In B-tree indexes INCLUDE columns are truncated from pivot index tuples
(tuples located in non-leaf pages and high keys).  Therefore, B-tree indexes
now might have variable number of attributes.  This patch also provides
generic facility to support that: pivot tuples contain number of their
attributes in t_tid.ip_posid.  Free 13th bit of t_info is used for indicating
that.  This facility will simplify further support of index suffix truncation.
The changes of above are backward-compatible, pg_upgrade doesn't need special
handling of B-tree indexes for that.

Bump catalog version

Author: Anastasia Lubennikova with contribition by Alexander Korotkov and me
Reviewed by: Peter Geoghegan, Tomas Vondra, Antonin Houska, Jeff Janes,
			 David Rowley, Alexander Korotkov
Discussion: https://www.postgresql.org/message-id/flat/56168952.4010101@postgrespro.ru
2018-04-07 23:00:39 +03:00
Teodor Sigaev 857f9c36cd Skip full index scan during cleanup of B-tree indexes when possible
Vacuum of index consists from two stages: multiple (zero of more) ambulkdelete
calls and one amvacuumcleanup call. When workload on particular table
is append-only, then autovacuum isn't intended to touch this table. However,
user may run vacuum manually in order to fill visibility map and get benefits
of index-only scans. Then ambulkdelete wouldn't be called for indexes
of such table (because no heap tuples were deleted), only amvacuumcleanup would
be called In this case, amvacuumcleanup would perform full index scan for
two objectives: put recyclable pages into free space map and update index
statistics.

This patch allows btvacuumclanup to skip full index scan when two conditions
are satisfied: no pages are going to be put into free space map and index
statistics isn't stalled. In order to check first condition, we store
oldest btpo_xact in the meta-page. When it's precedes RecentGlobalXmin, then
there are some recyclable pages. In order to check second condition we store
number of heap tuples observed during previous full index scan by cleanup.
If fraction of newly inserted tuples is less than
vacuum_cleanup_index_scale_factor, then statistics isn't considered to be
stalled. vacuum_cleanup_index_scale_factor can be defined as both reloption and GUC (default).

This patch bumps B-tree meta-page version. Upgrade of meta-page is performed
"on the fly": during VACUUM meta-page is rewritten with new version. No special
handling in pg_upgrade is required.

Author: Masahiko Sawada, Alexander Korotkov
Review by: Peter Geoghegan, Kyotaro Horiguchi, Alexander Korotkov, Yura Sokolov
Discussion: https://www.postgresql.org/message-id/flat/CAD21AoAX+d2oD_nrd9O2YkpzHaFr=uQeGr9s1rKC3O4ENc568g@mail.gmail.com
2018-04-04 19:29:00 +03:00
Tom Lane 0b11a674fb Fix a boatload of typos in C comments.
Justin Pryzby

Discussion: https://postgr.es/m/20180331105640.GK28454@telsasoft.com
2018-04-01 15:01:28 -04:00
Andrew Dunstan ed69864350 Small cleanups in fast default code.
Problems identified by Andres Freund and Haribabu Kommi
2018-04-01 08:16:18 +09:30
Andres Freund f4f5845b31 Quick adaption of JIT tuple deforming to the fast default patch.
Instead using memset to set tts_isnull, call the new
slot_getmissingattrs().

Also fix a bug (= instead of >=) in the code generation. Normally = is
correct, but when repeatedly deforming fields not in a
tuple (e.g. deform up to natts + 1 and then natts + 2) >= is needed.

Discussion: https://postgr.es/m/20180328010053.i2qvsuuusst4lgmc@alap3.anarazel.de
2018-03-27 21:03:10 -07:00
Andrew Dunstan 16828d5c02 Fast ALTER TABLE ADD COLUMN with a non-NULL default
Currently adding a column to a table with a non-NULL default results in
a rewrite of the table. For large tables this can be both expensive and
disruptive. This patch removes the need for the rewrite as long as the
default value is not volatile. The default expression is evaluated at
the time of the ALTER TABLE and the result stored in a new column
(attmissingval) in pg_attribute, and a new column (atthasmissing) is set
to true. Any existing row when fetched will be supplied with the
attmissingval. New rows will have the supplied value or the default and
so will never need the attmissingval.

Any time the table is rewritten all the atthasmissing and attmissingval
settings for the attributes are cleared, as they are no longer needed.

The most visible code change from this is in heap_attisnull, which
acquires a third TupleDesc argument, allowing it to detect a missing
value if there is one. In many cases where it is known that there will
not be any (e.g.  catalog relations) NULL can be passed for this
argument.

Andrew Dunstan, heavily modified from an original patch from Serge
Rielau.
Reviewed by Tom Lane, Andres Freund, Tomas Vondra and David Rowley.

Discussion: https://postgr.es/m/31e2e921-7002-4c27-59f5-51f08404c858@2ndQuadrant.com
2018-03-28 10:43:52 +10:30
Simon Riggs c203d6cf81 Allow HOT updates for some expression indexes
If the value of an index expression is unchanged after UPDATE,
allow HOT updates where previously we disallowed them, giving
a significant performance boost in those cases.

Particularly useful for indexes such as JSON->>field where the
JSON value changes but the indexed value does not.

Submitted as "surjective indexes" patch, now enabled by use
of new "recheck_on_update" parameter.

Author: Konstantin Knizhnik
Reviewer: Simon Riggs, with much wordsmithing and some cleanup
2018-03-27 19:57:02 +01:00
Andres Freund 32af96b2b1 JIT tuple deforming in LLVM JIT provider.
Performing JIT compilation for deforming gains performance benefits
over unJITed deforming from compile-time knowledge of the tuple
descriptor. Fixed column widths, NOT NULLness, etc can be taken
advantage of.

Right now the JITed deforming is only used when deforming tuples as
part of expression evaluation (and obviously only if the descriptor is
known). It's likely to be beneficial in other cases, too.

By default tuple deforming is JITed whenever an expression is JIT
compiled. There's a separate boolean GUC controlling it, but that's
expected to be primarily useful for development and benchmarking.

Docs will follow in a later commit containing docs for the whole JIT
feature.

Author: Andres Freund
Discussion: https://postgr.es/m/20170901064131.tazjxwus3k2w3ybh@alap3.anarazel.de
2018-03-26 12:57:19 -07:00
Tom Lane 8f5ac44043 Fix WHERE CURRENT OF when the referenced cursor uses an index-only scan.
"UPDATE/DELETE WHERE CURRENT OF cursor_name" failed, with an error message
like "cannot extract system attribute from virtual tuple", if the cursor
was using a index-only scan for the target table.  Fix it by digging the
current TID out of the indexscan state.

It seems likely that the same failure could occur for CustomScan plans
and perhaps some FDW plan types, so that leaving this to be treated as an
internal error with an obscure message isn't as good an idea as it first
seemed.  Hence, add a bit of heaptuple.c infrastructure to let us deliver
a more on-topic message.  I chose to make the message match what you get
for the case where execCurrentOf can't identify the target scan node at
all, "cursor "foo" is not a simply updatable scan of table "bar"".
Perhaps it should be different, but we can always adjust that later.

In the future, it might be nice to provide hooks that would let custom
scan providers and/or FDWs deal with this in other ways; but that's
not a suitable topic for a back-patchable bug fix.

It's been like this all along, so back-patch to all supported branches.

Yugo Nagata and Tom Lane

Discussion: https://postgr.es/m/20180201013349.937dfc5f.nagata@sraoss.co.jp
2018-03-17 14:59:49 -04:00
Andres Freund 4c0ec9ee28 Use platform independent type for TupleTableSlot->tts_off.
Previously tts_off was, for unknown reasons, of type long. For one
that's unnecessary as tuples are restricted in length, for another
long would be a bad choice of type even if that weren't the case, as
it's not reliably wider than an int. Also HeapTupleHeader->t_len is a
uint32.

This is split off from a larger patch implementing JITed tuple
deforming. Seems like an independent improvement, as tiny as it is.

Author: Andres Freund
2018-02-20 15:12:52 -08:00
Tom Lane fb8697b31a Avoid unnecessary use of pg_strcasecmp for already-downcased identifiers.
We have a lot of code in which option names, which from the user's
viewpoint are logically keywords, are passed through the grammar as plain
identifiers, and then matched to string literals during command execution.
This approach avoids making words into lexer keywords unnecessarily.  Some
places matched these strings using plain strcmp, some using pg_strcasecmp.
But the latter should be unnecessary since identifiers would have been
downcased on their way through the parser.  Aside from any efficiency
concerns (probably not a big factor), the lack of consistency in this area
creates a hazard of subtle bugs due to different places coming to different
conclusions about whether two option names are the same or different.
Hence, standardize on using strcmp() to match any option names that are
expected to have been fed through the parser.

This does create a user-visible behavioral change, which is that while
formerly all of these would work:
	alter table foo set (fillfactor = 50);
	alter table foo set (FillFactor = 50);
	alter table foo set ("fillfactor" = 50);
	alter table foo set ("FillFactor" = 50);
now the last case will fail because that double-quoted identifier is
different from the others.  However, none of our documentation says that
you can use a quoted identifier in such contexts at all, and we should
discourage doing so since it would break if we ever decide to parse such
constructs as true lexer keywords rather than poor man's substitutes.
So this shouldn't create a significant compatibility issue for users.

Daniel Gustafsson, reviewed by Michael Paquier, small changes by me

Discussion: https://postgr.es/m/29405B24-564E-476B-98C0-677A29805B84@yesql.se
2018-01-26 18:25:14 -05:00
Alvaro Herrera 8b08f7d482 Local partitioned indexes
When CREATE INDEX is run on a partitioned table, create catalog entries
for an index on the partitioned table (which is just a placeholder since
the table proper has no data of its own), and recurse to create actual
indexes on the existing partitions; create them in future partitions
also.

As a convenience gadget, if the new index definition matches some
existing index in partitions, these are picked up and used instead of
creating new ones.  Whichever way these indexes come about, they become
attached to the index on the parent table and are dropped alongside it,
and cannot be dropped on isolation unless they are detached first.

To support pg_dump'ing these indexes, add commands
    CREATE INDEX ON ONLY <table>
(which creates the index on the parent partitioned table, without
recursing) and
    ALTER INDEX ATTACH PARTITION
(which is used after the indexes have been created individually on each
partition, to attach them to the parent index).  These reconstruct prior
database state exactly.

Reviewed-by: (in alphabetical order) Peter Eisentraut, Robert Haas, Amit
	Langote, Jesper Pedersen, Simon Riggs, David Rowley
Discussion: https://postgr.es/m/20171113170646.gzweigyrgg6pwsg4@alvherre.pgsql
2018-01-19 11:49:22 -03:00
Tom Lane 47c6772eb7 Clean up tupdesc.c for recent changes.
TupleDescCopy needs to have the same effects as CreateTupleDescCopy in
that, since it doesn't copy constraints, it should clear the per-attribute
fields associated with them.  Oversight in commit cc5f81366.

Since TupleDescCopy has already established the presumption that it
can just flat-copy the entire attribute array in one go, propagate
that approach into CreateTupleDescCopy and CreateTupleDescCopyConstr.
(I'm suspicious that this would lead to valgrind complaints if we
had any trailing padding in the struct, but we do not, and anyway
fixing that seems like a job for a separate commit.)

Add some better comments.

Thomas Munro, reviewed by Vik Fearing, some additional hacking by me

Discussion: https://postgr.es/m/CAEepm=0NvOGZ8B6GbQyQe2C_c2m3LKJ9w=8OMBaYRLgZ_Gw6Nw@mail.gmail.com
2018-01-03 17:53:41 -05:00
Bruce Momjian 9d4649ca49 Update copyright for 2018
Backpatch-through: certain files through 9.3
2018-01-02 23:30:12 -05:00
Robert Haas eaedf0df71 Update typedefs.list and re-run pgindent
Discussion: http://postgr.es/m/CA+TgmoaA9=1RWKtBWpDaj+sF3Stgc8sHgf5z=KGtbjwPLQVDMA@mail.gmail.com
2017-11-29 09:24:24 -05:00
Simon Riggs c2513365a0 Parameter toast_tuple_target controls TOAST for new rows
Specifies the point at which we try to move long column values
into TOAST tables.

No effect on existing rows.

Discussion: https://postgr.es/m/CANP8+jKsVmw6CX6YP9z7zqkTzcKV1+Uzr3XjKcZW=2Ya00OyQQ@mail.gmail.com

Author: Simon Riggs <simon@2ndQudrant.com>
Reviewed-by: Andrew Dunstan <andrew.dunstan@2ndQuadrant.com>
2017-11-20 09:50:10 +11:00
Peter Eisentraut 0e1539ba0d Add some const decorations to prototypes
Reviewed-by: Fabien COELHO <coelho@cri.ensmp.fr>
2017-11-10 13:38:57 -05:00
Peter Eisentraut 2eb4a831e5 Change TRUE/FALSE to true/false
The lower case spellings are C and C++ standard and are used in most
parts of the PostgreSQL sources.  The upper case spellings are only used
in some files/modules.  So standardize on the standard spellings.

The APIs for ICU, Perl, and Windows define their own TRUE and FALSE, so
those are left as is when using those APIs.

In code comments, we use the lower-case spelling for the C concepts and
keep the upper-case spelling for the SQL concepts.

Reviewed-by: Michael Paquier <michael.paquier@gmail.com>
2017-11-08 11:37:28 -05:00
Andres Freund 31079a4a8e Replace remaining uses of pq_sendint with pq_sendint{8,16,32}.
pq_sendint() remains, so extension code doesn't unnecessarily break.

Author: Andres Freund
Discussion: https://postgr.es/m/20170914063418.sckdzgjfrsbekae4@alap3.anarazel.de
2017-10-11 21:00:46 -07:00
Andres Freund 4c119fbcd4 Improve performance of SendRowDescriptionMessage.
There's three categories of changes leading to better performance:
- Splitting the per-attribute part of SendRowDescriptionMessage into a
  v2 and a v3 version allows avoiding branches for every attribute.
- Preallocating the size of the buffer to be big enough for all
  attributes and then using pq_write* avoids unnecessary buffer
  size checks & resizing.
- Reusing a persistently allocated StringInfo for all
  SendRowDescriptionMessage() invocations avoids repeated allocations
  & reallocations.

Author: Andres Freund
Discussion: https://postgr.es/m/20170914063418.sckdzgjfrsbekae4@alap3.anarazel.de
2017-10-11 17:23:23 -07:00
Andres Freund f2dec34e19 Use one stringbuffer for all rows printed in printtup.c.
This avoids newly allocating, and then possibly growing, the
stringbuffer for every row. For wide rows this can substantially
reduce memory allocator overhead, at the price of not immediately
reducing memory usage after outputting an especially wide row.

Author: Andres Freund
Discussion: https://postgr.es/m/20170914063418.sckdzgjfrsbekae4@alap3.anarazel.de
2017-10-11 16:26:35 -07:00
Robert Haas 6a2fa09c0c For wal_consistency_checking, mask page checksum as well as page LSN.
If the LSN is different, the checksum will be different, too.

Ashwin Agrawal, reviewed by Michael Paquier and Kuntal Ghosh

Discussion: http://postgr.es/m/CALfoeis5iqrAU-+JAN+ZzXkpPr7+-0OAGv7QUHwFn=-wDy4o4Q@mail.gmail.com
2017-09-22 14:28:22 -04:00
Andres Freund cc5f81366c Add support for coordinating record typmods among parallel workers.
Tuples can have type RECORDOID and a typmod number that identifies a blessed
TupleDesc in a backend-private cache.  To support the sharing of such tuples
through shared memory and temporary files, provide a typmod registry in
shared memory.

To achieve that, introduce per-session DSM segments, created on demand when a
backend first runs a parallel query.  The per-session DSM segment has a
table-of-contents just like the per-query DSM segment, and initially the
contents are a shared record typmod registry and a DSA area to provide the
space it needs to grow.

State relating to the current session is accessed via a Session object
reached through global variable CurrentSession that may require significant
redesign further down the road as we figure out what else needs to be shared
or remodelled.

Author: Thomas Munro
Reviewed-By: Andres Freund
Discussion: https://postgr.es/m/CAEepm=0ZtQ-SpsgCyzzYpsXS6e=kZWqk3g5Ygn3MDV7A8dabUA@mail.gmail.com
2017-09-14 19:59:21 -07:00
Andres Freund 35ea75632a Refactor typcache.c's record typmod hash table.
Previously, tuple descriptors were stored in chains keyed by a fixed size
array of OIDs.  That meant there were effectively two levels of collision
chain -- one inside and one outside the hash table.  Instead, let dynahash.c
look after conflicts for us by supplying a proper hash and equal function
pair.

This is a nice cleanup on its own, but also simplifies followup
changes allowing blessed TupleDescs to be shared between backends
participating in parallel query.

Author: Thomas Munro
Reviewed-By: Andres Freund
Discussion: https://postgr.es/m/CAEepm%3D34GVhOL%2BarUx56yx7OPk7%3DqpGsv3CpO54feqjAwQKm5g%40mail.gmail.com
2017-08-22 16:11:54 -07:00
Andres Freund c6293249dc Partially flatten struct tupleDesc so that it can be used in DSM.
TupleDesc's attributes were already stored in contiguous memory after the
struct.  Go one step further and get rid of the array of pointers to
attributes so that they can be stored in shared memory mapped at different
addresses in each backend.  This won't work for TupleDescs with contraints
and defaults, since those point to other objects, but for many purposes
only attributes are needed.

Author: Thomas Munro
Reviewed-By: Andres Freund
Discussion: https://postgr.es/m/CAEepm=0ZtQ-SpsgCyzzYpsXS6e=kZWqk3g5Ygn3MDV7A8dabUA@mail.gmail.com
2017-08-20 11:19:12 -07:00
Andres Freund 2cd7084524 Change tupledesc->attrs[n] to TupleDescAttr(tupledesc, n).
This is a mechanical change in preparation for a later commit that
will change the layout of TupleDesc.  Introducing a macro to abstract
the details of where attributes are stored will allow us to change
that in separate step and revise it in future.

Author: Thomas Munro, editorialized by Andres Freund
Reviewed-By: Andres Freund
Discussion: https://postgr.es/m/CAEepm=0ZtQ-SpsgCyzzYpsXS6e=kZWqk3g5Ygn3MDV7A8dabUA@mail.gmail.com
2017-08-20 11:19:07 -07:00
Tom Lane 382ceffdf7 Phase 3 of pgindent updates.
Don't move parenthesized lines to the left, even if that means they
flow past the right margin.

By default, BSD indent lines up statement continuation lines that are
within parentheses so that they start just to the right of the preceding
left parenthesis.  However, traditionally, if that resulted in the
continuation line extending to the right of the desired right margin,
then indent would push it left just far enough to not overrun the margin,
if it could do so without making the continuation line start to the left of
the current statement indent.  That makes for a weird mix of indentations
unless one has been completely rigid about never violating the 80-column
limit.

This behavior has been pretty universally panned by Postgres developers.
Hence, disable it with indent's new -lpl switch, so that parenthesized
lines are always lined up with the preceding left paren.

This patch is much less interesting than the first round of indent
changes, but also bulkier, so I thought it best to separate the effects.

Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org
Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
2017-06-21 15:35:54 -04:00
Tom Lane c7b8998ebb Phase 2 of pgindent updates.
Change pg_bsd_indent to follow upstream rules for placement of comments
to the right of code, and remove pgindent hack that caused comments
following #endif to not obey the general rule.

Commit e3860ffa4d wasn't actually using
the published version of pg_bsd_indent, but a hacked-up version that
tried to minimize the amount of movement of comments to the right of
code.  The situation of interest is where such a comment has to be
moved to the right of its default placement at column 33 because there's
code there.  BSD indent has always moved right in units of tab stops
in such cases --- but in the previous incarnation, indent was working
in 8-space tab stops, while now it knows we use 4-space tabs.  So the
net result is that in about half the cases, such comments are placed
one tab stop left of before.  This is better all around: it leaves
more room on the line for comment text, and it means that in such
cases the comment uniformly starts at the next 4-space tab stop after
the code, rather than sometimes one and sometimes two tabs after.

Also, ensure that comments following #endif are indented the same
as comments following other preprocessor commands such as #else.
That inconsistency turns out to have been self-inflicted damage
from a poorly-thought-through post-indent "fixup" in pgindent.

This patch is much less interesting than the first round of indent
changes, but also bulkier, so I thought it best to separate the effects.

Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org
Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
2017-06-21 15:19:25 -04:00
Tom Lane e3860ffa4d Initial pgindent run with pg_bsd_indent version 2.0.
The new indent version includes numerous fixes thanks to Piotr Stefaniak.
The main changes visible in this commit are:

* Nicer formatting of function-pointer declarations.
* No longer unexpectedly removes spaces in expressions using casts,
  sizeof, or offsetof.
* No longer wants to add a space in "struct structname *varname", as
  well as some similar cases for const- or volatile-qualified pointers.
* Declarations using PG_USED_FOR_ASSERTS_ONLY are formatted more nicely.
* Fixes bug where comments following declarations were sometimes placed
  with no space separating them from the code.
* Fixes some odd decisions for comments following case labels.
* Fixes some cases where comments following code were indented to less
  than the expected column 33.

On the less good side, it now tends to put more whitespace around typedef
names that are not listed in typedefs.list.  This might encourage us to
put more effort into typedef name collection; it's not really a bug in
indent itself.

There are more changes coming after this round, having to do with comment
indentation and alignment of lines appearing within parentheses.  I wanted
to limit the size of the diffs to something that could be reviewed without
one's eyes completely glazing over, so it seemed better to split up the
changes as much as practical.

Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org
Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
2017-06-21 14:39:04 -04:00
Bruce Momjian a6fd7b7a5f Post-PG 10 beta1 pgindent run
perltidy run not included.
2017-05-17 16:31:56 -04:00
Alvaro Herrera b66adb7b0c Revert "Permit dump/reload of not-too-large >1GB tuples"
This reverts commits fa2fa99552 and 42f50cb8fa.

While the functionality that was intended to be provided by these
commits is desired, the patch didn't actually solve as many of the
problematic situations as we hoped, and it created a bunch of its own
problems.  Since we're going to require more extensive changes soon for
other reasons and users have been working around these problems for a
long time already, there is no point in spending effort in fixing this
halfway measure.

Per complaint from Tom Lane.
Discussion: https://postgr.es/m/21407.1484606922@sss.pgh.pa.us

(Commit fa2fa99552 had already been reverted in branches 9.5 as
f858524ee4 and 9.6 as e9e44a0953, so this touches master only.
Commit 42f50cb8fa was not present in the older branches.)
2017-05-10 18:41:27 -03:00
Tom Lane 3f902354b0 Clean up after insufficiently-researched optimization of tuple conversions.
tupconvert.c's functions formerly considered that an explicit tuple
conversion was necessary if the input and output tupdescs contained
different type OIDs.  The point of that was to make sure that a composite
datum resulting from the conversion would contain the destination rowtype
OID in its composite-datum header.  However, commit 3838074f8 entirely
misunderstood what that check was for, thinking that it had something to do
with presence or absence of an OID column within the tuple.  Removal of the
check broke the no-op conversion path in ExecEvalConvertRowtype, as
reported by Ashutosh Bapat.

It turns out that of the dozen or so call sites for tupconvert.c functions,
ExecEvalConvertRowtype is the only one that cares about the composite-datum
header fields in the output tuple.  In all the rest, we'd much rather avoid
an unnecessary conversion whenever the tuples are physically compatible.
Moreover, the comments in tupconvert.c only promise physical compatibility
not a metadata match.  So, let's accept the removal of the guarantee about
the output tuple's rowtype marking, recognizing that this is a API change
that could conceivably break third-party callers of tupconvert.c.  (So,
let's remember to mention it in the v10 release notes.)

However, commit 3838074f8 did have a bit of a point here, in that two
tuples mustn't be considered physically compatible if one has HEAP_HASOID
set and the other doesn't.  (Some of the callers of tupconvert.c might not
really care about that, but we can't assume it in general.)  The previous
check accidentally covered that issue, because no RECORD types ever have
OIDs, while if two tupdescs have the same named composite type OID then,
a fortiori, they have the same tdhasoid setting.  If we're removing the
type OID match check then we'd better include tdhasoid match as part of
the physical compatibility check.

Without that hack in tupconvert.c, we need ExecEvalConvertRowtype to take
responsibility for inserting the correct rowtype OID label whenever
tupconvert.c decides it need not do anything.  This is easily done with
heap_copy_tuple_as_datum, which will be considerably faster than a tuple
disassembly and reassembly anyway; so from a performance standpoint this
change is a win all around compared to what happened in earlier branches.
It just means a couple more lines of code in ExecEvalConvertRowtype.

Ashutosh Bapat and Tom Lane

Discussion: https://postgr.es/m/CAFjFpRfvHABV6+oVvGcshF8rHn+1LfRUhj7Jz1CDZ4gPUwehBg@mail.gmail.com
2017-04-06 21:10:20 -04:00
Peter Eisentraut 3217327053 Identity columns
This is the SQL standard-conforming variant of PostgreSQL's serial
columns.  It fixes a few usability issues that serial columns have:

- CREATE TABLE / LIKE copies default but refers to same sequence
- cannot add/drop serialness with ALTER TABLE
- dropping default does not drop sequence
- need to grant separate privileges to sequence
- other slight weirdnesses because serial is some kind of special macro

Reviewed-by: Vitaly Burovoy <vitaly.burovoy@gmail.com>
2017-04-06 08:41:37 -04:00
Alvaro Herrera 7526e10224 BRIN auto-summarization
Previously, only VACUUM would cause a page range to get initially
summarized by BRIN indexes, which for some use cases takes too much time
since the inserts occur.  To avoid the delay, have brininsert request a
summarization run for the previous range as soon as the first tuple is
inserted into the first page of the next range.  Autovacuum is in charge
of processing these requests, after doing all the regular vacuuming/
analyzing work on tables.

This doesn't impose any new tasks on autovacuum, because autovacuum was
already in charge of doing summarizations.  The only actual effect is to
change the timing, i.e. that it occurs earlier.  For this reason, we
don't go any great lengths to record these requests very robustly; if
they are lost because of a server crash or restart, they will happen at
a later time anyway.

Most of the new code here is in autovacuum, which can now be told about
"work items" to process.  This can be used for other things such as GIN
pending list cleaning, perhaps visibility map bit setting, both of which
are currently invoked during vacuum, but do not really depend on vacuum
taking place.

The requests are at the page range level, a granularity for which we did
not have SQL-level access; we only had index-level summarization
requests via brin_summarize_new_values().  It seems reasonable to add
SQL-level access to range-level summarization too, so add a function
brin_summarize_range() to do that.

Authors: Álvaro Herrera, based on sketch from Simon Riggs.
Reviewed-by: Thomas Munro.
Discussion: https://postgr.es/m/20170301045823.vneqdqkmsd4as4ds@alvherre.pgsql
2017-04-01 14:00:53 -03:00
Robert Haas c94e6942ce Don't allocate storage for partitioned tables.
Also, don't allow setting reloptions on them, since that would have no
effect given the lack of storage.  The patch does this by introducing
a new reloption kind for which there are currently no reloptions -- we
might have some in the future -- so it adjusts parseRelOptions to
handle that case correctly.

Bumped catversion.  System catalogs that contained reloptions for
partitioned tables are no longer valid; plus, there are now fewer
physical files on disk, which is not technically a catalog change but
still a good reason to re-initdb.

Amit Langote, reviewed by Maksim Milyutin and Kyotaro Horiguchi and
revised a bit by me.

Discussion: http://postgr.es/m/20170331.173326.212311140.horiguchi.kyotaro@lab.ntt.co.jp
2017-03-31 16:28:51 -04:00
Noah Misch 2fd26b23b6 Assume deconstruct_array() outputs are untoasted.
In functions that issue a deconstruct_array() call, consistently use
plain VARSIZE()/VARDATA() on the array elements.  Prior practice was
divided between those and VARSIZE_ANY_EXHDR()/VARDATA_ANY().
2017-03-12 19:35:31 -04:00
Simon Riggs 21d4e2e206 Reduce lock levels for table storage params related to planning
The following parameters are now updateable with ShareUpdateExclusiveLock
effective_io_concurrency
parallel_workers
seq_page_cost
random_page_cost
n_distinct
n_distinct_inherited

Simon Riggs and Fabrízio Mello
2017-03-06 16:04:31 +05:30
Tom Lane c29aff959d Consistently declare timestamp variables as TimestampTz.
Twiddle the replication-related code so that its timestamp variables
are declared TimestampTz, rather than the uninformative "int64" that
was previously used for meant-to-be-always-integer timestamps.
This resolves the int64-vs-TimestampTz declaration inconsistencies
introduced by commit 7c030783a, though in the opposite direction to
what was originally suggested.

This required including datatype/timestamp.h in a couple more places
than before.  I decided it would be a good idea to slim down that
header by not having it pull in <float.h> etc, as those headers are
no longer at all relevant to its purpose.  Unsurprisingly, a small number
of .c files turn out to have been depending on those inclusions, so add
them back in the .c files as needed.

Discussion: https://postgr.es/m/26788.1487455319@sss.pgh.pa.us
Discussion: https://postgr.es/m/27694.1487456324@sss.pgh.pa.us
2017-02-23 15:57:08 -05:00
Robert Haas fb47544d0c Minor fixes for WAL consistency checking.
Michael Paquier, reviewed and slightly revised by me.

Discussion: http://postgr.es/m/CAB7nPqRzCQb=vdfHvMtP0HMLBHU6z1aGdo4GJsUP-HP8jx+Pkw@mail.gmail.com
2017-02-14 12:41:01 -05:00
Robert Haas a507b86900 Add WAL consistency checking facility.
When the new GUC wal_consistency_checking is set to a non-empty value,
it triggers recording of additional full-page images, which are
compared on the standby against the results of applying the WAL record
(without regard to those full-page images).  Allowable differences
such as hints are masked out, and the resulting pages are compared;
any difference results in a FATAL error on the standby.

Kuntal Ghosh, based on earlier patches by Michael Paquier and Heikki
Linnakangas.  Extensively reviewed and revised by Michael Paquier and
by me, with additional reviews and comments from Amit Kapila, Álvaro
Herrera, Simon Riggs, and Peter Eisentraut.
2017-02-08 15:45:30 -05:00
Robert Haas bbd8550bce Refactor other replication commands to use DestRemoteSimple.
Commit a84069d935 added a new type of
DestReceiver to avoid duplicating the existing code for the SHOW
command, but it turns out we can leverage that new DestReceiver
type in a few more places, saving some code.

Michael Paquier, reviewed by Andres Freund and by me.

Discussion: http://postgr.es/m/CAB7nPqSdFOQC0evc0r1nJeQyGBqjBrR41MC4rcMqUUpoJaZbtQ%40mail.gmail.com
Discussion: http://postgr.es/m/CAB7nPqT2K4XFT1JgqufFBjsOc-NUKXg5qBDucHPMbk6Xi1kYaA@mail.gmail.com
2017-02-01 13:42:41 -05:00
Robert Haas 3838074f86 Be more aggressive in avoiding tuple conversion.
According to the comments in tupconvert.c, it's necessary to perform
tuple conversion when either table has OIDs, and this was previously
checked by ensuring that the tdtypeid value matched between the tables
in question.  However, that's overly stringent: we have access to
tdhasoid and can test directly whether OIDs are present, which lets us
avoid conversion in cases where the type OIDs are different but the
tuple descriptors are entirely the same (and neither has OIDs).  This
is useful to the partitioning code, which can thereby avoid converting
tuples when inserting into a partition whose columns appear in the
same order as the parent columns, the normal case.  It's possible
for the tuple routing code to avoid some additional overhead in this
case as well, so do that, too.

It's not clear whether it would be OK to skip this when both tables
have OIDs: do callers count on this to build a new tuple (losing the
previous OID) in such instances?  Until we figure it out, leave the
behavior in that case alone.

Amit Langote, reviewed by me.
2017-01-24 21:53:38 -05:00
Robert Haas d1ecd53947 Add a SHOW command to the replication command language.
This is useful infrastructure for an upcoming proposed patch to
allow the WAL segment size to be changed at initdb time; tools like
pg_basebackup need the ability to interrogate the server setting.
But it also doesn't seem like a bad thing to have independently of
that; it may find other uses in the future.

Robert Haas and Beena Emerson.  (The original patch here was by
Beena, but I rewrote it to such a degree that most of the code
being committed here is mine.)

Discussion: http://postgr.es/m/CA+TgmobNo4qz06wHEmy9DszAre3dYx-WNhHSCbU9SAwf+9Ft6g@mail.gmail.com
2017-01-24 17:04:12 -05:00
Robert Haas a84069d935 Add a new DestReceiver for printing tuples without catalog access.
If you create a DestReciver of type DestRemote and try to use it from
a replication connection that is not bound to a specific daabase, or
any other hypothetical type of backend that is not bound to a specific
database, it will fail because it doesn't have a pg_proc catalog to
look up properties of the types being printed.  In general, that's
an unavoidable problem, but we can hardwire the properties of a few
builtin types in order to support utility commands.  This new
DestReceiver of type DestRemoteSimple does just that.

Patch by me, reviewed by Michael Paquier.

Discussion: http://postgr.es/m/CA+TgmobNo4qz06wHEmy9DszAre3dYx-WNhHSCbU9SAwf+9Ft6g@mail.gmail.com
2017-01-24 16:53:56 -05:00
Alvaro Herrera 3957b58b88 Fix ALTER TABLE / SET TYPE for irregular inheritance
If inherited tables don't have exactly the same schema, the USING clause
in an ALTER TABLE / SET DATA TYPE misbehaves when applied to the
children tables since commit 9550e8348b.  Starting with that commit,
the attribute numbers in the USING expression are fixed during parse
analysis.  This can lead to bogus errors being reported during
execution, such as:
   ERROR:  attribute 2 has wrong type
   DETAIL:  Table has type smallint, but query expects integer.

Since it wouldn't do to revert to the original coding, we now apply a
transformation to map the attribute numbers to the correct ones for each
child.

Reported by Justin Pryzby
Analysis by Tom Lane; patch by me.
Discussion: https://postgr.es/m/20170102225618.GA10071@telsasoft.com
2017-01-09 19:26:58 -03:00
Bruce Momjian 1d25779284 Update copyright via script for 2017 2017-01-03 13:48:53 -05:00
Robert Haas f0e44751d7 Implement table partitioning.
Table partitioning is like table inheritance and reuses much of the
existing infrastructure, but there are some important differences.
The parent is called a partitioned table and is always empty; it may
not have indexes or non-inherited constraints, since those make no
sense for a relation with no data of its own.  The children are called
partitions and contain all of the actual data.  Each partition has an
implicit partitioning constraint.  Multiple inheritance is not
allowed, and partitioning and inheritance can't be mixed.  Partitions
can't have extra columns and may not allow nulls unless the parent
does.  Tuples inserted into the parent are automatically routed to the
correct partition, so tuple-routing ON INSERT triggers are not needed.
Tuple routing isn't yet supported for partitions which are foreign
tables, and it doesn't handle updates that cross partition boundaries.

Currently, tables can be range-partitioned or list-partitioned.  List
partitioning is limited to a single column, but range partitioning can
involve multiple columns.  A partitioning "column" can be an
expression.

Because table partitioning is less general than table inheritance, it
is hoped that it will be easier to reason about properties of
partitions, and therefore that this will serve as a better foundation
for a variety of possible optimizations, including query planner
optimizations.  The tuple routing based which this patch does based on
the implicit partitioning constraints is an example of this, but it
seems likely that many other useful optimizations are also possible.

Amit Langote, reviewed and tested by Robert Haas, Ashutosh Bapat,
Amit Kapila, Rajkumar Raghuwanshi, Corey Huinker, Jaime Casanova,
Rushabh Lathia, Erik Rijkers, among others.  Minor revisions by me.
2016-12-07 13:17:55 -05:00
Alvaro Herrera fa2fa99552 Permit dump/reload of not-too-large >1GB tuples
Our documentation states that our maximum field size is 1 GB, and that
our maximum row size of 1.6 TB.  However, while this might be attainable
in theory with enough contortions, it is not workable in practice; for
starters, pg_dump fails to dump tables containing rows larger than 1 GB,
even if individual columns are well below the limit; and even if one
does manage to manufacture a dump file containing a row that large, the
server refuses to load it anyway.

This commit enables dumping and reloading of such tuples, provided two
conditions are met:

1. no single column is larger than 1 GB (in output size -- for bytea
   this includes the formatting overhead)
2. the whole row is not larger than 2 GB

There are three related changes to enable this:

a. StringInfo's API now has two additional functions that allow creating
a string that grows beyond the typical 1GB limit (and "long" string).
ABI compatibility is maintained.  We still limit these strings to 2 GB,
though, for reasons explained below.

b. COPY now uses long StringInfos, so that pg_dump doesn't choke
trying to emit rows longer than 1GB.

c. heap_form_tuple now uses the MCXT_ALLOW_HUGE flag in its allocation
for the input tuple, which means that large tuples are accepted on
input.  Note that at this point we do not apply any further limit to the
input tuple size.

The main reason to limit to 2 GB is that the FE/BE protocol uses 32 bit
length words to describe each row; and because the documentation is
ambiguous on its signedness and libpq does consider it signed, we cannot
use the highest-order bit.  Additionally, the StringInfo API uses "int"
(which is 4 bytes wide in most platforms) in many places, so we'd need
to change that API too in order to improve, which has lots of fallout.

Backpatch to 9.5, which is the oldest that has
MemoryContextAllocExtended, a necessary piece of infrastructure.  We
could apply to 9.4 with very minimal additional effort, but any further
than that would require backpatching "huge" allocations too.

This is the largest set of changes we could find that can be
back-patched without breaking compatibility with existing systems.
Fixing a bigger set of problems (for example, dumping tuples bigger than
2GB, or dumping fields bigger than 1GB) would require changing the FE/BE
protocol and/or changing the StringInfo API in an ABI-incompatible way,
neither of which would be back-patchable.

Authors: Daniel Vérité, Álvaro Herrera
Reviewed by: Tomas Vondra
Discussion: https://postgr.es/m/20160229183023.GA286012@alvherre.pgsql
2016-12-02 00:34:01 -03:00
Tom Lane 9257f07872 Replace uses of SPI_modifytuple that intend to allocate in current context.
Invent a new function heap_modify_tuple_by_cols() that is functionally
equivalent to SPI_modifytuple except that it always allocates its result
by simple palloc.  I chose however to make the API details a bit more
like heap_modify_tuple: pass a tupdesc rather than a Relation, and use
bool convention for the isnull array.

Use this function in place of SPI_modifytuple at all call sites where the
intended behavior is to allocate in current context.  (There actually are
only two call sites left that depend on the old behavior, which makes me
wonder if we should just drop this function rather than keep it.)

This new function is easier to use than heap_modify_tuple() for purposes
of replacing a single column (or, really, any fixed number of columns).
There are a number of places where it would simplify the code to change
over, but I resisted that temptation for the moment ... everywhere except
in plpgsql's exec_assign_value(); changing that might offer some small
performance benefit, so I did it.

This is on the way to removing SPI_push/SPI_pop, but it seems like
good code cleanup in its own right.

Discussion: <9633.1478552022@sss.pgh.pa.us>
2016-11-08 15:36:44 -05:00
Tom Lane 2f1eaf87e8 Drop server support for FE/BE protocol version 1.0.
While this isn't a lot of code, it's been essentially untestable for
a very long time, because libpq doesn't support anything older than
protocol 2.0, and has not since release 6.3.  There's no reason to
believe any other client-side code still uses that protocol, either.

Discussion: <2661.1475849167@sss.pgh.pa.us>
2016-10-11 12:19:18 -04:00
Peter Eisentraut 49eb0fd097 Add location field to DefElem
Add a location field to the DefElem struct, used to parse many utility
commands.  Update various error messages to supply error position
information.

To propogate the error position information in a more systematic way,
create a ParseState in standard_ProcessUtility() and pass that to
interested functions implementing the utility commands.  This seems
better than passing the query string and then reassembling a parse state
ad hoc, which violates the encapsulation of the ParseState type.

Reviewed-by: Pavel Stehule <pavel.stehule@gmail.com>
2016-09-06 12:00:00 -04:00
Tom Lane ea268cdc9a Add macros to make AllocSetContextCreate() calls simpler and safer.
I found that half a dozen (nearly 5%) of our AllocSetContextCreate calls
had typos in the context-sizing parameters.  While none of these led to
especially significant problems, they did create minor inefficiencies,
and it's now clear that expecting people to copy-and-paste those calls
accurately is not a great idea.  Let's reduce the risk of future errors
by introducing single macros that encapsulate the common use-cases.
Three such macros are enough to cover all but two special-purpose contexts;
those two calls can be left as-is, I think.

While this patch doesn't in itself improve matters for third-party
extensions, it doesn't break anything for them either, and they can
gradually adopt the simplified notation over time.

In passing, change TopMemoryContext to use the default allocation
parameters.  Formerly it could only be extended 8K at a time.  That was
probably reasonable when this code was written; but nowadays we create
many more contexts than we did then, so that it's not unusual to have a
couple hundred K in TopMemoryContext, even without considering various
dubious code that sticks other things there.  There seems no good reason
not to let it use growing blocks like most other contexts.

Back-patch to 9.6, mostly because that's still close enough to HEAD that
it's easy to do so, and keeping the branches in sync can be expected to
avoid some future back-patching pain.  The bugs fixed by these changes
don't seem to be significant enough to justify fixing them further back.

Discussion: <21072.1472321324@sss.pgh.pa.us>
2016-08-27 17:50:38 -04:00
Robert Haas 4bc424b968 pgindent run for 9.6 2016-06-09 18:02:36 -04:00
Robert Haas c9ce4a1c61 Eliminate "parallel degree" terminology.
This terminology provoked widespread complaints.  So, instead, rename
the GUC max_parallel_degree to max_parallel_workers_per_gather
(leaving room for a possible future GUC max_parallel_workers that acts
as a system-wide limit), and rename the parallel_degree reloption to
parallel_workers.  Rename structure members to match.

These changes create a dump/restore hazard for users of PostgreSQL
9.6beta1 who have set the reloption (or applied the GUC using ALTER
USER or ALTER DATABASE).
2016-06-09 10:00:26 -04:00
Robert Haas c6dbf1fe79 Stop the executor if no more tuples can be sent from worker to leader.
If a Gather node has read as many tuples as it needs (for example, due
to Limit) it may detach the queue connecting it to the worker before
reading all of the worker's tuples.  Rather than let the worker
continue to generate and send all of the results, have it stop after
sending the next tuple.

More could be done here to stop the worker even quicker, but this is
about as well as we can hope to do for 9.6.

This is in response to a problem report from Andreas Seltenreich.
Commit 44339b892a should be actually be
sufficient to fix that example even without this change, but it seems
better to do this, too, since we might otherwise waste quite a large
amount of effort in one or more workers.

Discussion: CAA4eK1KOKGqmz9bGu+Z42qhRwMbm4R5rfnqsLCNqFs9j14jzEA@mail.gmail.com

Amit Kapila
2016-06-06 14:52:58 -04:00
Robert Haas c7ea68ff8d Limit maximum parallel degree to 1024.
This new limit affects both the max_parallel_degree GUC and the
parallel_degree reloption.  There may some day be a use case for using
more than 1024 CPUs for a single query, but that's surely not the case
right now.  Not only do not very many people have that many CPUs, but
the code hasn't been tested at that kind of scale and is very unlikely
to perform well, or even work at all, without a lot more work.  The
issue addressed by commit 06bd458cb8 is
probably just one problem of many.

The idea of a more reasonable limit here was suggested by Tom Lane;
the value of 1024 was suggested by Amit Kapila.
2016-05-06 14:50:54 -04:00
Teodor Sigaev 8b99edefca Revert CREATE INDEX ... INCLUDING ...
It's not ready yet, revert two commits
690c543550 - unstable test output
386e3d7609 - patch itself
2016-04-08 21:52:13 +03:00
Teodor Sigaev 386e3d7609 CREATE INDEX ... INCLUDING (column[, ...])
Now indexes (but only B-tree for now) can contain "extra" column(s) which
doesn't participate in index structure, they are just stored in leaf
tuples. It allows to use index only scan by using single index instead
of two or more indexes.

Author: Anastasia Lubennikova with minor editorializing by me
Reviewers: David Rowley, Peter Geoghegan, Jeff Janes
2016-04-08 19:45:59 +03:00
Robert Haas 25fe8b5f1a Add a 'parallel_degree' reloption.
The code that estimates what parallel degree should be uesd for the
scan of a relation is currently rather stupid, so add a parallel_degree
reloption that can be used to override the planner's rather limited
judgement.

Julien Rouhaud, reviewed by David Rowley, James Sewell, Amit Kapila,
and me.  Some further hacking by me.
2016-04-08 11:14:56 -04:00
Simon Riggs fcb4bfddb6 Reduce lock level for altering fillfactor
Fabrízio de Royes Mello and Simon Riggs
2016-03-10 12:07:33 +00:00
Tom Lane 65c5fcd353 Restructure index access method API to hide most of it at the C level.
This patch reduces pg_am to just two columns, a name and a handler
function.  All the data formerly obtained from pg_am is now provided
in a C struct returned by the handler function.  This is similar to
the designs we've adopted for FDWs and tablesample methods.  There
are multiple advantages.  For one, the index AM's support functions
are now simple C functions, making them faster to call and much less
error-prone, since the C compiler can now check function signatures.
For another, this will make it far more practical to define index access
methods in installable extensions.

A disadvantage is that SQL-level code can no longer see attributes
of index AMs; in particular, some of the crosschecks in the opr_sanity
regression test are no longer possible from SQL.  We've addressed that
by adding a facility for the index AM to perform such checks instead.
(Much more could be done in that line, but for now we're content if the
amvalidate functions more or less replace what opr_sanity used to do.)
We might also want to expose some sort of reporting functionality, but
this patch doesn't do that.

Alexander Korotkov, reviewed by Petr Jelínek, and rather heavily
editorialized on by me.
2016-01-17 19:36:59 -05:00
Bruce Momjian ee94300446 Update copyright for 2016
Backpatch certain files through 9.1
2016-01-02 13:33:40 -05:00
Andres Freund 2596d705bd Re-Align *_freeze_max_age reloption limits with corresponding GUC limits.
In 020235a575 I lowered the autovacuum_*freeze_max_age minimums to
allow for easier testing of wraparounds. I did not touch the
corresponding per-table limits. While those don't matter for the purpose
of wraparound, it seems more consistent to lower them as well.

It's noteworthy that the previous reloption lower limit for
autovacuum_multixact_freeze_max_age was too high by one magnitude, even
before 020235a575.

Discussion: 26377.1443105453@sss.pgh.pa.us
Backpatch: back to 9.0 (in parts), like the prior patch
2015-10-05 11:53:43 +02:00
Alvaro Herrera 1aba62ec63 Allow per-tablespace effective_io_concurrency
Per discussion, nowadays it is possible to have tablespaces that have
wildly different I/O characteristics from others.  Setting different
effective_io_concurrency parameters for those has been measured to
improve performance.

Author: Julien Rouhaud
Reviewed by: Andres Freund
2015-09-08 12:51:42 -03:00
Heikki Linnakangas c80b5f66c6 Fix misc typos.
Oskari Saarenmaa. Backpatch to stable branches where applicable.
2015-09-05 11:35:49 +03:00
Simon Riggs 47167b7907 Reduce lock levels for ALTER TABLE SET autovacuum storage options
Reduce lock levels down to ShareUpdateExclusiveLock for all autovacuum-related
relation options when setting them using ALTER TABLE.

Add infrastructure to allow varying lock levels for relation options in later
patches. Setting multiple options together uses the highest lock level required
for any option. Works for both main and toast tables.

Fabrízio Mello, reviewed by Michael Paquier, mild edit and additional regression
tests from myself
2015-08-14 14:19:28 +01:00
Tom Lane 09cecdf285 Fix a number of places that produced XX000 errors in the regression tests.
It's against project policy to use elog() for user-facing errors, or to
omit an errcode() selection for errors that aren't supposed to be "can't
happen" cases.  Fix all the violations of this policy that result in
ERRCODE_INTERNAL_ERROR log entries during the standard regression tests,
as errors that can reliably be triggered from SQL surely should be
considered user-facing.

I also looked through all the files touched by this commit and fixed
other nearby problems of the same ilk.  I do not claim to have fixed
all violations of the policy, just the ones in these files.

In a few places I also changed existing ERRCODE choices that didn't
seem particularly appropriate; mainly replacing ERRCODE_SYNTAX_ERROR
by something more specific.

Back-patch to 9.5, but no further; changing ERRCODE assignments in
stable branches doesn't seem like a good idea.
2015-08-02 23:49:19 -04:00
Heikki Linnakangas 7261172430 Remove obsolete heap_formtuple/modifytuple/deformtuple functions.
These variants used the old-style 'n'/' ' NULL indicators. The new-style
functions have been available since version 8.1. That should be long enough
that if there is still any old external code using these functions, they
can just switch to the new functions without worrying about backwards
compatibility

Peter Geoghegan
2015-07-02 21:21:23 +03:00
Tom Lane 1dc5ebc907 Support "expanded" objects, particularly arrays, for better performance.
This patch introduces the ability for complex datatypes to have an
in-memory representation that is different from their on-disk format.
On-disk formats are typically optimized for minimal size, and in any case
they can't contain pointers, so they are often not well-suited for
computation.  Now a datatype can invent an "expanded" in-memory format
that is better suited for its operations, and then pass that around among
the C functions that operate on the datatype.  There are also provisions
(rudimentary as yet) to allow an expanded object to be modified in-place
under suitable conditions, so that operations like assignment to an element
of an array need not involve copying the entire array.

The initial application for this feature is arrays, but it is not hard
to foresee using it for other container types like JSON, XML and hstore.
I have hopes that it will be useful to PostGIS as well.

In this initial implementation, a few heuristics have been hard-wired
into plpgsql to improve performance for arrays that are stored in
plpgsql variables.  We would like to generalize those hacks so that
other datatypes can obtain similar improvements, but figuring out some
appropriate APIs is left as a task for future work.  (The heuristics
themselves are probably not optimal yet, either, as they sometimes
force expansion of arrays that would be better left alone.)

Preliminary performance testing shows impressive speed gains for plpgsql
functions that do element-by-element access or update of large arrays.
There are other cases that get a little slower, as a result of added array
format conversions; but we can hope to improve anything that's annoyingly
bad.  In any case most applications should see a net win.

Tom Lane, reviewed by Andres Freund
2015-05-14 12:08:49 -04:00
Tom Lane 0bb8528b5c Fix postgres_fdw to return the right ctid value in EvalPlanQual cases.
If a postgres_fdw foreign table is a non-locked source relation in an
UPDATE, DELETE, or SELECT FOR UPDATE/SHARE, and the query selects its
ctid column, the wrong value would be returned if an EvalPlanQual
recheck occurred.  This happened because the foreign table's result row
was copied via the ROW_MARK_COPY code path, and EvalPlanQualFetchRowMarks
just unconditionally set the reconstructed tuple's t_self to "invalid".

To fix that, we can have EvalPlanQualFetchRowMarks copy the composite
datum's t_ctid field, and be sure to initialize that along with t_self
when postgres_fdw constructs a tuple to return.

If we just did that much then EvalPlanQualFetchRowMarks would start
returning "(0,0)" as ctid for all other ROW_MARK_COPY cases, which perhaps
does not matter much, but then again maybe it might.  The cause of that is
that heap_form_tuple, which is the ultimate source of all composite datums,
simply leaves t_ctid as zeroes in newly constructed tuples.  That seems
like a bad idea on general principles: a field that's really not been
initialized shouldn't appear to have a valid value.  So let's eat the
trivial additional overhead of doing "ItemPointerSetInvalid(&(td->t_ctid))"
in heap_form_tuple.

This closes out our handling of Etsuro Fujita's report that tableoid and
ctid weren't correctly set in postgres_fdw EvalPlanQual cases.  Along the
way we did a great deal of work to improve FDWs' ability to control row
locking behavior; which was not wasted effort by any means, but it didn't
end up being a fix for this problem because that feature would be too
expensive for postgres_fdw to use all the time.

Although the fix for the tableoid misbehavior was back-patched, I'm
hesitant to do so here; it seems far less likely that people would care
about remote ctid than tableoid, and even such a minor behavioral change
as this in heap_form_tuple is perhaps best not back-patched.  So commit
to HEAD only, at least for the moment.

Etsuro Fujita, with some adjustments by me
2015-05-13 14:05:29 -04:00
Alvaro Herrera 4ff695b17d Add log_min_autovacuum_duration per-table option
This is useful to control autovacuum log volume, for situations where
monitoring only a set of tables is necessary.

Author: Michael Paquier
Reviewed by: A team led by Naoya Anzai (also including Akira Kurosawa,
Taiki Kondo, Huong Dangminh), Fujii Masao.
2015-04-03 11:55:50 -03:00
Tom Lane e1a11d9311 Use FLEXIBLE_ARRAY_MEMBER for HeapTupleHeaderData.t_bits[].
This requires changing quite a few places that were depending on
sizeof(HeapTupleHeaderData), but it seems for the best.

Michael Paquier, some adjustments by me
2015-02-21 15:13:06 -05:00
Bruce Momjian 4baaf863ec Update copyright for 2015
Backpatch certain files through 9.0
2015-01-06 11:43:47 -05:00
Fujii Masao c291503b1c Rename pending_list_cleanup_size to gin_pending_list_limit.
Since this parameter is only for GIN index, it's better to
add "gin" to the parameter name for easier understanding.
2014-11-13 12:14:48 +09:00
Fujii Masao a1b395b6a2 Add GUC and storage parameter to set the maximum size of GIN pending list.
Previously the maximum size of GIN pending list was controlled only by
work_mem. But the reasonable value of work_mem and the reasonable size
of the list are basically not the same, so it was not appropriate to
control both of them by only one GUC, i.e., work_mem. This commit
separates new GUC, pending_list_cleanup_size, from work_mem to allow
users to control only the size of the list.

Also this commit adds pending_list_cleanup_size as new storage parameter
to allow users to specify the size of the list per index. This is useful,
for example, when users want to increase the size of the list only for
the GIN index which can be updated heavily, and decrease it otherwise.

Reviewed by Etsuro Fujita.
2014-11-11 21:08:21 +09:00
Alvaro Herrera 7516f52594 BRIN: Block Range Indexes
BRIN is a new index access method intended to accelerate scans of very
large tables, without the maintenance overhead of btrees or other
traditional indexes.  They work by maintaining "summary" data about
block ranges.  Bitmap index scans work by reading each summary tuple and
comparing them with the query quals; all pages in the range are returned
in a lossy TID bitmap if the quals are consistent with the values in the
summary tuple, otherwise not.  Normal index scans are not supported
because these indexes do not store TIDs.

As new tuples are added into the index, the summary information is
updated (if the block range in which the tuple is added is already
summarized) or not; in the latter case, a subsequent pass of VACUUM or
the brin_summarize_new_values() function will create the summary
information.

For data types with natural 1-D sort orders, the summary info consists
of the maximum and the minimum values of each indexed column within each
page range.  This type of operator class we call "Minmax", and we
supply a bunch of them for most data types with B-tree opclasses.
Since the BRIN code is generalized, other approaches are possible for
things such as arrays, geometric types, ranges, etc; even for things
such as enum types we could do something different than minmax with
better results.  In this commit I only include minmax.

Catalog version bumped due to new builtin catalog entries.

There's more that could be done here, but this is a good step forwards.

Loosely based on ideas from Simon Riggs; code mostly by Álvaro Herrera,
with contribution by Heikki Linnakangas.

Patch reviewed by: Amit Kapila, Heikki Linnakangas, Robert Haas.
Testing help from Jeff Janes, Erik Rijkers, Emanuel Calvo.

PS:
  The research leading to these results has received funding from the
  European Union's Seventh Framework Programme (FP7/2007-2013) under
  grant agreement n° 318633.
2014-11-07 16:38:14 -03:00
Fujii Masao 9df492664a Revert "Allow units to be specified in relation option setting value."
This reverts commit e23014f3d4.

As the side effect of the reverted commit, when the unit is
specified, the reloption was stored in the catalog with the unit.
This broke pg_dump (specifically, it prevented pg_dump from
outputting restorable backup regarding the reloption) and
turned the buildfarm red. Revert the commit until the fixed
version is ready.
2014-08-29 05:10:47 +09:00
Fujii Masao e23014f3d4 Allow units to be specified in relation option setting value.
This introduces an infrastructure which allows us to specify the units
like ms (milliseconds) in integer relation option, like GUC parameter.
Currently only autovacuum_vacuum_cost_delay reloption can accept
the units.

Reviewed by Michael Paquier
2014-08-28 15:55:50 +09:00
Bruce Momjian 6cb74a67e2 revert "Throw error for ALTER TABLE RESET of an invalid option"
Reverts commits 73d78e11a0 and
b0488e5c4f.  Also reverts pg_upgrade
changes.
2014-08-25 20:07:37 -04:00
Bruce Momjian 73d78e11a0 Throw error for ALTER TABLE RESET of an invalid option
Also adjust pg_upgrade to not use this method for optional TOAST table
creation.

Patch by Fabrízio de Royes Mello
2014-08-25 17:06:40 -04:00
Tom Lane a844c29966 Prevent memory leaks in parseRelOptions().
parseRelOptions() tended to leak memory in the caller's context.  Most
of the time this doesn't really matter since the caller's context is
at most query-lifespan, and the function won't be invoked very many times.
However, when testing with CLOBBER_CACHE_RECURSIVELY, the same relcache
entry can get rebuilt a *lot* of times in one query, leading to significant
intraquery memory bloat if it has any reloptions.  Noted while
investigating a related report from Tomas Vondra.

In passing, get rid of some Asserts that are redundant with the one
done by deconstruct_array().

As with other patches to avoid leaks in CLOBBER_CACHE testing, it doesn't
really seem worth back-patching this.
2014-08-13 11:35:51 -04:00
Alvaro Herrera 346d7be184 Move view reloptions into their own varlena struct
Per discussion after a gripe from me in
http://www.postgresql.org/message-id/20140611194633.GH18688@eldon.alvh.no-ip.org

Jaime Casanova
2014-07-14 17:24:40 -04:00
Robert Haas 9f03ca9151 Avoid copying index tuples when building an index.
The previous code, perhaps out of concern for avoid memory leaks, formed
the tuple in one memory context and then copied it to another memory
context.  However, this doesn't appear to be necessary, since
index_form_tuple and the functions it calls take precautions against
leaking memory.  In my testing, building the tuple directly inside the
sort context shaves several percent off the index build time.
Rearrange things so we do that.

Patch by me.  Review by Amit Kapila, Tom Lane, Andres Freund.
2014-07-01 10:34:42 -04:00
Bruce Momjian 0a78320057 pgindent run for 9.4
This includes removing tabs after periods in C comments, which was
applied to back branches, so this change should not effect backpatching.
2014-05-06 12:12:18 -04:00
Tom Lane 3f8c8e3c61 Fix failure to detoast fields in composite elements of structured types.
If we have an array of records stored on disk, the individual record fields
cannot contain out-of-line TOAST pointers: the tuptoaster.c mechanisms are
only prepared to deal with TOAST pointers appearing in top-level fields of
a stored row.  The same applies for ranges over composite types, nested
composites, etc.  However, the existing code only took care of expanding
sub-field TOAST pointers for the case of nested composites, not for other
structured types containing composites.  For example, given a command such
as

UPDATE tab SET arraycol = ARRAY[(ROW(x,42)::mycompositetype] ...

where x is a direct reference to a field of an on-disk tuple, if that field
is long enough to be toasted out-of-line then the TOAST pointer would be
inserted as-is into the array column.  If the source record for x is later
deleted, the array field value would become a dangling pointer, leading
to errors along the line of "missing chunk number 0 for toast value ..."
when the value is referenced.  A reproducible test case for this was
provided by Jan Pecek, but it seems likely that some of the "missing chunk
number" reports we've heard in the past were caused by similar issues.

Code-wise, the problem is that PG_DETOAST_DATUM() is not adequate to
produce a self-contained Datum value if the Datum is of composite type.
Seen in this light, the problem is not just confined to arrays and ranges,
but could also affect some other places where detoasting is done in that
way, for example form_index_tuple().

I tried teaching the array code to apply toast_flatten_tuple_attribute()
along with PG_DETOAST_DATUM() when the array element type is composite,
but this was messy and imposed extra cache lookup costs whether or not any
TOAST pointers were present, indeed sometimes when the array element type
isn't even composite (since sometimes it takes a typcache lookup to find
that out).  The idea of extending that approach to all the places that
currently use PG_DETOAST_DATUM() wasn't attractive at all.

This patch instead solves the problem by decreeing that composite Datum
values must not contain any out-of-line TOAST pointers in the first place;
that is, we expand out-of-line fields at the point of constructing a
composite Datum, not at the point where we're about to insert it into a
larger tuple.  This rule is applied only to true composite Datums, not
to tuples that are being passed around the system as tuples, so it's not
as invasive as it might sound at first.  With this approach, the amount
of code that has to be touched for a full solution is greatly reduced,
and added cache lookup costs are avoided except when there actually is
a TOAST pointer that needs to be inlined.

The main drawback of this approach is that we might sometimes dereference
a TOAST pointer that will never actually be used by the query, imposing a
rather large cost that wasn't there before.  On the other side of the coin,
if the field value is used multiple times then we'll come out ahead by
avoiding repeat detoastings.  Experimentation suggests that common SQL
coding patterns are unaffected either way, though.  Applications that are
very negatively affected could be advised to modify their code to not fetch
columns they won't be using.

In future, we might consider reverting this solution in favor of detoasting
only at the point where data is about to be stored to disk, using some
method that can drill down into multiple levels of nested structured types.
That will require defining new APIs for structured types, though, so it
doesn't seem feasible as a back-patchable fix.

Note that this patch changes HeapTupleGetDatum() from a macro to a function
call; this means that any third-party code using that macro will not get
protection against creating TOAST-pointer-containing Datums until it's
recompiled.  The same applies to any uses of PG_RETURN_HEAPTUPLEHEADER().
It seems likely that this is not a big problem in practice: most of the
tuple-returning functions in core and contrib produce outputs that could
not possibly be toasted anyway, and the same probably holds for third-party
extensions.

This bug has existed since TOAST was invented, so back-patch to all
supported branches.
2014-05-01 15:19:06 -04:00
Noah Misch c31305de5f Address ccvalid/ccnoinherit in TupleDesc support functions.
equalTupleDescs() neglected both of these ConstrCheck fields, and
CreateTupleDescCopyConstr() neglected ccnoinherit.  At this time, the
only known behavior defect resulting from these omissions is constraint
exclusion disregarding a CHECK constraint validated by an ALTER TABLE
VALIDATE CONSTRAINT statement issued earlier in the same transaction.
Back-patch to 9.2, where these fields were introduced.
2014-03-23 02:13:43 -04:00
Alvaro Herrera 801c2dc72c Separate multixact freezing parameters from xid's
Previously we were piggybacking on transaction ID parameters to freeze
multixacts; but since there isn't necessarily any relationship between
rates of Xid and multixact consumption, this turns out not to be a good
idea.

Therefore, we now have multixact-specific freezing parameters:

vacuum_multixact_freeze_min_age: when to remove multis as we come across
them in vacuum (default to 5 million, i.e. early in comparison to Xid's
default of 50 million)

vacuum_multixact_freeze_table_age: when to force whole-table scans
instead of scanning only the pages marked as not all visible in
visibility map (default to 150 million, same as for Xids).  Whichever of
both which reaches the 150 million mark earlier will cause a whole-table
scan.

autovacuum_multixact_freeze_max_age: when for cause emergency,
uninterruptible whole-table scans (default to 400 million, double as
that for Xids).  This means there shouldn't be more frequent emergency
vacuuming than previously, unless multixacts are being used very
rapidly.

Backpatch to 9.3 where multixacts were made to persist enough to require
freezing.  To avoid an ABI break in 9.3, VacuumStmt has a couple of
fields in an unnatural place, and StdRdOptions is split in two so that
the newly added fields can go at the end.

Patch by me, reviewed by Robert Haas, with additional input from Andres
Freund and Tom Lane.
2014-02-13 19:36:31 -03:00
Bruce Momjian c871e8f53b Revert C comment change in slot_attisnull()
Revert 89774b58b0
2014-01-28 12:28:14 -05:00
Bruce Momjian 89774b58b0 Adjust C comment in slot_attisnull() regarding nulls. 2014-01-25 16:43:36 -05:00
Tom Lane ac4ef637ad Allow use of "z" flag in our printf calls, and use it where appropriate.
Since C99, it's been standard for printf and friends to accept a "z" size
modifier, meaning "whatever size size_t has".  Up to now we've generally
dealt with printing size_t values by explicitly casting them to unsigned
long and using the "l" modifier; but this is really the wrong thing on
platforms where pointers are wider than longs (such as Win64).  So let's
start using "z" instead.  To ensure we can do that on all platforms, teach
src/port/snprintf.c to understand "z", and add a configure test to force
use of that implementation when the platform's version doesn't handle "z".

Having done that, modify a bunch of places that were using the
unsigned-long hack to use "z" instead.  This patch doesn't pretend to have
gotten everyplace that could benefit, but it catches many of them.  I made
an effort in particular to ensure that all uses of the same error message
text were updated together, so as not to increase the number of
translatable strings.

It's possible that this change will result in format-string warnings from
pre-C99 compilers.  We might have to reconsider if there are any popular
compilers that will warn about this; but let's start by seeing what the
buildfarm thinks.

Andres Freund, with a little additional work by me
2014-01-23 17:18:33 -05:00
Bruce Momjian 7e04792a1c Update copyright for 2014
Update all files in head, and files COPYRIGHT and legal.sgml in all back
branches.
2014-01-07 16:05:30 -05:00
Robert Haas d43760b624 Revise documentation for new freezing method.
Commit 37484ad2aa invalidated a good
chunk of documentation, so patch it up to reflect the new state of
play.  Along the way, patch remaining documentation references to
FrozenXID to say instead FrozenTransactionId, so that they match the
way we actually spell it in the code.
2013-12-23 20:36:31 -05:00
Robert Haas 37484ad2aa Change the way we mark tuples as frozen.
Instead of changing the tuple xmin to FrozenTransactionId, the combination
of HEAP_XMIN_COMMITTED and HEAP_XMIN_INVALID, which were previously never
set together, is now defined as HEAP_XMIN_FROZEN.  A variety of previous
proposals to freeze tuples opportunistically before vacuum_freeze_min_age
is reached have foundered on the objection that replacing xmin by
FrozenTransactionId might hinder debugging efforts when things in this
area go awry; this patch is intended to solve that problem by keeping
the XID around (but largely ignoring the value to which it is set).

Third-party code that checks for HEAP_XMIN_INVALID on tuples where
HEAP_XMIN_COMMITTED might be set will be broken by this change.  To fix,
use the new accessor macros in htup_details.h rather than consulting the
bits directly.  HeapTupleHeaderGetXmin has been modified to return
FrozenTransactionId when the infomask bits indicate that the tuple is
frozen; use HeapTupleHeaderGetRawXmin when you already know that the
tuple isn't marked commited or frozen, or want the raw value anyway.
We currently do this in routines that display the xmin for user consumption,
in tqual.c where it's known to be safe and important for the avoidance of
extra cycles, and in the function-caching code for various procedural
languages, which shouldn't invalidate the cache just because the tuple
gets frozen.

Robert Haas and Andres Freund
2013-12-22 15:49:09 -05:00
Robert Haas 66abc2608c Add a new reloption, user_catalog_table.
When this reloption is set and wal_level=logical is configured,
we'll record the CIDs stamped by inserts, updates, and deletes to
the table just as we would for an actual catalog table.  This will
allow logical decoding to use historical MVCC snapshots to access
such tables just as they access ordinary catalog tables.

Replication solutions built around the logical decoding machinery
will likely need to set this operation for their configuration
tables; it might also be needed by extensions which perform table
access in their output functions.

Andres Freund, reviewed by myself and others.
2013-12-10 19:17:34 -05:00
Tom Lane 784e762e88 Support multi-argument UNNEST(), and TABLE() syntax for multiple functions.
This patch adds the ability to write TABLE( function1(), function2(), ...)
as a single FROM-clause entry.  The result is the concatenation of the
first row from each function, followed by the second row from each
function, etc; with NULLs inserted if any function produces fewer rows than
others.  This is believed to be a much more useful behavior than what
Postgres currently does with multiple SRFs in a SELECT list.

This syntax also provides a reasonable way to combine use of column
definition lists with WITH ORDINALITY: put the column definition list
inside TABLE(), where it's clear that it doesn't control the ordinality
column as well.

Also implement SQL-compliant multiple-argument UNNEST(), by turning
UNNEST(a,b,c) into TABLE(unnest(a), unnest(b), unnest(c)).

The SQL standard specifies TABLE() with only a single function, not
multiple functions, and it seems to require an implicit UNNEST() which is
not what this patch does.  There may be something wrong with that reading
of the spec, though, because if it's right then the spec's TABLE() is just
a pointless alternative spelling of UNNEST().  After further review of
that, we might choose to adopt a different syntax for what this patch does,
but in any case this functionality seems clearly worthwhile.

Andrew Gierth, reviewed by Zoltán Böszörményi and Heikki Linnakangas, and
significantly revised by me
2013-11-21 19:37:20 -05:00
Tom Lane b006f4ddb9 Prevent memory leaks from accumulating across printtup() calls.
Historically, printtup() has assumed that it could prevent memory leakage
by pfree'ing the string result of each output function and manually
managing detoasting of toasted values.  This amounts to assuming that
datatype output functions never leak any memory internally; an assumption
we've already decided to be bogus elsewhere, for example in COPY OUT.
range_out in particular is known to leak multiple kilobytes per call, as
noted in bug #8573 from Godfried Vanluffelen.  While we could go in and fix
that leak, it wouldn't be very notationally convenient, and in any case
there have been and undoubtedly will again be other leaks in other output
functions.  So what seems like the best solution is to run the output
functions in a temporary memory context that can be reset after each row,
as we're doing in COPY OUT.  Some quick experimentation suggests this is
actually a tad faster than the retail pfree's anyway.

This patch fixes all the variants of printtup, except for debugtup()
which is used in standalone mode.  It doesn't seem worth worrying
about query-lifespan leaks in standalone mode, and fixing that case
would be a bit tedious since debugtup() doesn't currently have any
startup or shutdown functions.

While at it, remove manual detoast management from several other
output-function call sites that had copied it from printtup().  This
doesn't make a lot of difference right now, but in view of recent
discussions about supporting "non-flattened" Datums, we're going to
want that code gone eventually anyway.

Back-patch to 9.2 where range_out was introduced.  We might eventually
decide to back-patch this further, but in the absence of known major
leaks in older output functions, I'll refrain for now.
2013-11-03 11:33:05 -05:00
Tom Lane 9a9473f3cc Prevent using strncpy with src == dest in TupleDescInitEntry.
The C and POSIX standards state that strncpy's behavior is undefined when
source and destination areas overlap.  While it remains dubious whether any
implementations really misbehave when the pointers are exactly equal, some
platforms are now starting to force the issue by complaining when an
undefined call occurs.  (In particular OS X 10.9 has been seen to dump core
here, though the exact set of circumstances needed to trigger that remain
elusive.  Similar behavior can be expected to be optional on Linux and
other platforms in the near future.)  So tweak the code to explicitly do
nothing when nothing need be done.

Back-patch to all active branches.  In HEAD, this also lets us get rid of
an exception in valgrind.supp.

Per discussion of a report from Matthias Schmitt.
2013-10-28 20:49:24 -04:00
Greg Stark c62736cc37 Add SQL Standard WITH ORDINALITY support for UNNEST (and any other SRF)
Author: Andrew Gierth, David Fetter
Reviewers: Dean Rasheed, Jeevan Chalke, Stephen Frost
2013-07-29 16:38:01 +01:00
Stephen Frost 4cbe3ac3e8 WITH CHECK OPTION support for auto-updatable VIEWs
For simple views which are automatically updatable, this patch allows
the user to specify what level of checking should be done on records
being inserted or updated.  For 'LOCAL CHECK', new tuples are validated
against the conditionals of the view they are being inserted into, while
for 'CASCADED CHECK' the new tuples are validated against the
conditionals for all views involved (from the top down).

This option is part of the SQL specification.

Dean Rasheed, reviewed by Pavel Stehule
2013-07-18 17:10:16 -04:00
Noah Misch 19085116ee Cooperate with the Valgrind instrumentation framework.
Valgrind "client requests" in aset.c and mcxt.c teach Valgrind and its
Memcheck tool about the PostgreSQL allocator.  This makes Valgrind
roughly as sensitive to memory errors involving palloc chunks as it is
to memory errors involving malloc chunks.  Further client requests in
PageAddItem() and printtup() verify that all bits being added to a
buffer page or furnished to an output function are predictably-defined.
Those tests catch failures of C-language functions to fully initialize
the bits of a Datum, which in turn stymie optimizations that rely on
_equalConst().  Define the USE_VALGRIND symbol in pg_config_manual.h to
enable these additions.  An included "suppression file" silences nominal
errors we don't plan to fix.

Reviewed in earlier versions by Peter Geoghegan and Korry Douglas.
2013-06-26 20:22:25 -04:00
Kevin Grittner 3bf3ab8c56 Add a materialized view relations.
A materialized view has a rule just like a view and a heap and
other physical properties like a table.  The rule is only used to
populate the table, references in queries refer to the
materialized data.

This is a minimal implementation, but should still be useful in
many cases.  Currently data is only populated "on demand" by the
CREATE MATERIALIZED VIEW and REFRESH MATERIALIZED VIEW statements.
It is expected that future releases will add incremental updates
with various timings, and that a more refined concept of defining
what is "fresh" data will be developed.  At some point it may even
be possible to have queries use a materialized in place of
references to underlying tables, but that requires the other
above-mentioned features to be working first.

Much of the documentation work by Robert Haas.
Review by Noah Misch, Thom Brown, Robert Haas, Marko Tiikkaja
Security review by KaiGai Kohei, with a decision on how best to
implement sepgsql still pending.
2013-03-03 18:23:31 -06:00
Alvaro Herrera 0ac5ad5134 Improve concurrency of foreign key locking
This patch introduces two additional lock modes for tuples: "SELECT FOR
KEY SHARE" and "SELECT FOR NO KEY UPDATE".  These don't block each
other, in contrast with already existing "SELECT FOR SHARE" and "SELECT
FOR UPDATE".  UPDATE commands that do not modify the values stored in
the columns that are part of the key of the tuple now grab a SELECT FOR
NO KEY UPDATE lock on the tuple, allowing them to proceed concurrently
with tuple locks of the FOR KEY SHARE variety.

Foreign key triggers now use FOR KEY SHARE instead of FOR SHARE; this
means the concurrency improvement applies to them, which is the whole
point of this patch.

The added tuple lock semantics require some rejiggering of the multixact
module, so that the locking level that each transaction is holding can
be stored alongside its Xid.  Also, multixacts now need to persist
across server restarts and crashes, because they can now represent not
only tuple locks, but also tuple updates.  This means we need more
careful tracking of lifetime of pg_multixact SLRU files; since they now
persist longer, we require more infrastructure to figure out when they
can be removed.  pg_upgrade also needs to be careful to copy
pg_multixact files over from the old server to the new, or at least part
of multixact.c state, depending on the versions of the old and new
servers.

Tuple time qualification rules (HeapTupleSatisfies routines) need to be
careful not to consider tuples with the "is multi" infomask bit set as
being only locked; they might need to look up MultiXact values (i.e.
possibly do pg_multixact I/O) to find out the Xid that updated a tuple,
whereas they previously were assured to only use information readily
available from the tuple header.  This is considered acceptable, because
the extra I/O would involve cases that would previously cause some
commands to block waiting for concurrent transactions to finish.

Another important change is the fact that locking tuples that have
previously been updated causes the future versions to be marked as
locked, too; this is essential for correctness of foreign key checks.
This causes additional WAL-logging, also (there was previously a single
WAL record for a locked tuple; now there are as many as updated copies
of the tuple there exist.)

With all this in place, contention related to tuples being checked by
foreign key rules should be much reduced.

As a bonus, the old behavior that a subtransaction grabbing a stronger
tuple lock than the parent (sub)transaction held on a given tuple and
later aborting caused the weaker lock to be lost, has been fixed.

Many new spec files were added for isolation tester framework, to ensure
overall behavior is sane.  There's probably room for several more tests.

There were several reviewers of this patch; in particular, Noah Misch
and Andres Freund spent considerable time in it.  Original idea for the
patch came from Simon Riggs, after a problem report by Joel Jacobson.
Most code is from me, with contributions from Marti Raudsepp, Alexander
Shulgin, Noah Misch and Andres Freund.

This patch was discussed in several pgsql-hackers threads; the most
important start at the following message-ids:
	AANLkTimo9XVcEzfiBR-ut3KVNDkjm2Vxh+t8kAmWjPuv@mail.gmail.com
	1290721684-sup-3951@alvh.no-ip.org
	1294953201-sup-2099@alvh.no-ip.org
	1320343602-sup-2290@alvh.no-ip.org
	1339690386-sup-8927@alvh.no-ip.org
	4FE5FF020200002500048A3D@gw.wicourts.gov
	4FEAB90A0200002500048B7D@gw.wicourts.gov
2013-01-23 12:04:59 -03:00
Bruce Momjian bd61a623ac Update copyrights for 2013
Fully update git head, and update back branches in ./COPYRIGHT and
legal.sgml files.
2013-01-01 17:15:01 -05:00
Alvaro Herrera c219d9b0a5 Split tuple struct defs from htup.h to htup_details.h
This reduces unnecessary exposure of other headers through htup.h, which
is very widely included by many files.

I have chosen to move the function prototypes to the new file as well,
because that means htup.h no longer needs to include tupdesc.h.  In
itself this doesn't have much effect in indirect inclusion of tupdesc.h
throughout the tree, because it's also required by execnodes.h; but it's
something to explore in the future, and it seemed best to do the htup.h
change now while I'm busy with it.
2012-08-30 16:52:35 -04:00
Alvaro Herrera 45326c5a11 Split resowner.h
This lets files that are mere users of ResourceOwner not automatically
include the headers for stuff that is managed by the resowner mechanism.
2012-08-28 18:02:07 -04:00
Peter Eisentraut 15b1918e7d Improve reporting of permission errors for array types
Because permissions are assigned to element types, not array types,
complaining about permission denied on an array type would be
misleading to users.  So adjust the reporting to refer to the element
type instead.

In order not to duplicate the required logic in two dozen places,
refactor the permission denied reporting for types a bit.

pointed out by Yeb Havinga during the review of the type privilege
feature
2012-06-15 22:55:03 +03:00
Robert Haas d2c86a1ccd Remove RELKIND_UNCATALOGED.
This may have been important at some point in the past, but it no
longer does anything useful.

Review by Tom Lane.
2012-06-14 09:47:30 -04:00
Tom Lane 9873001e6d Another trivial comment-typo fix. 2012-04-25 14:28:58 -04:00
Heikki Linnakangas 342baf4ce6 Update outdated comment. HeapTupleHeader.t_natts field doesn't exist anymore.
Kevin Grittner
2012-03-09 08:07:56 +02:00
Bruce Momjian e126958c2e Update copyright notices for year 2012. 2012-01-01 18:01:58 -05:00
Robert Haas 0e4611c023 Add a security_barrier option for views.
When a view is marked as a security barrier, it will not be pulled up
into the containing query, and no quals will be pushed down into it,
so that no function or operator chosen by the user can be applied to
rows not exposed by the view.  Views not configured with this
option cannot provide robust row-level security, but will perform far
better.

Patch by KaiGai Kohei; original problem report by Heikki Linnakangas
(in October 2009!).  Review (in earlier versions) by Noah Misch and
others.  Design advice by Tom Lane and myself.  Further review and
cleanup by me.
2011-12-22 16:16:31 -05:00
Peter Eisentraut 729205571e Add support for privileges on types
This adds support for the more or less SQL-conforming USAGE privilege
on types and domains.  The intent is to be able restrict which users
can create dependencies on types, which restricts the way in which
owners can alter types.

reviewed by Yeb Havinga
2011-12-20 00:05:19 +02:00
Tom Lane 8daeb5ddd6 Add SP-GiST (space-partitioned GiST) index access method.
SP-GiST is comparable to GiST in flexibility, but supports non-balanced
partitioned search structures rather than balanced trees.  As described at
PGCon 2011, this new indexing structure can beat GiST in both index build
time and query speed for search problems that it is well matched to.

There are a number of areas that could still use improvement, but at this
point the code seems committable.

Teodor Sigaev and Oleg Bartunov, with considerable revisions by Tom Lane
2011-12-17 16:42:30 -05:00
Heikki Linnakangas 5edb24a898 Buffering GiST index build algorithm.
When building a GiST index that doesn't fit in cache, buffers are attached
to some internal nodes in the index. This speeds up the build by avoiding
random I/O that would otherwise be needed to traverse all the way down the
tree to the find right leaf page for tuple.

Alexander Korotkov
2011-09-08 17:51:23 +03:00
Tom Lane 1609797c25 Clean up the #include mess a little.
walsender.h should depend on xlog.h, not vice versa.  (Actually, the
inclusion was circular until a couple hours ago, which was even sillier;
but Bruce broke it in the expedient rather than logically correct
direction.)  Because of that poor decision, plus blind application of
pgrminclude, we had a situation where half the system was depending on
xlog.h to include such unrelated stuff as array.h and guc.h.  Clean up
the header inclusion, and manually revert a lot of what pgrminclude had
done so things build again.

This episode reinforces my feeling that pgrminclude should not be run
without adult supervision.  Inclusion changes in header files in particular
need to be reviewed with great care.  More generally, it'd be good if we
had a clearer notion of module layering to dictate which headers can sanely
include which others ... but that's a big task for another day.
2011-09-04 01:13:16 -04:00
Bruce Momjian 6416a82a62 Remove unnecessary #include references, per pgrminclude script. 2011-09-01 10:04:27 -04:00
Heikki Linnakangas 77949a2913 Change the way string relopts are allocated.
Don't try to allocate the default value for a string relopt in the same
palloc chunk as the relopt_string struct. That didn't work too well if you
added a built-in string relopt in the stringRelOpts array, as it's not
possible to have an initializer for a variable length struct in C. This
makes the code slightly simpler too.

While we're at it, move the call to validator function in
add_string_reloption to before the allocation, so that if someone does pass
a bogus default value, we don't leak memory.
2011-08-09 15:25:44 +03:00
Robert Haas c4096c7639 Allow per-column foreign data wrapper options.
Shigeru Hanada, with fairly minor editing by me.
2011-08-05 13:24:03 -04:00
Alvaro Herrera 897795240c Enable CHECK constraints to be declared NOT VALID
This means that they can initially be added to a large existing table
without checking its initial contents, but new tuples must comply to
them; a separate pass invoked by ALTER TABLE / VALIDATE can verify
existing data and ensure it complies with the constraint, at which point
it is marked validated and becomes a normal part of the table ecosystem.

An non-validated CHECK constraint is ignored in the planner for
constraint_exclusion purposes; when validated, cached plans are
recomputed so that partitioning starts working right away.

This patch also enables domains to have unvalidated CHECK constraints
attached to them as well by way of ALTER DOMAIN / ADD CONSTRAINT / NOT
VALID, which can later be validated with ALTER DOMAIN / VALIDATE
CONSTRAINT.

Thanks to Thom Brown, Dean Rasheed and Jaime Casanova for the various
reviews, and Robert Hass for documentation wording improvement
suggestions.

This patch was sponsored by Enova Financial.
2011-06-30 11:24:31 -04:00
Tom Lane 9e9b9ac7d1 Make a code-cleanup pass over the collations patch.
This patch is almost entirely cosmetic --- mostly cleaning up a lot of
neglected comments, and fixing code layout problems in places where the
patch made lines too long and then pgindent did weird things with that.
I did find a bug-of-omission in equalTupleDescs().
2011-04-22 17:43:18 -04:00
Tom Lane d64713df7e Pass collations to functions in FunctionCallInfoData, not FmgrInfo.
Since collation is effectively an argument, not a property of the function,
FmgrInfo is really the wrong place for it; and this becomes critical in
cases where a cached FmgrInfo is used for varying purposes that might need
different collation settings.  Fix by passing it in FunctionCallInfoData
instead.  In particular this allows a clean fix for bug #5970 (record_cmp
not working).  This requires touching a bit more code than the original
method, but nobody ever thought that collations would not be an invasive
patch...
2011-04-12 19:19:24 -04:00
Bruce Momjian bf50caf105 pgindent run before PG 9.1 beta 1. 2011-04-10 11:42:00 -04:00
Tom Lane 7208fae18f Clean up cruft around collation initialization for tupdescs and scankeys.
I found actual bugs in GiST and plpgsql; the rest of this is cosmetic
but meant to decrease the odds of future bugs of omission.
2011-03-26 18:28:40 -04:00
Tom Lane a051ef699c Remove collation information from TypeName, where it does not belong.
The initial collations patch treated a COLLATE spec as part of a TypeName,
following what can only be described as brain fade on the part of the SQL
committee.  It's a lot more reasonable to treat COLLATE as a syntactically
separate object, so that it can be added in only the productions where it
actually belongs, rather than needing to reject it in a boatload of places
where it doesn't belong (something the original patch mostly failed to do).
In addition this change lets us meet the spec's requirement to allow
COLLATE anywhere in the clauses of a ColumnDef, and it avoids unfriendly
behavior for constructs such as "foo::type COLLATE collation".

To do this, pull collation information out of TypeName and put it in
ColumnDef instead, thus reverting most of the collation-related changes in
parse_type.c's API.  I made one additional structural change, which was to
use a ColumnDef as an intermediate node in AT_AlterColumnType AlterTableCmd
nodes.  This provides enough room to get rid of the "transform" wart in
AlterTableCmd too, since the ColumnDef can carry the USING expression
easily enough.

Also fix some other minor bugs that have crept in in the same areas,
like failure to copy recently-added fields of ColumnDef in copyfuncs.c.

While at it, document the formerly secret ability to specify a collation
in ALTER TABLE ALTER COLUMN TYPE, ALTER TYPE ADD ATTRIBUTE, and
ALTER TYPE ALTER ATTRIBUTE TYPE; and correct some misstatements about
what the default collation selection will be when COLLATE is omitted.

BTW, the three-parameter form of format_type() should go away too,
since it just contributes to the confusion in this area; but I'll do
that in a separate patch.
2011-03-09 22:39:20 -05:00
Peter Eisentraut 414c5a2ea6 Per-column collation support
This adds collation support for columns and domains, a COLLATE clause
to override it per expression, and B-tree index support.

Peter Eisentraut
reviewed by Pavel Stehule, Itagaki Takahiro, Robert Haas, Noah Misch
2011-02-08 23:04:18 +02:00
Robert Haas 0d692a0dc9 Basic foreign table support.
Foreign tables are a core component of SQL/MED.  This commit does
not provide a working SQL/MED infrastructure, because foreign tables
cannot yet be queried.  Support for foreign table scans will need to
be added in a future patch.  However, this patch creates the necessary
system catalog structure, syntax support, and support for ancillary
operations such as COMMENT and SECURITY LABEL.

Shigeru Hanada, heavily revised by Robert Haas
2011-01-01 23:48:11 -05:00
Bruce Momjian 5d950e3b0c Stamp copyrights for year 2011. 2011-01-01 13:18:15 -05:00
Peter Eisentraut 35670340f5 Refactor typenameTypeId()
Split the old typenameTypeId() into two functions: A new typenameTypeId() that
returns only a type OID, and typenameTypeIdAndMod() that returns type OID and
typmod.  This isolates call sites better that actually care about the typmod.
2010-10-25 21:44:49 +03:00
Magnus Hagander 9f2e211386 Remove cvs keywords from all files. 2010-09-20 22:08:53 +02:00
Itagaki Takahiro b5faba1284 Ensure default-only storage parameters for TOAST relations
to be initialized with proper values. Affected parameters are
fillfactor, analyze_threshold, and analyze_scale_factor.

Especially uninitialized fillfactor caused inefficient page usage
because we built a StdRdOptions struct in which fillfactor is zero
if any reloption is set for the toast table.

In addition, we disallow toast.autovacuum_analyze_threshold and
toast.autovacuum_analyze_scale_factor because we didn't actually
support them; they are always ignored.

Report by Rumko on pgsql-bugs on 12 May 2010.
Analysis by Tom Lane and Alvaro Herrera. Patch by me.

Backpatch to 8.4.
2010-06-07 02:59:02 +00:00
Tom Lane 1f44a313bd Add missing reset of need_initialization in reloptions code.
This resulted in useless extra work during every call of parseRelOptions,
but no bad effects other than that.  Noted by Alvaro.
2010-03-11 21:47:19 +00:00
Bruce Momjian 65e806cba1 pgindent run for 9.0 2010-02-26 02:01:40 +00:00
Robert Haas e26c539e9f Wrap calls to SearchSysCache and related functions using macros.
The purpose of this change is to eliminate the need for every caller
of SearchSysCache, SearchSysCacheCopy, SearchSysCacheExists,
GetSysCacheOid, and SearchSysCacheList to know the maximum number
of allowable keys for a syscache entry (currently 4).  This will
make it far easier to increase the maximum number of keys in a
future release should we choose to do so, and it makes the code
shorter, too.

Design and review by Tom Lane.
2010-02-14 18:42:19 +00:00
Robert Haas 76a47c0e74 Replace ALTER TABLE ... SET STATISTICS DISTINCT with a more general mechanism.
Attributes can now have options, just as relations and tablespaces do, and
the reloptions code is used to parse, validate, and store them.  For
simplicity and because these options are not performance critical, we store
them in a separate cache rather than the main relcache.

Thanks to Alex Hunsaker for the review.
2010-01-22 16:40:19 +00:00
Robert Haas 84b6d5f359 Remove partial, broken support for NULL pointers when fetching attributes.
Previously, fastgetattr() and heap_getattr() tested their fourth argument
against a null pointer, but any attempt to use them with a literal-NULL
fourth argument evaluated to *(void *)0, resulting in a compiler error.
Remove these NULL tests to avoid leading future readers of this code to
believe that this has a chance of working.  Also clean up related legacy
code in nocachegetattr(), heap_getsysattr(), and nocache_index_getattr().

The new coding standard is that any code which calls a getattr-type
function or macro which takes an isnull argument MUST pass a valid
boolean pointer.  Per discussion with Bruce Momjian, Tom Lane, Alvaro
Herrera.
2010-01-10 04:26:36 +00:00
Robert Haas d86d51a958 Support ALTER TABLESPACE name SET/RESET ( tablespace_options ).
This patch only supports seq_page_cost and random_page_cost as parameters,
but it provides the infrastructure to scalably support many more.
In particular, we may want to add support for effective_io_concurrency,
but I'm leaving that as future work for now.

Thanks to Tom Lane for design help and Alvaro Herrera for the review.
2010-01-05 21:54:00 +00:00
Bruce Momjian 0239800893 Update copyright for the year 2010. 2010-01-02 16:58:17 +00:00
Tom Lane 29c4ad9829 Support "x IS NOT NULL" clauses as indexscan conditions. This turns out
to be just a minor extension of the previous patch that made "x IS NULL"
indexable, because we can treat the IS NOT NULL condition as if it were
"x < NULL" or "x > NULL" (depending on the index's NULLS FIRST/LAST option),
just like IS NULL is treated like "x = NULL".  Aside from any possible
usefulness in its own right, this is an important improvement for
index-optimized MAX/MIN aggregates: it is now reliably possible to get
a column's min or max value cheaply, even when there are a lot of nulls
cluttering the interesting end of the index.
2010-01-01 21:53:49 +00:00
Tom Lane 85d02a6586 Redefine Datum as uintptr_t, instead of unsigned long.
This is more in keeping with modern practice, and is a first step towards
porting to Win64 (which has sizeof(pointer) > sizeof(long)).

Tsutomu Yamada, Magnus Hagander, Tom Lane
2009-12-31 19:41:37 +00:00
Tom Lane 8d54c2482b Code review for LIKE INCLUDING patch --- clean up some cosmetic and not
so cosmetic stuff.
2009-10-13 00:53:08 +00:00
Andrew Dunstan faa1afc6c1 CREATE LIKE INCLUDING COMMENTS and STORAGE, and INCLUDING ALL shortcut. Itagaki Takahiro. 2009-10-12 19:49:24 +00:00
Alvaro Herrera 53af86c55c Fix handling of autovacuum reloptions.
In the original coding, setting a single reloption would cause default
values to be used for all the other reloptions.  This is a problem
particularly for autovacuum reloptions.

Itagaki Takahiro
2009-08-27 17:18:44 +00:00
Tom Lane 67a5f8ff9e Department of marginal improvements: teach tupconvert.c to avoid doing a
physical conversion when there are dropped columns in the same places in
the input and output tupdescs.  This avoids possible performance loss from
the recent patch to improve dropped-column handling, in some cases where
the old code would have worked.
2009-08-17 20:34:31 +00:00
Tom Lane dcb2bda9b7 Improve plpgsql's ability to cope with rowtypes containing dropped columns,
by supporting conversions in places that used to demand exact rowtype match.

Since this issue is certain to come up elsewhere (in fact, already has,
in ExecEvalConvertRowtype), factor out the support code into new core
functions for tuple conversion.  I chose to put these in a new source
file since heaptuple.c is already overly long.

Heavily revised version of a patch by Pavel Stehule.
2009-08-06 20:44:32 +00:00
Tom Lane 9072592946 Add ALTER TABLE ... ALTER COLUMN ... SET STATISTICS DISTINCT
Robert Haas
2009-08-02 22:14:53 +00:00
Tom Lane b680ae4bdb Improve unique-constraint-violation error messages to include the exact
values being complained of.

In passing, also remove the arbitrary length limitation in the similar
error detail message for foreign key violations.

Itagaki Takahiro
2009-08-01 19:59:41 +00:00
Peter Eisentraut de160e2c00 Make backend header files C++ safe
This alters various incidental uses of C++ key words to use other similar
identifiers, so that a C++ compiler won't choke outright.  You still
(probably) need extern "C" { }; around the inclusion of backend headers.

based on a patch by Kurt Harriman <harriman@acm.org>

Also add a script cpluspluscheck to check for C++ compatibility in the
future.  As of right now, this passes without error for me.
2009-07-16 06:33:46 +00:00
Bruce Momjian d747140279 8.4 pgindent run, with new combined Linux/FreeBSD/MinGW typedef list
provided by Andrew.
2009-06-11 14:49:15 +00:00
Tom Lane c3707a4fcd Use more-portable coding for the check on handing out the last available
relopt_kind value in add_reloption_kind().  Per Zdenek Kotala.
2009-05-24 22:22:44 +00:00
Tom Lane 090173a3f9 Remove the recently added node types ReloptElem and OptionDefElem in favor
of adding optional namespace and action fields to DefElem.  Having three
node types that do essentially the same thing bloats the code and leads
to errors of confusion, such as in yesterday's bug report from Khee Chin.
2009-04-04 21:12:31 +00:00
Alvaro Herrera 1c855f01ea Disallow setting fillfactor for TOAST tables.
To implement this without almost duplicating the reloption table, treat
relopt_kind as a bitmask instead of an integer value.  This decreases the
range of allowed values, but it's not clear that there's need for that much
values anyway.

This patch also makes heap_reloptions explicitly a no-op for relation kinds
other than heap and TOAST tables.

Patch by ITAGAKI Takahiro with minor edits from me.  (In particular I removed
the bit about adding relation kind to an error message, which I intend to
commit separately.)
2009-04-04 00:45:02 +00:00
Tom Lane 793d5662e8 Fix an oversight in the support for storing/retrieving "minimal tuples" in
TupleTableSlots.  We have functions for retrieving a minimal tuple from a slot
after storing a regular tuple in it, or vice versa; but these were implemented
by converting the internal storage from one format to the other.  The problem
with that is it invalidates any pass-by-reference Datums that were already
fetched from the slot, since they'll be pointing into the just-freed version
of the tuple.  The known problem cases involve fetching both a whole-row
variable and a pass-by-reference value from a slot that is fed from a
tuplestore or tuplesort object.  The added regression tests illustrate some
simple cases, but there may be other failure scenarios traceable to the same
bug.  Note that the added tests probably only fail on unpatched code if it's
built with --enable-cassert; otherwise the bug leads to fetching from freed
memory, which will not have been overwritten without additional conditions.

Fix by allowing a slot to contain both formats simultaneously; which turns out
not to complicate the logic much at all, if anything it seems less contorted
than before.

Back-patch to 8.2, where minimal tuples were introduced.
2009-03-30 04:08:43 +00:00
Tom Lane ff301d6e69 Implement "fastupdate" support for GIN indexes, in which we try to accumulate
multiple index entries in a holding area before adding them to the main index
structure.  This helps because bulk insert is (usually) significantly faster
than retail insert for GIN.

This patch also removes GIN support for amgettuple-style index scans.  The
API defined for amgettuple is difficult to support with fastupdate, and
the previously committed partial-match feature didn't really work with
it either.  We might eventually figure a way to put back amgettuple
support, but it won't happen for 8.4.

catversion bumped because of change in GIN's pg_am entry, and because
the format of GIN indexes changed on-disk (there's a metapage now,
and possibly a pending list).

Teodor Sigaev
2009-03-24 20:17:18 +00:00
Tom Lane 1079564979 Const-ify the parse table passed to fillRelOptions. The previous coding
meant it had to be built on-the-fly at each entry to default_reloptions.
2009-03-23 16:36:27 +00:00
Tom Lane 640796ff41 Reduce the maximum value of vacuum_cost_delay and autovacuum_vacuum_cost_delay
to 100ms (from 1000).  This still seems to be comfortably larger than the
useful range of the parameter, and it should help discourage people from
picking uselessly large values.  Tweak the documentation to recommend small
values, too.  Per discussion of a couple weeks ago.
2009-02-28 00:10:52 +00:00
Alvaro Herrera 834a6da4f7 Update autovacuum to use reloptions instead of a system catalog, for
per-table overrides of parameters.

This removes a whole class of problems related to misusing the catalog,
and perhaps more importantly, gives us pg_dump support for the parameters.

Based on a patch by Euler Taveira de Oliveira, heavily reworked by me.
2009-02-09 20:57:59 +00:00
Alvaro Herrera 3a5b773715 Allow reloption names to have qualifiers, initially supporting a TOAST
qualifier, and add support for this in pg_dump.

This allows TOAST tables to have user-defined fillfactor, and will also
enable us to move the autovacuum parameters to reloptions without taking
away the possibility of setting values for TOAST tables.
2009-02-02 19:31:40 +00:00
Alvaro Herrera c0f92b57dc Allow extracting and parsing of reloptions from a bare pg_class tuple, and
refactor the relcache code that used to do that.  This allows other callers
(particularly autovacuum) to do the same without necessarily having to open
and lock a table.
2009-01-26 19:41:06 +00:00
Tom Lane 3cb5d6580a Support column-level privileges, as required by SQL standard.
Stephen Frost, with help from KaiGai Kohei and others
2009-01-22 20:16:10 +00:00
Alvaro Herrera 8ebe1e356c Simplify the writing of amoptions routines by introducing a convenience
fillRelOptions routine that stores the parsed values in the struct using a
table-based approach.  Per Tom suggestion.  Also remove the "continue"
in HANDLE_*_RELOPTION macros, which were useless and in spirit they were
assuming too much of how the macros were going to be used.  (Note that these
macros are now unused, but the intention is to introduce some usage in a
future autovacuum patch, which is why they weren't completely removed.)

Also, do not call the string validation routine when not validating.  It seems
less error-prone this way, per commentary on the amoptions SGML docs.
2009-01-12 21:02:15 +00:00
Alvaro Herrera b813c8daca A couple further reloptions improvements, per KaiGai Kohei: add a validation
function to the string type and add a couple of macros for string handling.

In passing, fix an off-by-one bug of mine.
2009-01-08 19:34:41 +00:00
Alvaro Herrera b25433da5d Fix string reloption handling, per KaiGai Kohei. 2009-01-06 14:47:37 +00:00
Bruce Momjian 1d2a185eae Suppress compiler warning in a different way, per Alvaro. 2009-01-06 03:15:51 +00:00
Bruce Momjian 02aa5f19c6 Supress compiler warning. 2009-01-06 02:44:17 +00:00
Alvaro Herrera ba748f7a11 Change the reloptions machinery to use a table-based parser, and provide
a more complete framework for writing custom option processing routines
by user-defined access methods.

Catalog version bumped due to the general API changes, which are going to
affect user-defined "amoptions" routines.
2009-01-05 17:14:28 +00:00
Bruce Momjian 511db38ace Update copyright for 2009. 2009-01-01 17:24:05 +00:00
Tom Lane c1f3073333 Clean up the API for DestReceiver objects by eliminating the assumption
that a Portal is a useful and sufficient additional argument for
CreateDestReceiver --- it just isn't, in most cases.  Instead formalize
the approach of passing any needed parameters to the receiver separately.

One unexpected benefit of this change is that we can declare typedef Portal
in a less surprising location.

This patch is just code rearrangement and doesn't change any functionality.
I'll tackle the HOLD-cursor-vs-toast problem in a follow-on patch.
2008-11-30 20:51:25 +00:00
Alvaro Herrera 03e5248d0f Replace the usage of heap_addheader to create pg_attribute tuples with regular
heap_form_tuple.  Since this removes the last remaining caller of
heap_addheader, remove it.

Extracted from the column privileges patch from Stephen Frost, with further
code cleanups by me.
2008-11-14 01:57:42 +00:00
Tom Lane 902d1cb35f Remove all uses of the deprecated functions heap_formtuple, heap_modifytuple,
and heap_deformtuple in favor of the newer functions heap_form_tuple et al
(which do the same things but use bool control flags instead of arbitrary
char values).  Eliminate the former duplicate coding of these functions,
reducing the deprecated functions to mere wrappers around the newer ones.
We can't get rid of them entirely because add-on modules probably still
contain many instances of the old coding style.

Kris Jurka
2008-11-02 01:45:28 +00:00
Tom Lane 11c794f224 Use guc.c's parse_int() instead of pg_atoi() to parse fillfactor in
default_reloptions().  The previous coding was really a bug because pg_atoi()
will always throw elog on bad input data, whereas default_reloptions is not
supposed to complain about bad input unless its validate parameter is true.
Right now you could only expose the problem by hand-modifying
pg_class.reloptions into an invalid state, so it doesn't seem worth
back-patching; but we should get it right in HEAD because there might be other
situations in future.  Noted while studying GIN fast-update patch.
2008-07-23 17:29:53 +00:00