Commit Graph

2101 Commits

Author SHA1 Message Date
Andres Freund 570bd2b3fd Add capability to suppress CONTEXT: messages to elog machinery.
Hiding context messages usually is not a good idea - except for rather
verbose debugging/development utensils like LOG_DEBUG. There the
amount of repeated context messages just bloat the log without adding
information.
2014-12-25 17:24:30 +01:00
Fujii Masao 60838df922 Move pg_lzcompress.c to src/common.
Exposing compression and decompression APIs of pglz makes possible its
use by extensions and contrib modules. pglz_decompress contained a call
to elog to emit an error message in case of corrupted data. This function
is changed to return a status code to let its callers return an error instead.

This commit is required for upcoming WAL compression feature so that
the WAL reader facility can decompress the WAL data by using pglz_decompress.

Michael Paquier
2014-12-25 20:46:14 +09:00
Alvaro Herrera a609d96778 Revert "Use a bitmask to represent role attributes"
This reverts commit 1826987a46.

The overall design was deemed unacceptable, in discussion following the
previous commit message; we might find some parts of it still
salvageable, but I don't want to be on the hook for fixing it, so let's
wait until we have a new patch.
2014-12-23 15:35:49 -03:00
Alvaro Herrera d7ee82e50f Add SQL-callable pg_get_object_address
This allows access to get_object_address from SQL, which is useful to
obtain OID addressing information from data equivalent to that emitted
by the parser.  This is necessary infrastructure of a project to let
replication systems propagate object dropping events to remote servers,
where the schema might be different than the server originating the
DROP.

This patch also adds support for OBJECT_DEFAULT to get_object_address;
that is, it is now possible to refer to a column's default value.

Catalog version bumped due to the new function.

Reviewed by Stephen Frost, Heikki Linnakangas, Robert Haas, Andres
Freund, Abhijit Menon-Sen, Adam Brightwell.
2014-12-23 15:31:29 -03:00
Alvaro Herrera 1826987a46 Use a bitmask to represent role attributes
The previous representation using a boolean column for each attribute
would not scale as well as we want to add further attributes.

Extra auxilliary functions are added to go along with this change, to
make up for the lost convenience of access of the old representation.

Catalog version bumped due to change in catalogs and the new functions.

Author: Adam Brightwell, minor tweaks by Álvaro
Reviewed by: Stephen Frost, Andres Freund, Álvaro Herrera
2014-12-23 10:22:09 -03:00
Heikki Linnakangas 955557ddcc Move rbtree.c from src/backend/utils/misc to src/backend/lib.
We have other general-purpose data structures in src/backend/lib, so it
seems like a better home for the red-black tree as well.
2014-12-22 17:52:08 +02:00
Tom Lane 4a14f13a0a Improve hash_create's API for selecting simple-binary-key hash functions.
Previously, if you wanted anything besides C-string hash keys, you had to
specify a custom hashing function to hash_create().  Nearly all such
callers were specifying tag_hash or oid_hash; which is tedious, and rather
error-prone, since a caller could easily miss the opportunity to optimize
by using hash_uint32 when appropriate.  Replace this with a design whereby
callers using simple binary-data keys just specify HASH_BLOBS and don't
need to mess with specific support functions.  hash_create() itself will
take care of optimizing when the key size is four bytes.

This nets out saving a few hundred bytes of code space, and offers
a measurable performance improvement in tidbitmap.c (which was not
exploiting the opportunity to use hash_uint32 for its 4-byte keys).
There might be some wins elsewhere too, I didn't analyze closely.

In future we could look into offering a similar optimized hashing function
for 8-byte keys.  Under this design that could be done in a centralized
and machine-independent fashion, whereas getting it right for keys of
platform-dependent sizes would've been notationally painful before.

For the moment, the old way still works fine, so as not to break source
code compatibility for loadable modules.  Eventually we might want to
remove tag_hash and friends from the exported API altogether, since there's
no real need for them to be explicitly referenced from outside dynahash.c.

Teodor Sigaev and Tom Lane
2014-12-18 13:36:36 -05:00
Heikki Linnakangas 4520ba6769 Add point <-> polygon distance operator.
Alexander Korotkov, reviewed by Emre Hasegeli.
2014-12-15 17:06:21 +02:00
Andrew Dunstan 7e354ab9fe Add several generator functions for jsonb that exist for json.
The functions are:
    to_jsonb()
    jsonb_object()
    jsonb_build_object()
    jsonb_build_array()
    jsonb_agg()
    jsonb_object_agg()

Also along the way some better logic is implemented in
json_categorize_type() to match that in the newly implemented
jsonb_categorize_type().

Andrew Dunstan, reviewed by Pavel Stehule and Alvaro Herrera.
2014-12-12 15:31:14 -05:00
Andrew Dunstan 237a882443 Add json_strip_nulls and jsonb_strip_nulls functions.
The functions remove object fields, including in nested objects, that
have null as a value. In certain cases this can lead to considerably
smaller datums, with no loss of semantic information.

Andrew Dunstan, reviewed by Pavel Stehule.
2014-12-12 09:00:43 -05:00
Simon Riggs 618c9430a8 Event Trigger for table_rewrite
Generate a table_rewrite event when ALTER TABLE
attempts to rewrite a table. Provide helper
functions to identify table and reason.

Intended use case is to help assess or to react
to schema changes that might hold exclusive locks
for long periods.

Dimitri Fontaine, triggering an edit by Simon Riggs

Reviewed in detail by Michael Paquier
2014-12-08 00:55:28 +09:00
Peter Eisentraut e86507d770 Move PG_AUTOCONF_FILENAME definition
Since this is not something that a user should change,
pg_config_manual.h was an inappropriate place for it.

In initdb.c, remove the use of the macro, because utils/guc.h can't be
included by non-backend code.  But we hardcode all the other
configuration file names there, so this isn't a disaster.
2014-12-03 19:54:01 -05:00
Alvaro Herrera 73c986adde Keep track of transaction commit timestamps
Transactions can now set their commit timestamp directly as they commit,
or an external transaction commit timestamp can be fed from an outside
system using the new function TransactionTreeSetCommitTsData().  This
data is crash-safe, and truncated at Xid freeze point, same as pg_clog.

This module is disabled by default because it causes a performance hit,
but can be enabled in postgresql.conf requiring only a server restart.

A new test in src/test/modules is included.

Catalog version bumped due to the new subdirectory within PGDATA and a
couple of new SQL functions.

Authors: Álvaro Herrera and Petr Jelínek

Reviewed to varying degrees by Michael Paquier, Andres Freund, Robert
Haas, Amit Kapila, Fujii Masao, Jaime Casanova, Simon Riggs, Steven
Singer, Peter Eisentraut
2014-12-03 11:53:02 -03:00
Andrew Dunstan e09996ff8d Fix hstore_to_json_loose's detection of valid JSON number values.
We expose a function IsValidJsonNumber that internally calls the lexer
for json numbers. That allows us to use the same test everywhere,
instead of inventing a broken test for hstore conversions. The new
function is also used in datum_to_json, replacing the code that is now
moved to the new function.

Backpatch to 9.3 where hstore_to_json_loose was introduced.
2014-12-01 11:28:45 -05:00
Stephen Frost 143b39c185 Rename pg_rowsecurity -> pg_policy and other fixes
As pointed out by Robert, we should really have named pg_rowsecurity
pg_policy, as the objects stored in that catalog are policies.  This
patch fixes that and updates the column names to start with 'pol' to
match the new catalog name.

The security consideration for COPY with row level security, also
pointed out by Robert, has also been addressed by remembering and
re-checking the OID of the relation initially referenced during COPY
processing, to make sure it hasn't changed under us by the time we
finish planning out the query which has been built.

Robert and Alvaro also commented on missing OCLASS and OBJECT entries
for POLICY (formerly ROWSECURITY or POLICY, depending) in various
places.  This patch fixes that too, which also happens to add the
ability to COMMENT on policies.

In passing, attempt to improve the consistency of messages, comments,
and documentation as well.  This removes various incarnations of
'row-security', 'row-level security', 'Row-security', etc, in favor
of 'policy', 'row level security' or 'row_security' as appropriate.

Happy Thanksgiving!
2014-11-27 01:15:57 -05:00
Tom Lane bac27394a1 Support arrays as input to array_agg() and ARRAY(SELECT ...).
These cases formerly failed with errors about "could not find array type
for data type".  Now they yield arrays of the same element type and one
higher dimension.

The implementation involves creating functions with API similar to the
existing accumArrayResult() family.  I (tgl) also extended the base family
by adding an initArrayResult() function, which allows callers to avoid
special-casing the zero-inputs case if they just want an empty array as
result.  (Not all do, so the previous calling convention remains valid.)
This allowed simplifying some existing code in xml.c and plperl.c.

Ali Akbar, reviewed by Pavel Stehule, significantly modified by me
2014-11-25 12:21:28 -05:00
Robert Haas f5d9698a84 Add infrastructure to save and restore GUC values.
This is further infrastructure for parallelism.

Amit Khandekar, Noah Misch, Robert Haas
2014-11-24 16:37:56 -05:00
Heikki Linnakangas 2c03216d83 Revamp the WAL record format.
Each WAL record now carries information about the modified relation and
block(s) in a standardized format. That makes it easier to write tools that
need that information, like pg_rewind, prefetching the blocks to speed up
recovery, etc.

There's a whole new API for building WAL records, replacing the XLogRecData
chains used previously. The new API consists of XLogRegister* functions,
which are called for each buffer and chunk of data that is added to the
record. The new API also gives more control over when a full-page image is
written, by passing flags to the XLogRegisterBuffer function.

This also simplifies the XLogReadBufferForRedo() calls. The function can dig
the relation and block number from the WAL record, so they no longer need to
be passed as arguments.

For the convenience of redo routines, XLogReader now disects each WAL record
after reading it, copying the main data part and the per-block data into
MAXALIGNed buffers. The data chunks are not aligned within the WAL record,
but the redo routines can assume that the pointers returned by XLogRecGet*
functions are. Redo routines are now passed the XLogReaderState, which
contains the record in the already-disected format, instead of the plain
XLogRecord.

The new record format also makes the fixed size XLogRecord header smaller,
by removing the xl_len field. The length of the "main data" portion is now
stored at the end of the WAL record, and there's a separate header after
XLogRecord for it. The alignment padding at the end of XLogRecord is also
removed. This compansates for the fact that the new format would otherwise
be more bulky than the old format.

Reviewed by Andres Freund, Amit Kapila, Michael Paquier, Alvaro Herrera,
Fujii Masao.
2014-11-20 18:46:41 +02:00
Alvaro Herrera 0f9692b40d Fix relpersistence setting in reindex_index
Buildfarm members with CLOBBER_CACHE_ALWAYS advised us that commit
85b506bbfc was mistaken in setting the relpersistence value of the
index directly in the relcache entry, within reindex_index.  The reason
for the failure is that an invalidation message that comes after mucking
with the relcache entry directly, but before writing it to the catalogs,
would cause the entry to become rebuilt in place from catalogs with the
old contents, losing the update.

Fix by passing the correct persistence value to
RelationSetNewRelfilenode instead; this routine also writes the updated
tuple to pg_class, avoiding the problem.  Suggested by Tom Lane.
2014-11-17 11:23:35 -03:00
Stephen Frost 80eacaa3cd Clean up includes from RLS patch
The initial patch for RLS mistakenly included headers associated with
the executor and planner bits in rewrite/rowsecurity.h.  Per policy and
general good sense, executor headers should not be included in planner
headers or vice versa.

The include of execnodes.h was a mistaken holdover from previous
versions, while the include of relation.h was used for Relation's
definition, which should have been coming from utils/relcache.h.  This
patch cleans these issues up, adds comments to the RowSecurityPolicy
struct and the RowSecurityConfigType enum, and changes Relation->rsdesc
to Relation->rd_rsdesc to follow Relation field naming convention.

Additionally, utils/rel.h was including rewrite/rowsecurity.h, which
wasn't a great idea since that was pulling in things not really needed
in utils/rel.h (which gets included in quite a few places).  Instead,
use 'struct RowSecurityDesc' for the rd_rsdesc field and add comments
explaining why.

Lastly, add an include into access/nbtree/nbtsort.c for
utils/sortsupport.h, which was evidently missed due to the above mess.

Pointed out by Tom in 16970.1415838651@sss.pgh.pa.us; note that the
concerns regarding a similar situation in the custom-path commit still
need to be addressed.
2014-11-14 17:05:17 -05:00
Robert Haas c0828b78e9 Move the guts of our Levenshtein implementation into core.
The hope is that we can use this to produce better diagnostics in
some cases.

Peter Geoghegan, reviewed by Michael Paquier, with some further
changes by me.
2014-11-13 12:33:26 -05:00
Tom Lane 677708032c Explicitly support the case that a plancache's raw_parse_tree is NULL.
This only happens if a client issues a Parse message with an empty query
string, which is a bit odd; but since it is explicitly called out as legal
by our FE/BE protocol spec, we'd probably better continue to allow it.

Fix by adding tests everywhere that the raw_parse_tree field is passed to
functions that don't or shouldn't accept NULL.  Also make it clear in the
relevant comments that NULL is an expected case.

This reverts commits a73c9dbab0 and
2e9650cbcf, which fixed specific crash
symptoms by hacking things at what now seems to be the wrong end, ie the
callee functions.  Making the callees allow NULL is superficially more
robust, but it's not always true that there is a defensible thing for the
callee to do in such cases.  The caller has more context and is better
able to decide what the empty-query case ought to do.

Per followup discussion of bug #11335.  Back-patch to 9.2.  The code
before that is sufficiently different that it would require development
of a separate patch, which doesn't seem worthwhile for what is believed
to be an essentially cosmetic change.
2014-11-12 15:59:01 -05:00
Fujii Masao 1871c89202 Add generate_series(numeric, numeric).
Платон Малюгин
Reviewed by Michael Paquier, Ali Akbar and Marti Raudsepp
2014-11-11 21:44:46 +09:00
Fujii Masao a1b395b6a2 Add GUC and storage parameter to set the maximum size of GIN pending list.
Previously the maximum size of GIN pending list was controlled only by
work_mem. But the reasonable value of work_mem and the reasonable size
of the list are basically not the same, so it was not appropriate to
control both of them by only one GUC, i.e., work_mem. This commit
separates new GUC, pending_list_cleanup_size, from work_mem to allow
users to control only the size of the list.

Also this commit adds pending_list_cleanup_size as new storage parameter
to allow users to specify the size of the list per index. This is useful,
for example, when users want to increase the size of the list only for
the GIN index which can be updated heavily, and decrease it otherwise.

Reviewed by Etsuro Fujita.
2014-11-11 21:08:21 +09:00
Bruce Momjian 67067f9ae3 C comment: mention 1500-02-29 as an invalid date
It is invalid because the Gregorian calendar is used for all years.
2014-11-09 20:50:15 -05:00
Robert Haas 5ea86e6e65 Use the sortsupport infrastructure in more cases.
This removes some fmgr overhead from cases such as btree index builds.

Peter Geoghegan, reviewed by Andreas Karlsson and me.
2014-11-07 15:50:55 -05:00
Alvaro Herrera 7516f52594 BRIN: Block Range Indexes
BRIN is a new index access method intended to accelerate scans of very
large tables, without the maintenance overhead of btrees or other
traditional indexes.  They work by maintaining "summary" data about
block ranges.  Bitmap index scans work by reading each summary tuple and
comparing them with the query quals; all pages in the range are returned
in a lossy TID bitmap if the quals are consistent with the values in the
summary tuple, otherwise not.  Normal index scans are not supported
because these indexes do not store TIDs.

As new tuples are added into the index, the summary information is
updated (if the block range in which the tuple is added is already
summarized) or not; in the latter case, a subsequent pass of VACUUM or
the brin_summarize_new_values() function will create the summary
information.

For data types with natural 1-D sort orders, the summary info consists
of the maximum and the minimum values of each indexed column within each
page range.  This type of operator class we call "Minmax", and we
supply a bunch of them for most data types with B-tree opclasses.
Since the BRIN code is generalized, other approaches are possible for
things such as arrays, geometric types, ranges, etc; even for things
such as enum types we could do something different than minmax with
better results.  In this commit I only include minmax.

Catalog version bumped due to new builtin catalog entries.

There's more that could be done here, but this is a good step forwards.

Loosely based on ideas from Simon Riggs; code mostly by Álvaro Herrera,
with contribution by Heikki Linnakangas.

Patch reviewed by: Amit Kapila, Heikki Linnakangas, Robert Haas.
Testing help from Jeff Janes, Erik Rijkers, Emanuel Calvo.

PS:
  The research leading to these results has received funding from the
  European Union's Seventh Framework Programme (FP7/2007-2013) under
  grant agreement n° 318633.
2014-11-07 16:38:14 -03:00
Heikki Linnakangas 2076db2aea Move the backup-block logic from XLogInsert to a new file, xloginsert.c.
xlog.c is huge, this makes it a little bit smaller, which is nice. Functions
related to putting together the WAL record are in xloginsert.c, and the
lower level stuff for managing WAL buffers and such are in xlog.c.

Also move the definition of XLogRecord to a separate header file. This
causes churn in the #includes of all the files that write WAL records, and
redo routines, but it avoids pulling in xlog.h into most places.

Reviewed by Michael Paquier, Alvaro Herrera, Andres Freund and Amit Kapila.
2014-11-06 13:55:36 +02:00
Bruce Momjian 171c377a0a C comment: mention why the Gregorian calendar is used pre-1582 2014-11-06 02:33:05 -05:00
Heikki Linnakangas 5028f22f6e Switch to CRC-32C in WAL and other places.
The old algorithm was found to not be the usual CRC-32 algorithm, used by
Ethernet et al. We were using a non-reflected lookup table with code meant
for a reflected lookup table. That's a strange combination that AFAICS does
not correspond to any bit-wise CRC calculation, which makes it difficult to
reason about its properties. Although it has worked well in practice, seems
safer to use a well-known algorithm.

Since we're changing the algorithm anyway, we might as well choose a
different polynomial. The Castagnoli polynomial has better error-correcting
properties than the traditional CRC-32 polynomial, even if we had
implemented it correctly. Another reason for picking that is that some new
CPUs have hardware support for calculating CRC-32C, but not CRC-32, let
alone our strange variant of it. This patch doesn't add any support for such
hardware, but a future patch could now do that.

The old algorithm is kept around for tsquery and pg_trgm, which use the
values in indexes that need to remain compatible so that pg_upgrade works.
While we're at it, share the old lookup table for CRC-32 calculation
between hstore, ltree and core. They all use the same table, so might as
well.
2014-11-04 11:39:48 +02:00
Heikki Linnakangas 404bc51cde Remove support for 64-bit CRC.
It hasn't been used for anything for a long time.
2014-11-04 11:33:08 +02:00
Robert Haas 2bd9e412f9 Support frontend-backend protocol communication using a shm_mq.
A background worker can use pq_redirect_to_shm_mq() to direct protocol
that would normally be sent to the frontend to a shm_mq so that another
process may read them.

The receiving process may use pq_parse_errornotice() to parse an
ErrorResponse or NoticeResponse from the background worker and, if
it wishes, ThrowErrorData() to propagate the error (with or without
further modification).

Patch by me.  Review by Andres Freund.
2014-10-31 12:02:40 -04:00
Robert Haas 6cb4afff33 Avoid setup work for invalidation messages at start-of-(sub)xact.
Instead of initializing a new TransInvalidationInfo for every
transaction or subtransaction, we can just do it for those
transactions or subtransactions that actually need to queue
invalidation messages.  That also avoids needing to free those
entries at the end of a transaction or subtransaction that does
not generate any invalidation messages, which is by far the
common case.

Patch by me.  Review by Simon Riggs and Andres Freund.
2014-10-29 12:35:19 -04:00
Tom Lane b2cbced9ee Support timezone abbreviations that sometimes change.
Up to now, PG has assumed that any given timezone abbreviation (such as
"EDT") represents a constant GMT offset in the usage of any particular
region; we had a way to configure what that offset was, but not for it
to be changeable over time.  But, as with most things horological, this
view of the world is too simplistic: there are numerous regions that have
at one time or another switched to a different GMT offset but kept using
the same timezone abbreviation.  Almost the entire Russian Federation did
that a few years ago, and later this month they're going to do it again.
And there are similar examples all over the world.

To cope with this, invent the notion of a "dynamic timezone abbreviation",
which is one that is referenced to a particular underlying timezone
(as defined in the IANA timezone database) and means whatever it currently
means in that zone.  For zones that use or have used daylight-savings time,
the standard and DST abbreviations continue to have the property that you
can specify standard or DST time and get that time offset whether or not
DST was theoretically in effect at the time.  However, the abbreviations
mean what they meant at the time in question (or most recently before that
time) rather than being absolutely fixed.

The standard abbreviation-list files have been changed to use this behavior
for abbreviations that have actually varied in meaning since 1970.  The
old simple-numeric definitions are kept for abbreviations that have not
changed, since they are a bit faster to resolve.

While this is clearly a new feature, it seems necessary to back-patch it
into all active branches, because otherwise use of Russian zone
abbreviations is going to become even more problematic than it already was.
This change supersedes the changes in commit 513d06ded et al to modify the
fixed meanings of the Russian abbreviations; since we've not shipped that
yet, this will avoid an undesirably incompatible (not to mention incorrect)
change in behavior for timestamps between 2011 and 2014.

This patch makes some cosmetic changes in ecpglib to keep its usage of
datetime lookup tables as similar as possible to the backend code, but
doesn't do anything about the increasingly obsolete set of timezone
abbreviation definitions that are hard-wired into ecpglib.  Whatever we
do about that will likely not be appropriate material for back-patching.
Also, a potential free() of a garbage pointer after an out-of-memory
failure in ecpglib has been fixed.

This patch also fixes pre-existing bugs in DetermineTimeZoneOffset() that
caused it to produce unexpected results near a timezone transition, if
both the "before" and "after" states are marked as standard time.  We'd
only ever thought about or tested transitions between standard and DST
time, but that's not what's happening when a zone simply redefines their
base GMT offset.

In passing, update the SGML documentation to refer to the Olson/zoneinfo/
zic timezone database as the "IANA" database, since it's now being
maintained under the auspices of IANA.
2014-10-16 15:22:10 -04:00
Alvaro Herrera 7b1c2a0f20 Split builtins.h to a new header ruleutils.h
The new header contains many prototypes for functions in ruleutils.c
that are not exposed to the SQL level.

Reviewed by Andres Freund and Michael Paquier.
2014-10-08 18:10:47 -03:00
Alvaro Herrera df630b0dd5 Implement SKIP LOCKED for row-level locks
This clause changes the behavior of SELECT locking clauses in the
presence of locked rows: instead of causing a process to block waiting
for the locks held by other processes (or raise an error, with NOWAIT),
SKIP LOCKED makes the new reader skip over such rows.  While this is not
appropriate behavior for general purposes, there are some cases in which
it is useful, such as queue-like tables.

Catalog version bumped because this patch changes the representation of
stored rules.

Reviewed by Craig Ringer (based on a previous attempt at an
implementation by Simon Riggs, who also provided input on the syntax
used in the current patch), David Rowley, and Álvaro Herrera.

Author: Thomas Munro
2014-10-07 17:23:34 -03:00
Stephen Frost c8a026e4f1 Revert 95d737ff to add 'ignore_nulls'
Per discussion, revert the commit which added 'ignore_nulls' to
row_to_json.  This capability would be better added as an independent
function rather than being bolted on to row_to_json.  Additionally,
the implementation didn't address complex JSON objects, and so was
incomplete anyway.

Pointed out by Tom and discussed with Andrew and Robert.
2014-09-29 13:32:22 -04:00
Tom Lane def4c28cf9 Change JSONB's on-disk format for improved performance.
The original design used an array of offsets into the variable-length
portion of a JSONB container.  However, such an array is basically
uncompressible by simple compression techniques such as TOAST's LZ
compressor.  That's bad enough, but because the offset array is at the
front, it tended to trigger the give-up-after-1KB heuristic in the TOAST
code, so that the entire JSONB object was stored uncompressed; which was
the root cause of bug #11109 from Larry White.

To fix without losing the ability to extract a random array element in O(1)
time, change this scheme so that most of the JEntry array elements hold
lengths rather than offsets.  With data that's compressible at all, there
tend to be fewer distinct element lengths, so that there is scope for
compression of the JEntry array.  Every N'th entry is still an offset.
To determine the length or offset of any specific element, we might have
to examine up to N preceding JEntrys, but that's still O(1) so far as the
total container size is concerned.  Testing shows that this cost is
negligible compared to other costs of accessing a JSONB field, and that
the method does largely fix the incompressible-data problem.

While at it, rearrange the order of elements in a JSONB object so that
it's "all the keys, then all the values" not alternating keys and values.
This doesn't really make much difference right at the moment, but it will
allow providing a fast path for extracting individual object fields from
large JSONB values stored EXTERNAL (ie, uncompressed), analogously to the
existing optimization for substring extraction from large EXTERNAL text
values.

Bump catversion to denote the incompatibility in on-disk format.
We will need to fix pg_upgrade to disallow upgrading jsonb data stored
with 9.4 betas 1 and 2.

Heikki Linnakangas and Tom Lane
2014-09-29 12:29:21 -04:00
Stephen Frost 491c029dbc Row-Level Security Policies (RLS)
Building on the updatable security-barrier views work, add the
ability to define policies on tables to limit the set of rows
which are returned from a query and which are allowed to be added
to a table.  Expressions defined by the policy for filtering are
added to the security barrier quals of the query, while expressions
defined to check records being added to a table are added to the
with-check options of the query.

New top-level commands are CREATE/ALTER/DROP POLICY and are
controlled by the table owner.  Row Security is able to be enabled
and disabled by the owner on a per-table basis using
ALTER TABLE .. ENABLE/DISABLE ROW SECURITY.

Per discussion, ROW SECURITY is disabled on tables by default and
must be enabled for policies on the table to be used.  If no
policies exist on a table with ROW SECURITY enabled, a default-deny
policy is used and no records will be visible.

By default, row security is applied at all times except for the
table owner and the superuser.  A new GUC, row_security, is added
which can be set to ON, OFF, or FORCE.  When set to FORCE, row
security will be applied even for the table owner and superusers.
When set to OFF, row security will be disabled when allowed and an
error will be thrown if the user does not have rights to bypass row
security.

Per discussion, pg_dump sets row_security = OFF by default to ensure
that exports and backups will have all data in the table or will
error if there are insufficient privileges to bypass row security.
A new option has been added to pg_dump, --enable-row-security, to
ask pg_dump to export with row security enabled.

A new role capability, BYPASSRLS, which can only be set by the
superuser, is added to allow other users to be able to bypass row
security using row_security = OFF.

Many thanks to the various individuals who have helped with the
design, particularly Robert Haas for his feedback.

Authors include Craig Ringer, KaiGai Kohei, Adam Brightwell, Dean
Rasheed, with additional changes and rework by me.

Reviewers have included all of the above, Greg Smith,
Jeff McCormick, and Robert Haas.
2014-09-19 11:18:35 -04:00
Andres Freund 728f152e07 Add rmgr callback to name xlog record types for display purposes.
This is primarily useful for the upcoming pg_xlogdump --stats feature,
but also allows to remove some duplicated code in the rmgr_desc
routines.

Due to the separation and harmonization, the output of dipsplayed
records changes somewhat. But since this isn't enduser oriented
content that's ok.

It's potentially desirable to further change pg_xlogdump's display of
records. It previously wasn't possible to show the record type
separately from the description forcing it to be in the last
column. But that's better done in a separate commit.

Author: Abhijit Menon-Sen, slightly editorialized by me
Reviewed-By: Álvaro Herrera, Andres Freund, and Heikki Linnakangas
Discussion: 20140604104716.GA3989@toroid.org
2014-09-19 16:20:29 +02:00
Tom Lane fe550b2ac2 Invent PGC_SU_BACKEND and mark log_connections/log_disconnections that way.
This new GUC context option allows GUC parameters to have the combined
properties of PGC_BACKEND and PGC_SUSET, ie, they don't change after
session start and non-superusers can't change them.  This is a more
appropriate choice for log_connections and log_disconnections than their
previous context of PGC_BACKEND, because we don't want non-superusers
to be able to affect whether their sessions get logged.

Note: the behavior for log_connections is still a bit odd, in that when
a superuser attempts to set it from PGOPTIONS, the setting takes effect
but it's too late to enable or suppress connection startup logging.
It's debatable whether that's worth fixing, and in any case there is
a reasonable argument for PGC_SU_BACKEND to exist.

In passing, re-pgindent the files touched by this commit.

Fujii Masao, reviewed by Joe Conway and Amit Kapila
2014-09-13 21:01:57 -04:00
Stephen Frost 95d737ff45 Add 'ignore_nulls' option to row_to_json
Provide an option to skip NULL values in a row when generating a JSON
object from that row with row_to_json.  This can reduce the size of the
JSON object in cases where columns are NULL without really reducing the
information in the JSON object.

This also makes row_to_json into a single function with default values,
rather than having multiple functions.  In passing, change array_to_json
to also be a single function with default values (we don't add an
'ignore_nulls' option yet- it's not clear that there is a sensible
use-case there, and it hasn't been asked for in any case).

Pavel Stehule
2014-09-11 21:23:51 -04:00
Bruce Momjian 36ad1a87a3 Implement mxid_age() to compute multi-xid age
Report by Josh Berkus
2014-09-10 17:13:04 -04:00
Tom Lane e80252d424 Add width_bucket(anyelement, anyarray).
This provides a convenient method of classifying input values into buckets
that are not necessarily equal-width.  It works on any sortable data type.

The choice of function name is a bit debatable, perhaps, but showing that
there's a relationship to the SQL standard's width_bucket() function seems
more attractive than the other proposals.

Petr Jelinek, reviewed by Pavel Stehule
2014-09-09 15:34:14 -04:00
Robert Haas d8d4965dc2 Update comment to reflect commit 1d41739e5a.
Peter Geoghegan
2014-09-04 12:17:10 -04:00
Tom Lane 6c40f8316e Add min and max aggregates for inet/cidr data types.
Haribabu Kommi, reviewed by Muhammad Asif Naeem
2014-08-28 22:37:58 -04:00
Heikki Linnakangas c07693f0c7 Remove remnants of a JENTRY_ISFIRST flag bit.
I removed the flag earlier, but missed a few references in jsonb.h.
2014-08-15 09:41:28 +03:00
Robert Haas b34e37bfef Add sortsupport routines for text.
This provides a small but worthwhile speedup when sorting text, at least
in cases to which the sortsupport machinery applies.

Robert Haas and Peter Geoghegan
2014-08-14 12:09:52 -04:00
Robert Haas 1d41739e5a Don't require sort support functions to provide a comparator.
This could be useful for datatypes like text, where we might want
to optimize for some collations but not others.  However, this patch
doesn't introduce any new sortsupport functions that work this way;
it merely revises the code so that future patches may do so.

Patch by me.  Review by Peter Geoghegan.
2014-08-06 16:06:06 -04:00
Alvaro Herrera 346d7be184 Move view reloptions into their own varlena struct
Per discussion after a gripe from me in
http://www.postgresql.org/message-id/20140611194633.GH18688@eldon.alvh.no-ip.org

Jaime Casanova
2014-07-14 17:24:40 -04:00
Robert Haas 9f03ca9151 Avoid copying index tuples when building an index.
The previous code, perhaps out of concern for avoid memory leaks, formed
the tuple in one memory context and then copied it to another memory
context.  However, this doesn't appear to be necessary, since
index_form_tuple and the functions it calls take precautions against
leaking memory.  In my testing, building the tuple directly inside the
sort context shaves several percent off the index build time.
Rearrange things so we do that.

Patch by me.  Review by Amit Kapila, Tom Lane, Andres Freund.
2014-07-01 10:34:42 -04:00
Heikki Linnakangas 1c6821be31 Fix and enhance the assertion of no palloc's in a critical section.
The assertion failed if WAL_DEBUG or LWLOCK_STATS was enabled; fix that by
using separate memory contexts for the allocations made within those code
blocks.

This patch introduces a mechanism for marking any memory context as allowed
in a critical section. Previously ErrorContext was exempt as a special case.

Instead of a blanket exception of the checkpointer process, only exempt the
memory context used for the pending ops hash table.
2014-06-30 10:26:00 +03:00
Andres Freund 51adcaa0df Add cluster_name GUC which is included in process titles if set.
When running several postgres clusters on one OS instance it's often
inconveniently hard to identify which "postgres" process belongs to
which postgres instance.

Add the cluster_name GUC, whose value will be included as part of the
process titles if set. With that processes can more easily identified
using tools like 'ps'.

To avoid problems with encoding mismatches between postgresql.conf,
consoles, and individual databases replace non-ASCII chars in the name
with question marks. The length is limited to NAMEDATALEN to make it
less likely to truncate important information at the end of the
status.

Thomas Munro, with some adjustments by me and review by a host of people.
2014-06-29 14:15:09 +02:00
Fujii Masao 9ba78fb0b9 Don't allow data_directory to be set in postgresql.auto.conf by ALTER SYSTEM.
data_directory could be set both in postgresql.conf and postgresql.auto.conf so far.
This could cause some problematic situations like circular definition. To avoid such
situations, this commit forbids a user to set data_directory in postgresql.auto.conf.

Backpatch this to 9.4 where ALTER SYSTEM command was introduced.

Amit Kapila, reviewed by Abhijit Menon-Sen, with minor adjustments by me.
2014-06-19 20:31:20 +09:00
Heikki Linnakangas 0ef0b6784c Change the signature of rm_desc so that it's passed a XLogRecord.
Just feels more natural, and is more consistent with rm_redo.
2014-06-14 10:46:48 +03:00
Tom Lane 4c8ab1b91d Add btree and hash opclasses for pg_lsn.
This is needed to allow ORDER BY, DISTINCT, etc to work as expected for
pg_lsn values.

We had previously decided to put this off for 9.5, but in view of commit
eeca4cd35e there's no reason to avoid a
catversion bump for 9.4beta2, and this does make a pretty significant
usability difference for pg_lsn.

Michael Paquier, with fixes from Andres Freund and Tom Lane
2014-06-04 20:45:56 -04:00
Fujii Masao 06db9cce22 Fix typo in comment.
Erik Rijkers
2014-05-22 16:31:55 +09:00
Tom Lane b23b0f5588 Code review for recent changes in relcache.c.
rd_replidindex should be managed the same as rd_oidindex, and rd_keyattr
and rd_idattr should be managed like rd_indexattr.  Omissions in this area
meant that the bitmapsets computed for rd_keyattr and rd_idattr would be
leaked during any relcache flush, resulting in a slow but permanent leak in
CacheMemoryContext.  There was also a tiny probability of relcache entry
corruption if we ran out of memory at just the wrong point in
RelationGetIndexAttrBitmap.  Otherwise, the fields were not zeroed where
expected, which would not bother the code any AFAICS but could greatly
confuse anyone examining the relcache entry while debugging.

Also, create an API function RelationGetReplicaIndex rather than letting
non-relcache code be intimate with the mechanisms underlying caching of
that value (we won't even mention the memory leak there).

Also, fix a relcache flush hazard identified by Andres Freund:
RelationGetIndexAttrBitmap must not assume that rd_replidindex stays valid
across index_open.

The aspects of this involving rd_keyattr date back to 9.3, so back-patch
those changes.
2014-05-14 14:56:08 -04:00
Tom Lane 12e611d43e Rename jsonb_hash_ops to jsonb_path_ops.
There's no longer much pressure to switch the default GIN opclass for
jsonb, but there was still some unhappiness with the name "jsonb_hash_ops",
since hashing is no longer a distinguishing property of that opclass,
and anyway it seems like a relatively minor detail.  At the suggestion of
Heikki Linnakangas, we'll use "jsonb_path_ops" instead; that captures the
important characteristic that each index entry depends on the entire path
from the document root to the indexed value.

Also add a user-facing explanation of the implementation properties of
these two opclasses.
2014-05-11 12:06:04 -04:00
Heikki Linnakangas d9daff0e0c More jsonb cleanup.
Fix JSONB_MAX_ELEMS and JSONB_MAX_PAIRS macros to use CB_MASK in the
calculation. JENTRY_POSMASK happens to have the same value at the moment,
but that's just coincidental.

Refactor jsonb iterator functions, for readability.

Get rid of the JENTRY_ISFIRST flag. Whenever we handle JEntrys, we have
access to the whole array and have enough context information to know
which entry is the first. This frees up one bit in the JEntry header for
future use. While we're at it, shuffle the JEntry bits so that boolean
true and false go together, for aesthetic reasons.

Bump catalog version as this changes the on-disk format slightly.
2014-05-09 15:55:56 +03:00
Tom Lane 46dddf7673 Improve key representation for GIN jsonb_ops, and fix existence-search bug.
Change the key representation so that values that would exceed 127 bytes
are hashed into short strings, and so that the original JSON datatype of
each value is recorded in the index.  The hashing rule eliminates the major
objection to having this opclass be the default for jsonb, namely that it
could fail for plausible input data (due to GIN's restrictions on maximum
key length).  Preserving datatype information doesn't really buy us much
right now, but it requires no extra space compared to the previous way,
and it might be useful later.

Also, change the consistency-checking functions to request recheck for
exists (jsonb ? text) and related operators.  The original analysis that
this is an exactly checkable query was incorrect, since the index does
not preserve information about whether a key appears at top level in
the indexed JSON object.  Add a test case demonstrating the problem.

Make some other, mostly cosmetic improvements to the code in jsonb_gin.c
as well.

catversion bump due to on-disk data format change in jsonb_ops indexes.
2014-05-09 08:41:26 -04:00
Heikki Linnakangas d3c72e23df Avoid some pnstrdup()s when constructing jsonb
This speeds up text to jsonb parsing and hstore to jsonb conversions
somewhat.
2014-05-09 12:46:21 +03:00
Tom Lane a16d421ca4 Revert "Auto-tune effective_cache size to be 4x shared buffers"
This reverts commit ee1e5662d8, as well as
a remarkably large number of followup commits, which were mostly concerned
with the fact that the implementation didn't work terribly well.  It still
doesn't: we probably need some rather basic work in the GUC infrastructure
if we want to fully support GUCs whose default varies depending on the
value of another GUC.  Meanwhile, it also emerged that there wasn't really
consensus in favor of the definition the patch tried to implement (ie,
effective_cache_size should default to 4 times shared_buffers).  So whack
it all back to where it was.  In a followup commit, I'll do what was
recently agreed to, which is to simply change the default to a higher
value.
2014-05-08 20:49:38 -04:00
Heikki Linnakangas 364ddc3e5c Clean up jsonb code.
The main target of this cleanup is the convertJsonb() function, but I also
touched a lot of other things that I spotted into in the process.

The new convertToJsonb() function uses an output buffer that's resized on
demand, so the code to estimate of the size of JsonbValue is removed.

The on-disk format was not changed, even though I refactored the structs
used to handle it. The term "superheader" is replaced with "container".

The jsonb_exists_any and jsonb_exists_all functions no longer sort the input
array. That was a premature optimization, the idea being that if there are
duplicates in the input array, you only need to check them once. Also,
sorting the array saves some effort in the binary search used to find a key
within an object. But there were drawbacks too: the sorting and
deduplicating obviously isn't free, and in the typical case there are no
duplicates to remove, and the gain in the binary search was minimal. Remove
all that, which makes the code simpler too.

This includes a bug-fix; the total length of the elements in a jsonb array
or object mustn't exceed 2^28. That is now checked.
2014-05-07 23:16:19 +03:00
Bruce Momjian 0a78320057 pgindent run for 9.4
This includes removing tabs after periods in C comments, which was
applied to back branches, so this change should not effect backpatching.
2014-05-06 12:12:18 -04:00
Tom Lane a9baeb361d Can't completely get rid of #ifndef FRONTEND in palloc.h :-(
pg_controldata includes postgres.h not postgres_fe.h, so utils/palloc.h
must be able to compile in a "#define FRONTEND" context.  It appears that
Solaris Studio is smart enough to persuade us to define PG_USE_INLINE,
but not smart enough to not make a copy of unreferenced static functions;
which leads to an unsatisfied reference to CurrentMemoryContext.  So we
need an #ifndef FRONTEND around that declaration.  Per buildfarm.
2014-04-27 21:24:19 -04:00
Tom Lane 528c454b2a Don't #include utils/palloc.h in common/fe_memutils.h.
This breaks the principle that common/ ought not depend on anything in the
server, not only code-wise but in the headers.  The only arguable advantage
is avoidance of duplication of half a dozen extern declarations, and even
that is rather dubious, considering that the previous coding was wrong
about which declarations to duplicate: it exposed pnstrdup() to frontend
code even though no such function is provided in fe_memutils.c.

On the same principle, don't #include utils/memutils.h in the frontend
build of psprintf.c.  This requires duplicating the definition of
MaxAllocSize, but that seems fine to me: there's no a-priori reason why
frontend code should use the same size limit as the backend anyway.

In passing, clean up some rather odd layout and ordering choices that
were imposed on palloc.h to reduce the number of #ifdefs required by
the previous approach.

Per gripe from Christoph Berg.  There's still more work to do to make
include/common/ clean, but this part seems reasonably noncontroversial.
2014-04-26 14:14:28 -04:00
Robert Haas dfc0219f64 Add to_regprocedure() and to_regoperator().
These are natural complements to the functions added by commit
0886fc6a5c, but they weren't included
in the original patch for some reason.  Add them.

Patch by me, per a complaint by Tom Lane.  Review by Tatsuo
Ishii.
2014-04-16 12:21:43 -04:00
Tom Lane e0c91a7ff0 Improve some O(N^2) behavior in window function evaluation.
Repositioning the tuplestore seek pointer in window_gettupleslot() turns
out to be a very significant expense when the window frame is sizable and
the frame end can move.  To fix, introduce a tuplestore function for
skipping an arbitrary number of tuples in one call, parallel to the one we
introduced for tuplesort objects in commit 8d65da1f.  This reduces the cost
of window_gettupleslot() to O(1) if the tuplestore has not spilled to disk.
As in the previous commit, I didn't try to do any real optimization of
tuplestore_skiptuples for the case where the tuplestore has spilled to
disk.  There is probably no practical way to get the cost to less than O(N)
anyway, but perhaps someone can think of something later.

Also fix PersistHoldablePortal() to make use of this API now that we have
it.

Based on a suggestion by Dean Rasheed, though this turns out not to look
much like his patch.
2014-04-13 13:59:17 -04:00
Tom Lane d95425c8b9 Provide moving-aggregate support for boolean aggregates.
David Rowley and Florian Pflug, reviewed by Dean Rasheed
2014-04-13 00:01:46 -04:00
Tom Lane 9d229f399e Provide moving-aggregate support for a bunch of numerical aggregates.
First installment of the promised moving-aggregate support in built-in
aggregates: count(), sum(), avg(), stddev() and variance() for
assorted datatypes, though not for float4/float8.

In passing, remove a 2001-vintage kluge in interval_accum(): interval
array elements have been properly aligned since around 2003, but
nobody remembered to take out this workaround.  Also, fix a thinko
in the opr_sanity tests for moving-aggregate catalog entries.

David Rowley and Florian Pflug, reviewed by Dean Rasheed
2014-04-12 20:33:09 -04:00
Tom Lane f23a5630eb Add an in-core GiST index opclass for inet/cidr types.
This operator class can accelerate subnet/supernet tests as well as
btree-equivalent ordered comparisons.  It also handles a new network
operator inet && inet (overlaps, a/k/a "is supernet or subnet of"),
which is expected to be useful in exclusion constraints.

Ideally this opclass would be the default for GiST with inet/cidr data,
but we can't mark it that way until we figure out how to do a more or
less graceful transition from the current situation, in which the
really-completely-bogus inet/cidr opclasses in contrib/btree_gist are
marked as default.  Having the opclass in core and not default is better
than not having it at all, though.

While at it, add new documentation sections to allow us to officially
document GiST/GIN/SP-GiST opclasses, something there was never a clear
place to do before.  I filled these in with some simple tables listing
the existing opclasses and the operators they support, but there's
certainly scope to put more information there.

Emre Hasegeli, reviewed by Andreas Karlsson, further hacking by me
2014-04-08 15:46:43 -04:00
Robert Haas 0886fc6a5c Add new to_reg* functions for error-free OID lookups.
These functions won't throw an error if the object doesn't exist,
or if (for functions and operators) there's more than one matching
object.

Yugo Nagata and Nozomi Anzai, reviewed by Amit Khandekar, Marti
Raudsepp, Amit Kapila, and me.
2014-04-08 10:27:56 -04:00
Robert Haas 59202fae04 Fix some compiler warnings that clang emits with -pedantic.
Andres Freund
2014-04-04 11:29:50 -04:00
Tom Lane f33a71a786 De-anonymize the union in JsonbValue.
Needed for strict C89 compliance.
2014-04-02 14:30:08 -04:00
Andrew Dunstan f9c6d72cbf Cleanup around json_to_record/json_to_recordset
Set function parameter names and defaults. Add jsonb versions (which the
code already provided for so the actual new code is trivial). Add jsonb
regression tests and docs.

Bump catalog version (which I apparently forgot to do when jsonb was
committed).
2014-03-26 10:18:24 -04:00
Andrew Dunstan d9134d0a35 Introduce jsonb, a structured format for storing json.
The new format accepts exactly the same data as the json type. However, it is
stored in a format that does not require reparsing the orgiginal text in order
to process it, making it much more suitable for indexing and other operations.
Insignificant whitespace is discarded, and the order of object keys is not
preserved. Neither are duplicate object keys kept - the later value for a given
key is the only one stored.

The new type has all the functions and operators that the json type has,
with the exception of the json generation functions (to_json, json_agg etc.)
and with identical semantics. In addition, there are operator classes for
hash and btree indexing, and two classes for GIN indexing, that have no
equivalent in the json type.

This feature grew out of previous work by Oleg Bartunov and Teodor Sigaev, which
was intended to provide similar facilities to a nested hstore type, but which
in the end proved to have some significant compatibility issues.

Authors: Oleg Bartunov,  Teodor Sigaev, Peter Geoghegan and Andrew Dunstan.
Review: Andres Freund
2014-03-23 16:40:19 -04:00
Fujii Masao 588fb50715 Show PIDs of lock holders and waiters in log_lock_waits log message.
Christian Kruse, reviewed by Kumar Rajeev Rastogi.
2014-03-13 03:26:47 +09:00
Heikki Linnakangas c5608ea26a Allow opclasses to provide tri-valued GIN consistent functions.
With the GIN "fast scan" feature, GIN can skip items without fetching all
the keys for them, if it can prove that they don't match regardless of
those keys. So far, it has done the proving by calling the boolean
consistent function with all combinations of TRUE/FALSE for the unfetched
keys, but since that's O(n^2), it becomes unfeasible with more than a few
keys. We can avoid calling consistent with all the combinations, if we can
tell the operator class implementation directly which keys are unknown.

This commit includes a triConsistent function for the built-in array and
tsvector opclasses.

Alexander Korotkov, with some changes by me.
2014-03-12 17:51:30 +02:00
Alvaro Herrera 84df54b22e Constructors for interval, timestamp, timestamptz
Author: Pavel Stěhule, editorialized somewhat by Álvaro Herrera
Reviewed-by: Tomáš Vondra, Marko Tiikkaja
With input from Fabrízio de Royes Mello, Jim Nasby
2014-03-04 15:09:43 -03:00
Robert Haas 7e8db2dc42 Minor corrections to logical decoding patch. 2014-03-04 11:07:54 -05:00
Robert Haas b89e151054 Introduce logical decoding.
This feature, building on previous commits, allows the write-ahead log
stream to be decoded into a series of logical changes; that is,
inserts, updates, and deletes and the transactions which contain them.
It is capable of handling decoding even across changes to the schema
of the effected tables.  The output format is controlled by a
so-called "output plugin"; an example is included.  To make use of
this in a real replication system, the output plugin will need to be
modified to produce output in the format appropriate to that system,
and to perform filtering.

Currently, information can be extracted from the logical decoding
system only via SQL; future commits will add the ability to stream
changes via walsender.

Andres Freund, with review and other contributions from many other
people, including Álvaro Herrera, Abhijit Menon-Sen, Peter Gheogegan,
Kevin Grittner, Robert Haas, Heikki Linnakangas, Fujii Masao, Abhijit
Menon-Sen, Michael Paquier, Simon Riggs, Craig Ringer, and Steve
Singer.
2014-03-03 16:32:18 -05:00
Robert Haas 694e3d139a Further code review for pg_lsn data type.
Change input function error messages to be more consistent with what is
done elsewhere.  Remove a bunch of redundant type casts, so that the
compiler will warn us if we screw up.  Don't pass LSNs by value on
platforms where a Datum is only 32 bytes, per buildfarm.  Move macros
for packing and unpacking LSNs to pg_lsn.h so that we can include
access/xlogdefs.h, to avoid an unsatisfied dependency on XLogRecPtr.
2014-02-19 10:06:59 -05:00
Robert Haas 7d03a83f4d Add a pg_lsn data type, to represent an LSN.
Robert Haas and Michael Paquier
2014-02-19 08:35:23 -05:00
Noah Misch 31400a6733 Predict integer overflow to avoid buffer overruns.
Several functions, mostly type input functions, calculated an allocation
size such that the calculation wrapped to a small positive value when
arguments implied a sufficiently-large requirement.  Writes past the end
of the inadvertent small allocation followed shortly thereafter.
Coverity identified the path_in() vulnerability; code inspection led to
the rest.  In passing, add check_stack_depth() to prevent stack overflow
in related functions.

Back-patch to 8.4 (all supported versions).  The non-comment hstore
changes touch code that did not exist in 8.4, so that part stops at 9.0.

Noah Misch and Heikki Linnakangas, reviewed by Tom Lane.

Security: CVE-2014-0064
2014-02-17 09:33:31 -05:00
Noah Misch 4318daecc9 Fix handling of wide datetime input/output.
Many server functions use the MAXDATELEN constant to size a buffer for
parsing or displaying a datetime value.  It was much too small for the
longest possible interval output and slightly too small for certain
valid timestamp input, particularly input with a long timezone name.
The long input was rejected needlessly; the long output caused
interval_out() to overrun its buffer.  ECPG's pgtypes library has a copy
of the vulnerable functions, which bore the same vulnerabilities along
with some of its own.  In contrast to the server, certain long inputs
caused stack overflow rather than failing cleanly.  Back-patch to 8.4
(all supported versions).

Reported by Daniel Schüssler, reviewed by Tom Lane.

Security: CVE-2014-0063
2014-02-17 09:33:31 -05:00
Alvaro Herrera 801c2dc72c Separate multixact freezing parameters from xid's
Previously we were piggybacking on transaction ID parameters to freeze
multixacts; but since there isn't necessarily any relationship between
rates of Xid and multixact consumption, this turns out not to be a good
idea.

Therefore, we now have multixact-specific freezing parameters:

vacuum_multixact_freeze_min_age: when to remove multis as we come across
them in vacuum (default to 5 million, i.e. early in comparison to Xid's
default of 50 million)

vacuum_multixact_freeze_table_age: when to force whole-table scans
instead of scanning only the pages marked as not all visible in
visibility map (default to 150 million, same as for Xids).  Whichever of
both which reaches the 150 million mark earlier will cause a whole-table
scan.

autovacuum_multixact_freeze_max_age: when for cause emergency,
uninterruptible whole-table scans (default to 400 million, double as
that for Xids).  This means there shouldn't be more frequent emergency
vacuuming than previously, unless multixacts are being used very
rapidly.

Backpatch to 9.3 where multixacts were made to persist enough to require
freezing.  To avoid an ABI break in 9.3, VacuumStmt has a couple of
fields in an unnatural place, and StdRdOptions is split in two so that
the newly added fields can go at the end.

Patch by me, reviewed by Robert Haas, with additional input from Andres
Freund and Tom Lane.
2014-02-13 19:36:31 -03:00
Andrew Dunstan 5264d91541 Add json_array_elements_text function.
This was a notable omission from the json functions added in 9.3 and
there have been numerous complaints about its absence.

Laurence Rowe.
2014-01-29 15:39:01 -05:00
Andrew Dunstan 105639900b New json functions.
json_build_array() and json_build_object allow for the construction of
arbitrarily complex json trees. json_object() turns a one or two
dimensional array, or two separate arrays, into a json_object of
name/value pairs, similarly to the hstore() function.
json_object_agg() aggregates its two arguments into a single json object
as name value pairs.

Catalog version bumped.

Andrew Dunstan, reviewed by Marko Tiikkaja.
2014-01-28 17:48:21 -05:00
Tom Lane 2850896961 Code review for auto-tuned effective_cache_size.
Fix integer overflow issue noted by Magnus Hagander, as well as a bunch
of other infelicities in commit ee1e5662d8
and its unreasonably large number of followups.
2014-01-27 00:05:56 -05:00
Robert Haas 01f7808b3e Add a cardinality function for arrays.
Unlike our other array functions, this considers the total number of
elements across all dimensions, and returns 0 rather than NULL when the
array has no elements.  But it seems that both of those behaviors are
almost universally disliked, so hopefully that's OK.

Marko Tiikkaja, reviewed by Dean Rasheed and Pavel Stehule
2014-01-21 12:38:53 -05:00
Tom Lane 0d79c0a8cc Make various variables const (read-only).
These changes should generally improve correctness/maintainability.
A nice side benefit is that several kilobytes move from initialized
data to text segment, allowing them to be shared across processes and
probably reducing copy-on-write overhead while forking a new backend.
Unfortunately this doesn't seem to help libpq in the same way (at least
not when it's compiled with -fpic on x86_64), but we can hope the linker
at least collects all nominally-const data together even if it's not
actually part of the text segment.

Also, make pg_encname_tbl[] static in encnames.c, since there seems
no very good reason for any other code to use it; per a suggestion
from Wim Lewis, who independently submitted a patch that was mostly
a subset of this one.

Oskari Saarenmaa, with some editorialization by me
2014-01-18 16:04:32 -05:00
Bruce Momjian 7e04792a1c Update copyright for 2014
Update all files in head, and files COPYRIGHT and legal.sgml in all back
branches.
2014-01-07 16:05:30 -05:00
Tom Lane 8d65da1f01 Support ordered-set (WITHIN GROUP) aggregates.
This patch introduces generic support for ordered-set and hypothetical-set
aggregate functions, as well as implementations of the instances defined in
SQL:2008 (percentile_cont(), percentile_disc(), rank(), dense_rank(),
percent_rank(), cume_dist()).  We also added mode() though it is not in the
spec, as well as versions of percentile_cont() and percentile_disc() that
can compute multiple percentile values in one pass over the data.

Unlike the original submission, this patch puts full control of the sorting
process in the hands of the aggregate's support functions.  To allow the
support functions to find out how they're supposed to sort, a new API
function AggGetAggref() is added to nodeAgg.c.  This allows retrieval of
the aggregate call's Aggref node, which may have other uses beyond the
immediate need.  There is also support for ordered-set aggregates to
install cleanup callback functions, so that they can be sure that
infrastructure such as tuplesort objects gets cleaned up.

In passing, make some fixes in the recently-added support for variadic
aggregates, and make some editorial adjustments in the recent FILTER
additions for aggregates.  Also, simplify use of IsBinaryCoercible() by
allowing it to succeed whenever the target type is ANY or ANYELEMENT.
It was inconsistent that it dealt with other polymorphic target types
but not these.

Atri Sharma and Andrew Gierth; reviewed by Pavel Stehule and Vik Fearing,
and rather heavily editorialized upon by Tom Lane
2013-12-23 16:11:35 -05:00
Tatsuo Ishii 65d6e4cb5c Add ALTER SYSTEM command to edit the server configuration file.
Patch contributed by Amit Kapila. Reviewed by Hari Babu, Masao Fujii,
Boszormenyi Zoltan, Andres Freund, Greg Smith and others.
2013-12-18 23:42:44 +09:00
Robert Haas 66abc2608c Add a new reloption, user_catalog_table.
When this reloption is set and wal_level=logical is configured,
we'll record the CIDs stamped by inserts, updates, and deletes to
the table just as we would for an actual catalog table.  This will
allow logical decoding to use historical MVCC snapshots to access
such tables just as they access ordinary catalog tables.

Replication solutions built around the logical decoding machinery
will likely need to set this operation for their configuration
tables; it might also be needed by extensions which perform table
access in their output functions.

Andres Freund, reviewed by myself and others.
2013-12-10 19:17:34 -05:00
Robert Haas e55704d8b2 Add new wal_level, logical, sufficient for logical decoding.
When wal_level=logical, we'll log columns from the old tuple as
configured by the REPLICA IDENTITY facility added in commit
07cacba983.  This makes it possible
a properly-configured logical replication solution to correctly
follow table updates even if they change the chosen key columns,
or, with REPLICA IDENTITY FULL, even if the table has no key at
all.  Note that updates which do not modify the replica identity
column won't log anything extra, making the choice of a good key
(i.e. one that will rarely be changed) important to performance
when wal_level=logical is configured.

Each insert, update, or delete to a catalog table will also log
the CMIN and/or CMAX values of stamped by the current transaction.
This is necessary because logical decoding will require access to
historical snapshots of the catalog in order to decode some data
types, and the CMIN/CMAX values that we may need in order to judge
row visibility may have been overwritten by the time we need them.

Andres Freund, reviewed in various versions by myself, Heikki
Linnakangas, KONDO Mitsumasa, and many others.
2013-12-10 19:01:40 -05:00
Tom Lane 16e1b7a1b7 Fix assorted race conditions in the new timeout infrastructure.
Prevent handle_sig_alarm from losing control partway through due to a query
cancel (either an asynchronous SIGINT, or a cancel triggered by one of the
timeout handler functions).  That would at least result in failure to
schedule any required future interrupt, and might result in actual
corruption of timeout.c's data structures, if the interrupt happened while
we were updating those.

We could still lose control if an asynchronous SIGINT arrives just as the
function is entered.  This wouldn't break any data structures, but it would
have the same effect as if the SIGALRM interrupt had been silently lost:
we'd not fire any currently-due handlers, nor schedule any new interrupt.
To forestall that scenario, forcibly reschedule any pending timer interrupt
during AbortTransaction and AbortSubTransaction.  We can avoid any extra
kernel call in most cases by not doing that until we've allowed
LockErrorCleanup to kill the DEADLOCK_TIMEOUT and LOCK_TIMEOUT events.

Another hazard is that some platforms (at least Linux and *BSD) block a
signal before calling its handler and then unblock it on return.  When we
longjmp out of the handler, the unblock doesn't happen, and the signal is
left blocked indefinitely.  Again, we can fix that by forcibly unblocking
signals during AbortTransaction and AbortSubTransaction.

These latter two problems do not manifest when the longjmp reaches
postgres.c, because the error recovery code there kills all pending timeout
events anyway, and it uses sigsetjmp(..., 1) so that the appropriate signal
mask is restored.  So errors thrown outside any transaction should be OK
already, and cleaning up in AbortTransaction and AbortSubTransaction should
be enough to fix these issues.  (We're assuming that any code that catches
a query cancel error and doesn't re-throw it will do at least a
subtransaction abort to clean up; but that was pretty much required already
by other subsystems.)

Lastly, ProcSleep should not clear the LOCK_TIMEOUT indicator flag when
disabling that event: if a lock timeout interrupt happened after the lock
was granted, the ensuing query cancel is still going to happen at the next
CHECK_FOR_INTERRUPTS, and we want to report it as a lock timeout not a user
cancel.

Per reports from Dan Wood.

Back-patch to 9.3 where the new timeout handling infrastructure was
introduced.  We may at some point decide to back-patch the signal
unblocking changes further, but I'll desist from that until we hear
actual field complaints about it.
2013-11-29 16:41:00 -05:00
Peter Eisentraut 85ed91ee7d Implement information_schema.parameters.parameter_default column
Reviewed-by: Ali Dar <ali.munir.dar@gmail.com>
Reviewed-by: Amit Khandekar <amit.khandekar@enterprisedb.com>
Reviewed-by: Rodolfo Campero <rodolfo.campero@anachronics.com>
2013-11-26 23:21:35 -05:00
Tom Lane f901bb50e3 Add make_date() and make_time() functions.
Pavel Stehule, reviewed by Jeevan Chalke and Atri Sharma
2013-11-17 15:06:50 -05:00
Tom Lane 69c8fbac20 Improve performance of numeric sum(), avg(), stddev(), variance(), etc.
This patch improves performance of most built-in aggregates that formerly
used a NUMERIC or NUMERIC array as their transition type; this includes
not only aggregates on numeric inputs, but some aggregates on integer
inputs where overflow of an int8 value is a possibility.  The code now
uses a special-purpose data structure to avoid array construction and
deconstruction overhead, as well as packing and unpacking overhead for
numeric values.

These aggregates' transition type is now declared as INTERNAL, since
it doesn't correspond to any SQL data type.  To keep the planner from
thinking that that means a lot of storage will be used, we make use
of the just-added pg_aggregate.aggtransspace feature.  The space estimate
is set to 128 bytes, which is at least in the right ballpark.

Hadi Moshayedi, reviewed by Pavel Stehule and Tomas Vondra
2013-11-16 18:46:34 -05:00
Robert Haas 07cacba983 Add the notion of REPLICA IDENTITY for a table.
Pending patches for logical replication will use this to determine
which columns of a tuple ought to be considered as its candidate key.

Andres Freund, with minor, mostly cosmetic adjustments by me
2013-11-08 12:30:43 -05:00
Tom Lane 3147acd63e Use improved vsnprintf calling logic in more places.
When we are using a C99-compliant vsnprintf implementation (which should be
most places, these days) it is worth the trouble to make use of its report
of how large the buffer needs to be to succeed.  This patch adjusts
stringinfo.c and some miscellaneous usages in pg_dump to do that, relying
on the logic recently added in libpgcommon's psprintf.c.  Since these
places want to know the number of bytes written once we succeed, modify the
API of pvsnprintf() to report that.

There remains near-duplicate logic in pqexpbuffer.c, but since that code
is in libpq, psprintf.c's approach of exit()-on-error isn't appropriate
for use there.  Also note that I didn't bother touching the multitude
of places that call (v)snprintf without any attempt to provide a resizable
buffer.

Release-note-worthy incompatibility: the API of appendStringInfoVA()
changed.  If there's any third-party code that's calling that directly,
it will need tweaking along the same lines as in this patch.

David Rowley and Tom Lane
2013-10-24 21:43:57 -04:00
Tom Lane 09a89cb5fc Get rid of use of asprintf() in favor of a more portable implementation.
asprintf(), aside from not being particularly portable, has a fundamentally
badly-designed API; the psprintf() function that was added in passing in
the previous patch has a much better API choice.  Moreover, the NetBSD
implementation that was borrowed for the previous patch doesn't work with
non-C99-compliant vsnprintf, which is something we still have to cope with
on some platforms; and it depends on va_copy which isn't all that portable
either.  Get rid of that code in favor of an implementation similar to what
we've used for many years in stringinfo.c.  Also, move it into libpgcommon
since it's not really libpgport material.

I think this patch will be enough to turn the buildfarm green again, but
there's still cosmetic work left to do, namely get rid of pg_asprintf()
in favor of using psprintf().  That will come in a followon patch.
2013-10-22 18:42:13 -04:00
Peter Eisentraut 5b6d08cd29 Add use of asprintf()
Add asprintf(), pg_asprintf(), and psprintf() to simplify string
allocation and composition.  Replacement implementations taken from
NetBSD.

Reviewed-by: Álvaro Herrera <alvherre@2ndquadrant.com>
Reviewed-by: Asif Naeem <anaeem.it@gmail.com>
2013-10-13 00:09:18 -04:00
Andrew Dunstan 4d212bac17 json_typeof function.
Andrew Tipton.
2013-10-10 12:21:59 -04:00
Peter Eisentraut 261c7d4b65 Revive line type
Change the input/output format to {A,B,C}, to match the internal
representation.

Complete the implementations of line_in, line_out, line_recv, line_send.
Remove comments and error messages about the line type not being
implemented.  Add regression tests for existing line operators and
functions.

Reviewed-by: rui hua <365507506hua@gmail.com>
Reviewed-by: Álvaro Herrera <alvherre@2ndquadrant.com>
Reviewed-by: Jeevan Chalke <jeevan.chalke@enterprisedb.com>
2013-10-09 22:34:38 -04:00
Robert Haas 0ac5e5a7e1 Allow dynamic allocation of shared memory segments.
Patch by myself and Amit Kapila.  Design help from Noah Misch.  Review
by Andres Freund.
2013-10-09 21:05:02 -04:00
Kevin Grittner f566515192 Add record_image_ops opclass for matview concurrent refresh.
REFRESH MATERIALIZED VIEW CONCURRENTLY was broken for any matview
containing a column of a type without a default btree operator
class.  It also did not produce results consistent with a non-
concurrent REFRESH or a normal view if any column was of a type
which allowed user-visible differences between values which
compared as equal according to the type's default btree opclass.
Concurrent matview refresh was modified to use the new operators
to solve these problems.

Documentation was added for record comparison, both for the
default btree operator class for record, and the newly added
operators.  Regression tests now check for proper behavior both
for a matview with a box column and a matview containing a citext
column.

Reviewed by Steve Singer, who suggested some of the doc language.
2013-10-09 14:26:09 -05:00
Bruce Momjian ee1e5662d8 Auto-tune effective_cache size to be 4x shared buffers 2013-10-08 12:12:24 -04:00
Bruce Momjian a54141aebc Issue error on SET outside transaction block in some cases
Issue error for SET LOCAL/CONSTRAINTS/TRANSACTION outside a transaction
block, as they have no effect.

Per suggestion from Morten Hustveit
2013-10-04 13:50:28 -04:00
Fujii Masao 71129b6fc5 Remove leftover function prototype.
The prototype for inval_twophase_postcommit wasn't removed when it's definition
was removed in efc16ea520 / the initial HS commit.

Andres Freund
2013-09-11 01:32:24 +09:00
Heikki Linnakangas 20cb18db46 Make catalog cache hash tables resizeable.
If the hash table backing a catalog cache becomes too full (fillfactor > 2),
enlarge it. A new buckets array, double the size of the old, is allocated,
and all entries in the old hash are moved to the right bucket in the new
hash.

This has two benefits. First, cache lookups don't get so expensive when
there are lots of entries in a cache, like if you access hundreds of
thousands of tables. Second, we can make the (initial) sizes of the caches
much smaller, which saves memory.

This patch dials down the initial sizes of the catcaches. The new sizes are
chosen so that a backend that only runs a few basic queries still won't need
to enlarge any of them.
2013-09-05 20:20:03 +03:00
Tom Lane 0c66a22377 Update comments concerning PGC_S_TEST.
This GUC context value was once only used by ALTER DATABASE SET and
ALTER USER SET.  That's not true anymore, though, so rewrite the
comments to be a bit more general.

Patch in HEAD only, since this is just an internal documentation issue.
2013-09-03 18:56:22 -04:00
Tom Lane 3d5282c6f0 Emit a log message if output is about to be redirected away from stderr.
We've seen multiple cases of people looking at the postmaster's original
stderr output to try to diagnose problems, not realizing/remembering that
their logging configuration is set up to send log messages somewhere else.
This seems particularly likely to happen in prepackaged distributions,
since many packagers patch the code to change the factory-standard logging
configuration to something more in line with their platform conventions.

In hopes of reducing confusion, emit a LOG message about this at the point
in startup where we are about to switch log output away from the original
stderr, providing a pointer to where to look instead.  This message will
appear as the last thing in the original stderr output.  (We might later
also try to emit such link messages when logging parameters are changed
on-the-fly; but that case seems to be both noticeably harder to do nicely,
and much less frequently a problem in practice.)

Per discussion, back-patch to 9.3 but not further.
2013-08-13 15:24:52 -04:00
Robert Haas 813fb03155 Remove SnapshotNow and HeapTupleSatisfiesNow.
We now use MVCC catalog scans, and, per discussion, have eliminated
all other remaining uses of SnapshotNow, so that we can now get rid of
it.  This will break third-party code which is still using it, which
is intentional, as we want such code to be updated to do things the
new way.
2013-08-01 10:46:19 -04:00
Stephen Frost ddef1a39c6 Allow a context to be passed in for error handling
As pointed out by Tom Lane, we can allow other users of the error
handler callbacks to provide their own memory context by adding
the context to use to ErrorData and using that instead of explicitly
using ErrorContext.

This then allows GetErrorContextStack() to be called from inside
exception handlers, so modify plpgsql to take advantage of that and
add an associated regression test for it.
2013-08-01 01:07:20 -04:00
Stephen Frost 8312832567 Add GET DIAGNOSTICS ... PG_CONTEXT in PL/PgSQL
This adds the ability to get the call stack as a string from within a
PL/PgSQL function, which can be handy for logging to a table, or to
include in a useful message to an end-user.

Pavel Stehule, reviewed by Rushabh Lathia and rather heavily whacked
around by Stephen Frost.
2013-07-24 18:53:27 -04:00
Robert Haas 0518eceec3 Adjust HeapTupleSatisfies* routines to take a HeapTuple.
Previously, these functions took a HeapTupleHeader, but upcoming
patches for logical replication will introduce new a new snapshot
type under which the tuple's TID will be used to lookup (CMIN, CMAX)
for visibility determination purposes.  This makes that information
available.  Code churn is minimal since HeapTupleSatisfiesVisibility
took the HeapTuple anyway, and deferenced it before calling the
satisfies function.

Independently of logical replication, this allows t_tableOid and
t_self to be cross-checked via assertions in tqual.c.  This seems
like a useful way to make sure that all callers are setting these
values properly, which has been previously put forward as
desirable.

Andres Freund, reviewed by Álvaro Herrera
2013-07-22 13:38:44 -04:00
Robert Haas f01d1ae3a1 Add infrastructure for mapping relfilenodes to relation OIDs.
Future patches are expected to introduce logical replication that
works by decoding WAL.  WAL contains relfilenodes rather than relation
OIDs, so this infrastructure will be needed to find the relation OID
based on WAL contents.

If logical replication does not make it into this release, we probably
should consider reverting this, since it will add some overhead to DDL
operations that create new relations.  One additional index insert per
pg_class row is not a large overhead, but it's more than zero.
Another way of meeting the needs of logical replication would be to
the relation OID to WAL, but that would burden DML operations, not
only DDL.

Andres Freund, with some changes by me.  Design review, in earlier
versions, by Álvaro Herrera.
2013-07-22 11:09:10 -04:00
Peter Eisentraut ff41a5de09 Clean up new JSON API typedefs
The new JSON API uses a bit of an unusual typedef scheme, where for
example OkeysState is a pointer to okeysState.  And that's not applied
consistently either.  Change that to the more usual PostgreSQL style
where struct typedefs are upper case, and use pointers explicitly.
2013-07-20 06:38:31 -04:00
Stephen Frost 4cbe3ac3e8 WITH CHECK OPTION support for auto-updatable VIEWs
For simple views which are automatically updatable, this patch allows
the user to specify what level of checking should be done on records
being inserted or updated.  For 'LOCAL CHECK', new tuples are validated
against the conditionals of the view they are being inserted into, while
for 'CASCADED CHECK' the new tuples are validated against the
conditionals for all views involved (from the top down).

This option is part of the SQL specification.

Dean Rasheed, reviewed by Pavel Stehule
2013-07-18 17:10:16 -04:00
Peter Eisentraut 070518ddab Add session_preload_libraries configuration parameter
This is like shared_preload_libraries except that it takes effect at
backend start and can be changed without a full postmaster restart.  It
is like local_preload_libraries except that it is still only settable by
a superuser.  This can be a better way to load modules such as
auto_explain.

Since there are now three preload parameters, regroup the documentation
a bit.  Put all parameters into one section, explain common
functionality only once, update the descriptions to reflect current and
future realities.

Reviewed-by: Dimitri Fontaine <dimitri@2ndQuadrant.fr>
2013-07-12 21:23:50 -04:00
Magnus Hagander 5a348fe077 Fix include-guard
Looks like a cut/paste error in the original addition of the file.

Andres Freund
2013-07-07 13:36:20 +02:00
Noah Misch 79e0f87a15 Use type "int64" for memory accounting in tuplesort.c/tuplestore.c.
Commit 263865a489 switched tuplesort.c and
tuplestore.c variables representing memory usage from type "long" to
type "Size".  This was unnecessary; I thought doing so avoided overflow
scenarios on 64-bit Windows, but guc.c already limited work_mem so as to
prevent the overflow.  It was also incomplete, not touching the logic
that assumed a signed data type.  Change the affected variables to
"int64".  This is perfect for 64-bit platforms, and it reduces the need
to contemplate platform-specific overflow scenarios.  It also puts us
close to being able to support work_mem over 2 GiB on 64-bit Windows.

Per report from Andres Freund.
2013-07-04 23:13:54 -04:00
Robert Haas 568d4138c6 Use an MVCC snapshot, rather than SnapshotNow, for catalog scans.
SnapshotNow scans have the undesirable property that, in the face of
concurrent updates, the scan can fail to see either the old or the new
versions of the row.  In many cases, we work around this by requiring
DDL operations to hold AccessExclusiveLock on the object being
modified; in some cases, the existing locking is inadequate and random
failures occur as a result.  This commit doesn't change anything
related to locking, but will hopefully pave the way to allowing lock
strength reductions in the future.

The major issue has held us back from making this change in the past
is that taking an MVCC snapshot is significantly more expensive than
using a static special snapshot such as SnapshotNow.  However, testing
of various worst-case scenarios reveals that this problem is not
severe except under fairly extreme workloads.  To mitigate those
problems, we avoid retaking the MVCC snapshot for each new scan;
instead, we take a new snapshot only when invalidation messages have
been processed.  The catcache machinery already requires that
invalidation messages be sent before releasing the related heavyweight
lock; else other backends might rely on locally-cached data rather
than scanning the catalog at all.  Thus, making snapshot reuse
dependent on the same guarantees shouldn't break anything that wasn't
already subtly broken.

Patch by me.  Review by Michael Paquier and Andres Freund.
2013-07-02 09:47:01 -04:00
Noah Misch 263865a489 Permit super-MaxAllocSize allocations with MemoryContextAllocHuge().
The MaxAllocSize guard is convenient for most callers, because it
reduces the need for careful attention to overflow, data type selection,
and the SET_VARSIZE() limit.  A handful of callers are happy to navigate
those hazards in exchange for the ability to allocate a larger chunk.
Introduce MemoryContextAllocHuge() and repalloc_huge().  Use this in
tuplesort.c and tuplestore.c, enabling internal sorts of up to INT_MAX
tuples, a factor-of-48 increase.  In particular, B-tree index builds can
now benefit from much-larger maintenance_work_mem settings.

Reviewed by Stephen Frost, Simon Riggs and Jeff Janes.
2013-06-27 14:53:57 -04:00
Noah Misch 19085116ee Cooperate with the Valgrind instrumentation framework.
Valgrind "client requests" in aset.c and mcxt.c teach Valgrind and its
Memcheck tool about the PostgreSQL allocator.  This makes Valgrind
roughly as sensitive to memory errors involving palloc chunks as it is
to memory errors involving malloc chunks.  Further client requests in
PageAddItem() and printtup() verify that all bits being added to a
buffer page or furnished to an output function are predictably-defined.
Those tests catch failures of C-language functions to fully initialize
the bits of a Datum, which in turn stymie optimizations that rely on
_equalConst().  Define the USE_VALGRIND symbol in pg_config_manual.h to
enable these additions.  An included "suppression file" silences nominal
errors we don't plan to fix.

Reviewed in earlier versions by Peter Geoghegan and Korry Douglas.
2013-06-26 20:22:25 -04:00
Tom Lane dc3eb56383 Improve updatability checking for views and foreign tables.
Extend the FDW API (which we already changed for 9.3) so that an FDW can
report whether specific foreign tables are insertable/updatable/deletable.
The default assumption continues to be that they're updatable if the
relevant executor callback function is supplied by the FDW, but finer
granularity is now possible.  As a test case, add an "updatable" option to
contrib/postgres_fdw.

This patch also fixes the information_schema views, which previously did
not think that foreign tables were ever updatable, and fixes
view_is_auto_updatable() so that a view on a foreign table can be
auto-updatable.

initdb forced due to changes in information_schema views and the functions
they rely on.  This is a bit unfortunate to do post-beta1, but if we don't
change this now then we'll have another API break for FDWs when we do
change it.

Dean Rasheed, somewhat editorialized on by Tom Lane
2013-06-12 17:53:33 -04:00
Bruce Momjian 9af4159fce pgindent run for release 9.3
This is the first run of the Perl-based pgindent script.  Also update
pgindent instructions.
2013-05-29 16:58:43 -04:00
Tom Lane 1d6c72a55b Move materialized views' is-populated status into their pg_class entries.
Previously this state was represented by whether the view's disk file had
zero or nonzero size, which is problematic for numerous reasons, since it's
breaking a fundamental assumption about heap storage.  This was done to
allow unlogged matviews to revert to unpopulated status after a crash
despite our lack of any ability to update catalog entries post-crash.
However, this poses enough risk of future problems that it seems better to
not support unlogged matviews until we can find another way.  Accordingly,
revert that choice as well as a number of existing kluges forced by it
in favor of creating a pg_class.relispopulated flag column.
2013-05-06 13:27:22 -04:00
Peter Eisentraut cc26ea9fe2 Clean up references to SQL92
In most cases, these were just references to the SQL standard in
general.  In a few cases, a contrast was made between SQL92 and later
standards -- those have been kept unchanged.
2013-04-20 11:04:41 -04:00
Heikki Linnakangas 87ae9e7265 Remove some unused and seldom used fields from RelationAmInfo.
This saves some memory from each index relcache entry. At least on a 64-bit
machine, it saves just enough to shrink a typical relcache entry's memory
usage from 2k to 1k. That's nice if you have a lot of backends and a lot of
indexes.
2013-04-16 15:07:58 +03:00
Kevin Grittner 52e6e33ab4 Create a distinction between a populated matview and a scannable one.
The intent was that being populated would, long term, be just one
of the conditions which could affect whether a matview was
scannable; being populated should be necessary but not always
sufficient to scan the relation.  Since only CREATE and REFRESH
currently determine the scannability, names and comments
accidentally conflated these concepts, leading to confusion.

Also add missing locking for the SQL function which allows a
test for scannability, and fix a modularity violatiion.

Per complaints from Tom Lane, although its not clear that these
will satisfy his concerns.  Hopefully this will at least better
frame the discussion.
2013-04-09 13:02:49 -05:00
Andrew Dunstan a570c98d7f Add new JSON processing functions and parser API.
The JSON parser is converted into a recursive descent parser, and
exposed for use by other modules such as extensions. The API provides
hooks for all the significant parser event such as the beginning and end
of objects and arrays, and providing functions to handle these hooks
allows for fairly simple construction of a wide variety of JSON
processing functions. A set of new basic processing functions and
operators is also added, which use this API, including operations to
extract array elements, object fields, get the length of arrays and the
set of keys of a field, deconstruct an object into a set of key/value
pairs, and create records from JSON objects and arrays of objects.

Catalog version bumped.

Andrew Dunstan, with some documentation assistance from Merlin Moncure.
2013-03-29 14:12:13 -04:00
Alvaro Herrera 473ab40c8b Add sql_drop event for event triggers
This event takes place just before ddl_command_end, and is fired if and
only if at least one object has been dropped by the command.  (For
instance, DROP TABLE IF EXISTS of a table that does not in fact exist
will not lead to such a trigger firing).  Commands that drop multiple
objects (such as DROP SCHEMA or DROP OWNED BY) will cause a single event
to fire.  Some firings might be surprising, such as
ALTER TABLE DROP COLUMN.

The trigger is fired after the drop has taken place, because that has
been deemed the safest design, to avoid exposing possibly-inconsistent
internal state (system catalogs as well as current transaction) to the
user function code.  This means that careful tracking of object
identification is required during the object removal phase.

Like other currently existing events, there is support for tag
filtering.

To support the new event, add a new pg_event_trigger_dropped_objects()
set-returning function, which returns a set of rows comprising the
objects affected by the command.  This is to be used within the user
function code, and is mostly modelled after the recently introduced
pg_identify_object() function.

Catalog version bumped due to the new function.

Dimitri Fontaine and Álvaro Herrera
Review by Robert Haas, Tom Lane
2013-03-28 13:05:48 -03:00
Simon Riggs 593c39d156 Revoke bc5334d867 2013-03-28 09:18:02 +00:00
Simon Riggs bc5334d867 Allow external recovery_config_directory
If required, recovery.conf can now be located outside of the data directory.
Server needs read/write permissions on this directory.
2013-03-27 11:45:42 +00:00
Alvaro Herrera f8348ea32e Allow extracting machine-readable object identity
Introduce pg_identify_object(oid,oid,int4), which is similar in spirit
to pg_describe_object but instead produces a row of machine-readable
information to uniquely identify the given object, without resorting to
OIDs or other internal representation.  This is intended to be used in
the event trigger implementation, to report objects being operated on;
but it has usefulness of its own.

Catalog version bumped because of the new function.
2013-03-20 18:19:19 -03:00
Tom Lane d43837d030 Add lock_timeout configuration parameter.
This GUC allows limiting the time spent waiting to acquire any one
heavyweight lock.

In support of this, improve the recently-added timeout infrastructure
to permit efficiently enabling or disabling multiple timeouts at once.
That reduces the performance hit from turning on lock_timeout, though
it's still not zero.

Zoltán Böszörményi, reviewed by Tom Lane,
Stephen Frost, and Hari Babu
2013-03-16 23:22:57 -04:00
Andrew Dunstan 38fb4d978c JSON generation improvements.
This adds the following:

    json_agg(anyrecord) -> json
    to_json(any) -> json
    hstore_to_json(hstore) -> json (also used as a cast)
    hstore_to_json_loose(hstore) -> json

The last provides heuristic treatment of numbers and booleans.

Also, in json generation, if any non-builtin type has a cast to json,
that function is used instead of the type's output function.

Andrew Dunstan, reviewed by Steve Singer.

Catalog version bumped.
2013-03-10 17:35:36 -04:00
Heikki Linnakangas 23f10b6473 SP-GiST support of the range adjacent operator -|-
Alexander Korotkov, reviewed by Jeff Davis.
2013-03-08 15:03:19 +02:00
Tom Lane 1908abc4a3 Arrange to cache FdwRoutine structs in foreign tables' relcache entries.
This saves several catalog lookups per reference.  It's not all that
exciting right now, because we'd managed to minimize the number of places
that need to fetch the data; but the upcoming writable-foreign-tables patch
needs this info in a lot more places.
2013-03-06 23:48:09 -05:00
Tom Lane 80b011ef0a Fix to_char() to use ASCII-only case-folding rules where appropriate.
formatting.c used locale-dependent case folding rules in some code paths
where the result isn't supposed to be locale-dependent, for example
to_char(timestamp, 'DAY').  Since the source data is always just ASCII
in these cases, that usually didn't matter ... but it does matter in
Turkish locales, which have unusual treatment of "i" and "I".  To confuse
matters even more, the misbehavior was only visible in UTF8 encoding,
because in single-byte encodings we used pg_toupper/pg_tolower which
don't have locale-specific behavior for ASCII characters.  Fix by providing
intentionally ASCII-only case-folding functions and using these where
appropriate.  Per bug #7913 from Adnan Dursun.  Back-patch to all active
branches, since it's been like this for a long time.
2013-03-05 13:02:30 -05:00
Kevin Grittner 3bf3ab8c56 Add a materialized view relations.
A materialized view has a rule just like a view and a heap and
other physical properties like a table.  The rule is only used to
populate the table, references in queries refer to the
materialized data.

This is a minimal implementation, but should still be useful in
many cases.  Currently data is only populated "on demand" by the
CREATE MATERIALIZED VIEW and REFRESH MATERIALIZED VIEW statements.
It is expected that future releases will add incremental updates
with various timings, and that a more refined concept of defining
what is "fresh" data will be developed.  At some point it may even
be possible to have queries use a materialized in place of
references to underlying tables, but that requires the other
above-mentioned features to be working first.

Much of the documentation work by Robert Haas.
Review by Noah Misch, Thom Brown, Robert Haas, Marko Tiikkaja
Security review by KaiGai Kohei, with a decision on how best to
implement sepgsql still pending.
2013-03-03 18:23:31 -06:00
Alvaro Herrera 639ed4e84b Add pg_xlogdump contrib program
This program relies on rm_desc backend routines and the xlogreader
infrastructure to emit human-readable rendering of WAL records.

Author: Andres Freund, with many reworks by Álvaro
Reviewed (in a much earlier version) by Peter Eisentraut
2013-02-22 16:56:55 -03:00
Peter Eisentraut 9475db3a4e Add ALTER ROLE ALL SET command
This generalizes the existing ALTER ROLE ... SET and ALTER DATABASE
... SET functionality to allow creating settings that apply to all users
in all databases.

reviewed by Pavel Stehule
2013-02-17 23:45:36 -05:00
Alvaro Herrera 8396447cdb Create libpgcommon, and move pg_malloc et al to it
libpgcommon is a new static library to allow sharing code among the
various frontend programs and backend; this lets us eliminate duplicate
implementations of common routines.  We avoid libpgport, because that's
intended as a place for porting issues; per discussion, it seems better
to keep them separate.

The first use case, and the only implemented by this patch, is pg_malloc
and friends, which many frontend programs were already using.

At the same time, we can use this to provide palloc emulation functions
for the frontend; this way, some palloc-using files in the backend can
also be used by the frontend cleanly.  To do this, we change palloc() in
the backend to be a function instead of a macro on top of
MemoryContextAlloc().  This was previously believed to cause loss of
performance, but this implementation has been tweaked by Tom and Andres
so that on modern compilers it provides a slight improvement over the
previous one.

This lets us clean up some places that were already with
localized hacks.

Most of the pg_malloc/palloc changes in this patch were authored by
Andres Freund. Zoltán Böszörményi also independently provided a form of
that.  libpgcommon infrastructure was authored by Álvaro.
2013-02-12 11:21:05 -03:00
Tom Lane 991f3e5ab3 Provide database object names as separate fields in error messages.
This patch addresses the problem that applications currently have to
extract object names from possibly-localized textual error messages,
if they want to know for example which index caused a UNIQUE_VIOLATION
failure.  It adds new error message fields to the wire protocol, which
can carry the name of a table, table column, data type, or constraint
associated with the error.  (Since the protocol spec has always instructed
clients to ignore unrecognized field types, this should not create any
compatibility problem.)

Support for providing these new fields has been added to just a limited set
of error reports (mainly, those in the "integrity constraint violation"
SQLSTATE class), but we will doubtless add them to more calls in future.

Pavel Stehule, reviewed and extensively revised by Peter Geoghegan, with
additional hacking by Tom Lane.
2013-01-29 17:08:26 -05:00
Tom Lane 0d5fbdc157 Change plan caching to honor, not resist, changes in search_path.
In the initial implementation of plan caching, we saved the active
search_path when a plan was first cached, then reinstalled that path
anytime we needed to reparse or replan.  The idea of that was to try to
reselect the same referenced objects, in somewhat the same way that views
continue to refer to the same objects in the face of schema or name
changes.  Of course, that analogy doesn't bear close inspection, since
holding the search_path fixed doesn't cope with object drops or renames.
Moreover sticking with the old path seems to create more surprises than
it avoids.  So instead of doing that, consider that the cached plan depends
on search_path, and force reparse/replan if the active search_path is
different than it was when we last saved the plan.

This gets us fairly close to having "transparency" of plan caching, in the
sense that the cached statement acts the same as if you'd just resubmitted
the original query text for another execution.  There are still some corner
cases where this fails though: a new object added in the search path
schema(s) might capture a reference in the query text, but we'd not realize
that and force a reparse.  We might try to fix that in the future, but for
the moment it looks too expensive and complicated.
2013-01-25 14:14:41 -05:00
Alvaro Herrera 0ac5ad5134 Improve concurrency of foreign key locking
This patch introduces two additional lock modes for tuples: "SELECT FOR
KEY SHARE" and "SELECT FOR NO KEY UPDATE".  These don't block each
other, in contrast with already existing "SELECT FOR SHARE" and "SELECT
FOR UPDATE".  UPDATE commands that do not modify the values stored in
the columns that are part of the key of the tuple now grab a SELECT FOR
NO KEY UPDATE lock on the tuple, allowing them to proceed concurrently
with tuple locks of the FOR KEY SHARE variety.

Foreign key triggers now use FOR KEY SHARE instead of FOR SHARE; this
means the concurrency improvement applies to them, which is the whole
point of this patch.

The added tuple lock semantics require some rejiggering of the multixact
module, so that the locking level that each transaction is holding can
be stored alongside its Xid.  Also, multixacts now need to persist
across server restarts and crashes, because they can now represent not
only tuple locks, but also tuple updates.  This means we need more
careful tracking of lifetime of pg_multixact SLRU files; since they now
persist longer, we require more infrastructure to figure out when they
can be removed.  pg_upgrade also needs to be careful to copy
pg_multixact files over from the old server to the new, or at least part
of multixact.c state, depending on the versions of the old and new
servers.

Tuple time qualification rules (HeapTupleSatisfies routines) need to be
careful not to consider tuples with the "is multi" infomask bit set as
being only locked; they might need to look up MultiXact values (i.e.
possibly do pg_multixact I/O) to find out the Xid that updated a tuple,
whereas they previously were assured to only use information readily
available from the tuple header.  This is considered acceptable, because
the extra I/O would involve cases that would previously cause some
commands to block waiting for concurrent transactions to finish.

Another important change is the fact that locking tuples that have
previously been updated causes the future versions to be marked as
locked, too; this is essential for correctness of foreign key checks.
This causes additional WAL-logging, also (there was previously a single
WAL record for a locked tuple; now there are as many as updated copies
of the tuple there exist.)

With all this in place, contention related to tuples being checked by
foreign key rules should be much reduced.

As a bonus, the old behavior that a subtransaction grabbing a stronger
tuple lock than the parent (sub)transaction held on a given tuple and
later aborting caused the weaker lock to be lost, has been fixed.

Many new spec files were added for isolation tester framework, to ensure
overall behavior is sane.  There's probably room for several more tests.

There were several reviewers of this patch; in particular, Noah Misch
and Andres Freund spent considerable time in it.  Original idea for the
patch came from Simon Riggs, after a problem report by Joel Jacobson.
Most code is from me, with contributions from Marti Raudsepp, Alexander
Shulgin, Noah Misch and Andres Freund.

This patch was discussed in several pgsql-hackers threads; the most
important start at the following message-ids:
	AANLkTimo9XVcEzfiBR-ut3KVNDkjm2Vxh+t8kAmWjPuv@mail.gmail.com
	1290721684-sup-3951@alvh.no-ip.org
	1294953201-sup-2099@alvh.no-ip.org
	1320343602-sup-2290@alvh.no-ip.org
	1339690386-sup-8927@alvh.no-ip.org
	4FE5FF020200002500048A3D@gw.wicourts.gov
	4FEAB90A0200002500048B7D@gw.wicourts.gov
2013-01-23 12:04:59 -03:00
Robert Haas 841a5150c5 Add ddl_command_end support for event triggers.
Dimitri Fontaine, with slight changes by me
2013-01-21 18:00:24 -05:00
Tom Lane 2065dd2834 Prevent very-low-probability PANIC during PREPARE TRANSACTION.
The code in PostPrepare_Locks supposed that it could reassign locks to
the prepared transaction's dummy PGPROC by deleting the PROCLOCK table
entries and immediately creating new ones.  This was safe when that code
was written, but since we invented partitioning of the shared lock table,
it's not safe --- another process could steal away the PROCLOCK entry in
the short interval when it's on the freelist.  Then, if we were otherwise
out of shared memory, PostPrepare_Locks would have to PANIC, since it's
too late to back out of the PREPARE at that point.

Fix by inventing a dynahash.c function to atomically update a hashtable
entry's key.  (This might possibly have other uses in future.)

This is an ancient bug that in principle we ought to back-patch, but the
odds of someone hitting it in the field seem really tiny, because (a) the
risk window is small, and (b) nobody runs servers with maxed-out lock
tables for long, because they'll be getting non-PANIC out-of-memory errors
anyway.  So fixing it in HEAD seems sufficient, at least until the new
code has gotten some testing.
2013-01-13 22:20:22 -05:00
Tom Lane b853eb9718 Improve handling of ereport(ERROR) and elog(ERROR).
In commit 71450d7fd6, we added code to inform
suitably-intelligent compilers that ereport() doesn't return if the elevel
is ERROR or higher.  This patch extends that to elog(), and also fixes a
double-evaluation hazard that the previous commit created in ereport(),
as well as reducing the emitted code size.

The elog() improvement requires the compiler to support __VA_ARGS__, which
should be available in just about anything nowadays since it's required by
C99.  But our minimum language baseline is still C89, so add a configure
test for that.

The previous commit assumed that ereport's elevel could be evaluated twice,
which isn't terribly safe --- there are already counterexamples in xlog.c.
On compilers that have __builtin_constant_p, we can use that to protect the
second test, since there's no possible optimization gain if the compiler
doesn't know the value of elevel.  Otherwise, use a local variable inside
the macros to prevent double evaluation.  The local-variable solution is
inferior because (a) it leads to useless code being emitted when elevel
isn't constant, and (b) it increases the optimization level needed for the
compiler to recognize that subsequent code is unreachable.  But it seems
better than not teaching non-gcc compilers about unreachability at all.

Lastly, if the compiler has __builtin_unreachable(), we can use that
instead of abort(), resulting in a noticeable code savings since no
function call is actually emitted.  However, it seems wise to do this only
in non-assert builds.  In an assert build, continue to use abort(), so that
the behavior will be predictable and debuggable if the "impossible"
happens.

These changes involve making the ereport and elog macros emit do-while
statement blocks not just expressions, which forces small changes in
a few call sites.

Andres Freund, Tom Lane, Heikki Linnakangas
2013-01-13 18:40:09 -05:00
Tom Lane 94afbd5831 Invent a "one-shot" variant of CachedPlans for better performance.
SPI_execute() and related functions create a CachedPlan, execute it once,
and immediately discard it, so that the functionality offered by
plancache.c is of no value in this code path.  And performance measurements
show that the extra data copying and invalidation checking done by
plancache.c slows down simple queries by 10% or more compared to 9.1.
However, enough of the SPI code is shared with functions that do need plan
caching that it seems impractical to bypass plancache.c altogether.
Instead, let's invent a variant version of cached plans that preserves
99% of the API but doesn't offer any of the actual functionality, nor the
overhead.  This puts SPI_execute() performance back on par, or maybe even
slightly better, than it was before.  This change should resolve recent
complaints of performance degradation from Dong Ye, Pavel Stehule, and
others.

By avoiding data copying, this change also reduces the amount of memory
needed to execute many-statement SPI_execute() strings, as for instance in
a recent complaint from Tomas Vondra.

An additional benefit of this change is that multi-statement SPI_execute()
query strings are now processed fully serially, that is we complete
execution of earlier statements before running parse analysis and planning
on following ones.  This eliminates a long-standing POLA violation, in that
DDL that affects the behavior of a later statement will now behave as
expected.

Back-patch to 9.2, since this was a performance regression compared to 9.1.
(In 9.2, place the added struct fields so as to avoid changing the offsets
of existing fields.)

Heikki Linnakangas and Tom Lane
2013-01-04 17:42:19 -05:00
Bruce Momjian bd61a623ac Update copyrights for 2013
Fully update git head, and update back branches in ./COPYRIGHT and
legal.sgml files.
2013-01-01 17:15:01 -05:00
Simon Riggs c2b3218064 Update comments on rd_newRelfilenodeSubid.
Ensure comments accurately reflect state of code
given new understanding, and recent changes.
Include example code from Noah Misch to
illustrate how rd_newRelfilenodeSubid can be
reset deterministically. No code changes.
2012-12-24 17:07:06 +00:00
Tom Lane 6919b7e329 Fix failure to ignore leftover temp tables after a server crash.
During crash recovery, we remove disk files belonging to temporary tables,
but the system catalog entries for such tables are intentionally not
cleaned up right away.  Instead, the first backend that uses a temp schema
is expected to clean out any leftover objects therein.  This approach
requires that we be careful to ignore leftover temp tables (since any
actual access attempt would fail), *even if their BackendId matches our
session*, if we have not yet established use of the session's corresponding
temp schema.  That worked fine in the past, but was broken by commit
debcec7dc3 which incorrectly removed the
rd_islocaltemp relcache flag.  Put it back, and undo various changes
that substituted tests like "rel->rd_backend == MyBackendId" for use
of a state-aware flag.  Per trouble report from Heikki Linnakangas.

Back-patch to 9.1 where the erroneous change was made.  In the back
branches, be careful to add rd_islocaltemp in a spot in the struct that
was alignment padding before, so as not to break existing add-on code.
2012-12-17 20:15:32 -05:00
Tom Lane a99c42f291 Support automatically-updatable views.
This patch makes "simple" views automatically updatable, without the need
to create either INSTEAD OF triggers or INSTEAD rules.  "Simple" views
are those classified as updatable according to SQL-92 rules.  The rewriter
transforms INSERT/UPDATE/DELETE commands on such views directly into an
equivalent command on the underlying table, which will generally have
noticeably better performance than is possible with either triggers or
user-written rules.  A view that has INSTEAD OF triggers or INSTEAD rules
continues to operate the same as before.

For the moment, security_barrier views are not considered simple.
Also, we do not support WITH CHECK OPTION.  These features may be
added in future.

Dean Rasheed, reviewed by Amit Kapila
2012-12-08 18:26:21 -05:00
Simon Riggs 8de72b66a2 COPY FREEZE and mark committed on fresh tables.
When a relfilenode is created in this subtransaction or
a committed child transaction and it cannot otherwise
be seen by our own process, mark tuples committed ahead
of transaction commit for all COPY commands in same
transaction. If FREEZE specified on COPY
and pre-conditions met then rows will also be frozen.
Both options designed to avoid revisiting rows after commit,
increasing performance of subsequent commands after
data load and upgrade. pg_restore changes later.

Simon Riggs, review comments from Heikki Linnakangas, Noah Misch and design
input from Tom Lane, Robert Haas and Kevin Grittner
2012-12-01 12:54:20 +00:00
Heikki Linnakangas dbdf9679d7 Use correct text domain for translating errcontext() messages.
errcontext() is typically used in an error context callback function, not
within an ereport() invocation like e.g errmsg and errdetail are. That means
that the message domain that the TEXTDOMAIN magic in ereport() determines
is not the right one for the errcontext() calls. The message domain needs to
be determined by the C file containing the errcontext() call, not the file
containing the ereport() call.

Fix by turning errcontext() into a macro that passes the TEXTDOMAIN to use
for the errcontext message. "errcontext" was used in a few places as a
variable or struct field name, I had to rename those out of the way, now
that errcontext is a macro.

We've had this problem all along, but this isn't doesn't seem worth
backporting. It's a fairly minor issue, and turning errcontext from a
function to a macro requires at least a recompile of any external code that
calls errcontext().
2012-11-12 17:07:29 +02:00
Heikki Linnakangas add6c3179a Make the streaming replication protocol messages architecture-independent.
We used to send structs wrapped in CopyData messages, which works as long as
the client and server agree on things like endianess, timestamp format and
alignment. That's good enough for running a standby server, which has to run
on the same platform anyway, but it's useful for tools like pg_receivexlog
to work across platforms.

This breaks protocol compatibility of streaming replication, but we never
promised that to be compatible across versions, anyway.
2012-11-07 19:09:13 +02:00
Tom Lane dc5aeca168 Remove unnecessary "head" arguments from some dlist/slist functions.
dlist_delete, dlist_insert_after, dlist_insert_before, slist_insert_after
do not need access to the list header, and indeed insisting on that negates
one of the main advantages of a doubly-linked list.

In consequence, revert addition of "cache_bucket" field to CatCTup.
2012-10-18 19:04:20 -04:00
Alvaro Herrera a66ee69add Embedded list interface
Provide a common implementation of embedded singly-linked and
doubly-linked lists.  "Embedded" in the sense that the nodes'
next/previous pointers exist within some larger struct; this design
choice reduces memory allocation overhead.

Most of the implementation uses inlineable functions (where supported),
for performance.

Some existing uses of both types of lists have been converted to the new
code, for demonstration purposes.  Other uses can (and probably will) be
converted in the future.  Since dllist.c is unused after this conversion,
it has been removed.

Author: Andres Freund
Some tweaks by me
Reviewed by Tom Lane, Peter Geoghegan
2012-10-17 11:31:20 -03:00
Alvaro Herrera f46baf601d Rename USE_INLINE to PG_USE_INLINE
The former name was too likely to conflict with symbols from external
headers; and, as seen in recent buildfarm failures in member spoonbill,
it has now happened at least in plpython.
2012-10-09 11:17:33 -03:00
Alvaro Herrera 976fa10d20 Add support for easily declaring static inline functions
We already had those, but they forced modules to spell out the function
bodies twice.  Eliminate some duplicates we had already grown.

Extracted from a somewhat larger patch from Andres Freund.
2012-10-08 16:28:01 -03:00
Heikki Linnakangas 2a0c81a12c Add support for include_dir in config file.
This allows easily splitting configuration into many files, deployed in a
directory.

Magnus Hagander, Greg Smith, Selena Deckelmann, reviewed by Noah Misch.
2012-09-24 18:07:53 +03:00
Tom Lane 11e131854f Improve ruleutils.c's heuristics for dealing with rangetable aliases.
The previous scheme had bugs in some corner cases involving tables that had
been renamed since a view was made.  This could result in dumped views that
failed to reload or reloaded incorrectly, as seen in bug #7553 from Lloyd
Albin, as well as in some pgsql-hackers discussion back in January.  Also,
its behavior for printing EXPLAIN plans was sometimes confusing because of
willingness to use the same alias for multiple RTEs (it was Ashutosh
Bapat's complaint about that aspect that started the January thread).

To fix, ensure that each RTE in the query has a unique unqualified alias,
by modifying the alias if necessary (we add "_" and digits as needed to
create a non-conflicting name).  Then we can just print its variables with
that alias, avoiding the confusing and bug-prone scheme of sometimes
schema-qualifying variable names.  In EXPLAIN, it proves to be expedient to
take the further step of only assigning such aliases to RTEs that are
actually referenced in the query, since the planner has a habit of
generating extra RTEs with the same alias in situations such as
inheritance-tree expansion.

Although this fixes a bug of very long standing, I'm hesitant to back-patch
such a noticeable behavioral change.  My experiments while creating a
regression test convinced me that actually incorrect output (as opposed to
confusing output) occurs only in very narrow cases, which is backed up by
the lack of previous complaints from the field.  So we may be better off
living with it in released branches; and in any case it'd be smart to let
this ripen awhile in HEAD before we consider back-patching it.
2012-09-21 19:03:10 -04:00
Bruce Momjian 015722fb36 Fix to_date() and to_timestamp() to allow specification of the day of
the week via ISO or Gregorian designations.  The fix is to store the
day-of-week consistently as 1-7, Sunday = 1.

Fixes bug reported by Marc Munro
2012-09-03 22:52:44 -04:00
Alvaro Herrera c219d9b0a5 Split tuple struct defs from htup.h to htup_details.h
This reduces unnecessary exposure of other headers through htup.h, which
is very widely included by many files.

I have chosen to move the function prototypes to the new file as well,
because that means htup.h no longer needs to include tupdesc.h.  In
itself this doesn't have much effect in indirect inclusion of tupdesc.h
throughout the tree, because it's also required by execnodes.h; but it's
something to explore in the future, and it seemed best to do the htup.h
change now while I'm busy with it.
2012-08-30 16:52:35 -04:00
Alvaro Herrera fda0594fc2 remove catcache.h from syscache.h
Instead, place a forward struct declaration for struct catclist in
syscache.h.  This reduces header proliferation somewhat.
2012-08-28 18:36:39 -04:00
Alvaro Herrera 45326c5a11 Split resowner.h
This lets files that are mere users of ResourceOwner not automatically
include the headers for stuff that is managed by the resowner mechanism.
2012-08-28 18:02:07 -04:00
Heikki Linnakangas 918eee0c49 Collect and use histograms of lower and upper bounds for range types.
This enables selectivity estimation of the <<, >>, &<, &> and && operators,
as well as the normal inequality operators: <, <=, >=, >. "range @> element"
is also supported, but the range-variant @> and <@ operators are not,
because they cannot be sensibly estimated with lower and upper bound
histograms alone. We would need to make some assumption about the lengths of
the ranges for that. Alexander's patch included a separate histogram of
lengths for that, but I left that out of the patch for simplicity. Hopefully
that will be added as a followup patch.

The fraction of empty ranges is also calculated and used in estimation.

Alexander Korotkov, heavily modified by me.
2012-08-27 15:58:46 +03:00
Peter Eisentraut 5c45d2f878 Mark DateTimeParseError() noreturn
This avoids a warning from clang 3.2 about an uninitialized variable
'dtype' in date_in().
2012-08-21 23:32:58 -04:00
Peter Eisentraut 71450d7fd6 Teach compiler that ereport(>=ERROR) does not return
When elevel >= ERROR, we add an abort() call to the ereport() macro to
give the compiler a hint that the ereport() expansion will not return,
but the abort() isn't actually reached because the longjmp happens in
errfinish().

Because the effect of ereport() varies with the elevel, we cannot use
standard compiler attributes such as noreturn for this.
2012-08-21 00:03:32 -04:00
Heikki Linnakangas 317dd55a9c Add SP-GiST support for range types.
The implementation is a quad-tree, largely copied from the quad-tree
implementation for points. The lower and upper bound of ranges are the 2d
coordinates, with some extra code to handle empty ranges.

I left out the support for adjacent operator, -|-, from the original patch.
Not because there was necessarily anything wrong with it, but it was more
complicated than the other operators, and I only have limited time for
reviewing. That will follow as a separate patch.

Alexander Korotkov, reviewed by Jeff Davis and me.
2012-08-16 14:30:45 +03:00
Tom Lane c9b0cbe98b Support having multiple Unix-domain sockets per postmaster.
Replace unix_socket_directory with unix_socket_directories, which is a list
of socket directories, and adjust postmaster's code to allow zero or more
Unix-domain sockets to be created.

This is mostly a straightforward change, but since the Unix sockets ought
to be created after the TCP/IP sockets for safety reasons (better chance
of detecting a port number conflict), AddToDataDirLockFile needs to be
fixed to support out-of-order updates of data directory lockfile lines.
That's a change that had been foreseen to be necessary someday anyway.

Honza Horak, reviewed and revised by Tom Lane
2012-08-10 17:27:15 -04:00
Robert Haas 3a0e4d36eb Make new event trigger facility actually do something.
Commit 3855968f32 added syntax, pg_dump,
psql support, and documentation, but the triggers didn't actually fire.
With this commit, they now do.  This is still a pretty basic facility
overall because event triggers do not get a whole lot of information
about what the user is trying to do unless you write them in C; and
there's still no option to fire them anywhere except at the very
beginning of the execution sequence, but it's better than nothing,
and a good building block for future work.

Along the way, add a regression test for ALTER LARGE OBJECT, since
testing of event triggers reveals that we haven't got one.

Dimitri Fontaine and Robert Haas
2012-07-20 11:39:01 -04:00
Heikki Linnakangas a7a4add6c4 Refactor the way code is shared between some range type functions.
Functions like range_eq, range_before etc. are exposed at the SQL-level, but
they're also used internally by the GiST consistent support function. The
code sharing was done by a hack, TrickFunctionCall2, which relied on the
knowledge that all the functions used fn_extra the same way. This commit
splits the functions into internal versions that take a TypeCacheEntry as
argument, and thin wrappers to expose the functions at the SQL-level. The
internal versions can then be called directly and in a less hacky way from
the GiST consistent function.

This is just cosmetic, but backpatch to 9.2 anyway, to avoid having a
different version of this code in the 9.2 branch. That would make
backpatching fixes in this area more difficult.

Alexander Korotkov
2012-07-18 23:14:56 +03:00
Robert Haas 3855968f32 Syntax support and documentation for event triggers.
They don't actually do anything yet; that will get fixed in a
follow-on commit.  But this gets the basic infrastructure in place,
including CREATE/ALTER/DROP EVENT TRIGGER; support for COMMENT,
SECURITY LABEL, and ALTER EXTENSION .. ADD/DROP EVENT TRIGGER;
pg_dump and psql support; and documentation for the anticipated
initial feature set.

Dimitri Fontaine, with review and a bunch of additional hacking by me.
Thom Brown extensively reviewed earlier versions of this patch set,
but there's not a whole lot of that code left in this commit, as it
turns out.
2012-07-18 10:16:16 -04:00
Alvaro Herrera f34c68f096 Introduce timeout handling framework
Management of timeouts was getting a little cumbersome; what we
originally had was more than enough back when we were only concerned
about deadlocks and query cancel; however, when we added timeouts for
standby processes, the code got considerably messier.  Since there are
plans to add more complex timeouts, this seems a good time to introduce
a central timeout handling module.

External modules register their timeout handlers during process
initialization, and later enable and disable them as they see fit using
a simple API; timeout.c is in charge of keeping track of which timeouts
are in effect at any time, installing a common SIGALRM signal handler,
and calling setitimer() as appropriate to ensure timely firing of
external handlers.

timeout.c additionally supports pluggable modules to add their own
timeouts, though this capability isn't exercised anywhere yet.

Additionally, as of this commit, walsender processes are aware of
timeouts; we had a preexisting bug there that made those ignore SIGALRM,
thus being subject to unhandled deadlocks, particularly during the
authentication phase.  This has already been fixed in back branches in
commit 0bf8eb2a, which see for more details.

Main author: Zoltán Böszörményi
Some review and cleanup by Álvaro Herrera
Extensive reworking by Tom Lane
2012-07-16 22:55:33 -04:00
Tom Lane 84a42560c8 Add array_remove() and array_replace() functions.
These functions support removing or replacing array element value(s)
matching a given search value.  Although intended mainly to support a
future array-foreign-key feature, they seem useful in their own right.

Marco Nenciarini and Gabriele Bartolini, reviewed by Alex Hunsaker
2012-07-11 13:59:35 -04:00
Tom Lane 628cbb50ba Re-implement extraction of fixed prefixes from regular expressions.
To generate btree-indexable conditions from regex WHERE conditions (such as
WHERE indexed_col ~ '^foo'), we need to be able to identify any fixed
prefix that a regex might have; that is, find any string that must be a
prefix of all strings satisfying the regex.  We used to do that with
entirely ad-hoc code that looked at the source text of the regex.  It
didn't know very much about regex syntax, which mostly meant that it would
fail to identify some optimizable cases; but Viktor Rosenfeld reported that
it would produce actively wrong answers for quantified parenthesized
subexpressions, such as '^(foo)?bar'.  Rather than trying to extend the
ad-hoc code to cover this, let's get rid of it altogether in favor of
identifying prefixes by examining the compiled form of a regex.

To do this, I've added a new entry point "pg_regprefix" to the regex library;
hopefully it is defined in a sufficiently general fashion that it can remain
in the library when/if that code gets split out as a standalone project.

Since this bug has been there for a very long time, this fix needs to get
back-patched.  However it depends on some other recent commits (particularly
the addition of wchar-to-database-encoding conversion), so I'll commit this
separately and then go to work on back-porting the necessary fixes.
2012-07-10 14:54:37 -04:00
Tom Lane 00dac6000d Refactor pattern_fixed_prefix() to avoid dealing in incomplete patterns.
Previously, pattern_fixed_prefix() was defined to return whatever fixed
prefix it could extract from the pattern, plus the "rest" of the pattern.
That definition was sensible for LIKE patterns, but not so much for
regexes, where reconstituting a valid pattern minus the prefix could be
quite tricky (certainly the existing code wasn't doing that correctly).
Since the only thing that callers ever did with the "rest" of the pattern
was to pass it to like_selectivity() or regex_selectivity(), let's cut out
the middle-man and just have pattern_fixed_prefix's subroutines do this
directly.  Then pattern_fixed_prefix can return a simple selectivity
number, and the question of how to cope with partial patterns is removed
from its API specification.

While at it, adjust the API spec so that callers who don't actually care
about the pattern's selectivity (which is a lot of them) can pass NULL for
the selectivity pointer to skip doing the work of computing a selectivity
estimate.

This patch is only an API refactoring that doesn't actually change any
processing, other than allowing a little bit of useless work to be skipped.
However, it's necessary infrastructure for my upcoming fix to regex prefix
extraction, because after that change there won't be any simple way to
identify the "rest" of the regex, not even to the low level of fidelity
needed by regex_selectivity.  We can cope with that if regex_fixed_prefix
and regex_selectivity communicate directly, but not if we have to work
within the old API.  Hence, back-patch to all active branches.
2012-07-09 23:22:55 -04:00
Peter Eisentraut eeece9e609 Unify calling conventions for postgres/postmaster sub-main functions
There was a wild mix of calling conventions: Some were declared to
return void and didn't return, some returned an int exit code, some
claimed to return an exit code, which the callers checked, but
actually never returned, and so on.

Now all of these functions are declared to return void and decorated
with attribute noreturn and don't return.  That's easiest, and most
code already worked that way.
2012-06-25 21:30:12 +03:00
Peter Eisentraut b8b2e3b2de Replace int2/int4 in C code with int16/int32
The latter was already the dominant use, and it's preferable because
in C the convention is that intXX means XX bits.  Therefore, allowing
mixed use of int2, int4, int8, int16, int32 is obviously confusing.

Remove the typedefs for int2 and int4 for now.  They don't seem to be
widely used outside of the PostgreSQL source tree, and the few uses
can probably be cleaned up by the time this ships.
2012-06-25 01:51:46 +03:00
Heikki Linnakangas eeb6f37d89 Add a small cache of locks owned by a resource owner in ResourceOwner.
This speeds up reassigning locks to the parent owner, when the transaction
holds a lot of locks, but only a few of them belong to the current resource
owner. This is particularly helps pg_dump when dumping a large number of
objects.

The cache can hold up to 15 locks in each resource owner. After that, the
cache is marked as overflowed, and we fall back to the old method of
scanning the whole local lock table. The tradeoff here is that the cache has
to be scanned whenever a lock is released, so if the cache is too large,
lock release becomes more expensive. 15 seems enough to cover pg_dump, and
doesn't have much impact on lock release.

Jeff Janes, reviewed by Amit Kapila and Heikki Linnakangas.
2012-06-21 15:30:26 +03:00
Peter Eisentraut 15b1918e7d Improve reporting of permission errors for array types
Because permissions are assigned to element types, not array types,
complaining about permission denied on an array type would be
misleading to users.  So adjust the reporting to refer to the element
type instead.

In order not to duplicate the required logic in two dozen places,
refactor the permission denied reporting for types a bit.

pointed out by Yeb Havinga during the review of the type privilege
feature
2012-06-15 22:55:03 +03:00
Robert Haas d2c86a1ccd Remove RELKIND_UNCATALOGED.
This may have been important at some point in the past, but it no
longer does anything useful.

Review by Tom Lane.
2012-06-14 09:47:30 -04:00
Bruce Momjian 927d61eeff Run pgindent on 9.2 source tree in preparation for first 9.3
commit-fest.
2012-06-10 15:20:04 -04:00
Peter Eisentraut 8570114dc1 Make include files work without having to include other ones first 2012-06-10 12:46:14 +03:00
Robert Haas 0038110421 Avoid repeated CLOG access from heap_hot_search_buffer.
At the time we check whether the tuple is dead to all running
transactions, we've already verified that it isn't visible to our
scan, setting hint bits if appropriate.  So there's no need to
recheck CLOG for the all-dead test we do just a moment later.
So, add HeapTupleIsSurelyDead() to test the appropriate condition
under the assumption that all relevant hit bits are already set.

Review by Tom Lane.
2012-05-02 12:40:07 -04:00
Peter Eisentraut 26471a51fc Mark ReThrowError() with attribute noreturn
All related functions were already so marked.
2012-04-30 20:23:41 +03:00
Robert Haas 4a2d7ad76f pg_size_pretty(numeric)
The output of the new pg_xlog_location_diff function is of type numeric,
since it could theoretically overflow an int8 due to signedness; this
provides a convenient way to format such values.

Fujii Masao, with some beautification by me.
2012-04-14 08:07:25 -04:00
Peter Eisentraut c0cc526e8b Rename bytea_agg to string_agg and add delimiter argument
Per mailing list discussion, we would like to keep the bytea functions
parallel to the text functions, so rename bytea_agg to string_agg,
which already exists for text.

Also, to satisfy the rule that we don't want aggregate functions of
the same name with a different number of arguments, add a delimiter
argument, just like string_agg for text already has.
2012-04-13 21:36:59 +03:00
Tom Lane c7cea267de Replace empty locale name with implied value in CREATE DATABASE and initdb.
setlocale() accepts locale name "" as meaning "the locale specified by the
process's environment variables".  Historically we've accepted that for
Postgres' locale settings, too.  However, it's fairly unsafe to store an
empty string in a new database's pg_database.datcollate or datctype fields,
because then the interpretation could vary across postmaster restarts,
possibly resulting in index corruption and other unpleasantness.

Instead, we should expand "" to whatever it means at the moment of calling
CREATE DATABASE, which we can do by saving the value returned by
setlocale().

For consistency, make initdb set up the initial lc_xxx parameter values the
same way.  initdb was already doing the right thing for empty locale names,
but it did not replace non-empty names with setlocale results.  On a
platform where setlocale chooses to canonicalize the spellings of locale
names, this would result in annoying inconsistency.  (It seems that popular
implementations of setlocale don't do such canonicalization, which is a
pity, but the POSIX spec certainly allows it to be done.)  The same risk
of inconsistency leads me to not venture back-patching this, although it
could certainly be seen as a longstanding bug.

Per report from Jeff Davis, though this is not his proposed patch.
2012-03-25 21:47:22 -04:00
Peter Eisentraut eb990a2b9e Add const qualifier to tzn returned by timestamp2tm()
The tzn value might come from tm->tm_zone, which libc declares as
const, so it's prudent that the upper layers know about this as well.
2012-03-15 21:17:19 +02:00
Peter Eisentraut ad4fb0d0d2 Improve EncodeDateTime and EncodeTimeOnly APIs
Use an explicit argument to tell whether to include the time zone in
the output, rather than using some undocumented pointer magic.
2012-03-14 23:03:34 +02:00
Tom Lane d4bf3c9c94 Expose an API for calculating catcache hash values.
Now that cache invalidation callbacks get only a hash value, and not a
tuple TID (per commits 632ae6829f and
b5282aa893), the only way they can restrict
what they invalidate is to know what the hash values mean.  setrefs.c was
doing this via a hard-wired assumption but that seems pretty grotty, and
it'll only get worse as more cases come up.  So let's expose a calculation
function that takes the same parameters as SearchSysCache.  Per complaint
from Marko Kreen.
2012-03-07 14:51:13 -05:00
Tom Lane 19dbc34631 Add a hook for processing messages due to be sent to the server log.
Use-cases for this include custom log filtering rules and custom log
message transmission mechanisms (for instance, lossy log message
collection, which has been discussed several times recently).

As is our common practice for hooks, there's no regression test nor
user-facing documentation for this, though the author did exhibit a
sample module using the hook.

Martin Pihlak, reviewed by Marti Raudsepp
2012-03-06 15:35:41 -05:00
Tom Lane 0e5e167aae Collect and use element-frequency statistics for arrays.
This patch improves selectivity estimation for the array <@, &&, and @>
(containment and overlaps) operators.  It enables collection of statistics
about individual array element values by ANALYZE, and introduces
operator-specific estimators that use these stats.  In addition,
ScalarArrayOpExpr constructs of the forms "const = ANY/ALL (array_column)"
and "const <> ANY/ALL (array_column)" are estimated by treating them as
variants of the containment operators.

Since we still collect scalar-style stats about the array values as a
whole, the pg_stats view is expanded to show both these stats and the
array-style stats in separate columns.  This creates an incompatible change
in how stats for tsvector columns are displayed in pg_stats: the stats
about lexemes are now displayed in the array-related columns instead of the
original scalar-related columns.

There are a few loose ends here, notably that it'd be nice to be able to
suppress either the scalar-style stats or the array-element stats for
columns for which they're not useful.  But the patch is in good enough
shape to commit for wider testing.

Alexander Korotkov, reviewed by Noah Misch and Nathan Boley
2012-03-03 20:20:57 -05:00
Peter Eisentraut 6688d2878e Add COLLATION FOR expression
reviewed by Jaime Casanova
2012-03-02 21:12:16 +02:00
Alvaro Herrera 58e9f974dc Fix typo in comment
Haifeng Liu
2012-02-28 23:52:52 -03:00
Tom Lane 5c02a00d44 Move CRC tables to libpgport, and provide them in a separate include file.
This makes it much more convenient to build tools for Postgres that are
separately compiled and require a matching CRC implementation.

To prevent multiple copies of the CRC polynomial tables being introduced
into the postgres binaries, they are now included in the static library
libpgport that is mainly meant for replacement system functions.  That
seems like a bit of a kludge, but there's no better place.

This cleans up building of the tools pg_controldata and pg_resetxlog,
which previously had to build their own copies of pg_crc.o.

In the future, external programs that need access to the CRC tables can
include the tables directly from the new header file pg_crc_tables.h.

Daniel Farina, reviewed by Abhijit Menon-Sen and Tom Lane
2012-02-28 19:53:39 -05:00
Peter Eisentraut 973e9fb294 Add const qualifiers where they are accidentally cast away
This only produces warnings under -Wcast-qual, but it's more correct
and consistent in any case.
2012-02-28 12:42:08 +02:00
Andrew Dunstan 2f582f76b1 Improve pretty printing of viewdefs.
Some line feeds are added to target lists and from lists to make
them more readable. By default they wrap at 80 columns if possible,
but the wrap column is also selectable - if 0 it wraps after every
item.

Andrew Dunstan, reviewed by Hitoshi Harada.
2012-02-19 11:43:46 -05:00
Tom Lane 4767bc8ff2 Improve statistics estimation to make some use of DISTINCT in sub-queries.
Formerly, we just punted when trying to estimate stats for variables coming
out of sub-queries using DISTINCT, on the grounds that whatever stats we
might have for underlying table columns would be inapplicable.  But if the
sub-query has only one DISTINCT column, we can consider its output variable
as being unique, which is useful information all by itself.  The scope of
this improvement is pretty narrow, but it costs nearly nothing, so we might
as well do it.  Per discussion with Andres Freund.

This patch differs from the draft I submitted yesterday in updating various
comments about vardata.isunique (to reflect its extended meaning) and in
tweaking the interaction with security_barrier views.  There does not seem
to be a reason why we can't use this sort of knowledge even when the
sub-query is such a view.
2012-02-16 17:34:00 -05:00
Tom Lane 4bfe68dfab Run a portal's cleanup hook immediately when pushing it to FAILED state.
This extends the changes of commit 6252c4f9e2
so that we run the cleanup hook earlier for failure cases as well as
success cases.  As before, the point is to avoid an assertion failure from
an Assert I added in commit a874fe7b4c, which
was meant to check that no user-written code can be called during portal
cleanup.  This fixes a case reported by Pavan Deolasee in which the Assert
could be triggered during backend exit (see the new regression test case),
and also prevents the possibility that the cleanup hook is run after
portions of the portal's state have already been recycled.  That doesn't
really matter in current usage, but it foreseeably could matter in the
future.

Back-patch to 9.1 where the Assert in question was added.
2012-02-15 16:19:01 -05:00
Robert Haas cd30728fb2 Allow LEAKPROOF functions for better performance of security views.
We don't normally allow quals to be pushed down into a view created
with the security_barrier option, but functions without side effects
are an exception: they're OK.  This allows much better performance in
common cases, such as when using an equality operator (that might
even be indexable).

There is an outstanding issue here with the CREATE FUNCTION / ALTER
FUNCTION syntax: there's no way to use ALTER FUNCTION to unset the
leakproof flag.  But I'm committing this as-is so that it doesn't
have to be rebased again; we can fix up the grammar in a future
commit.

KaiGai Kohei, with some wordsmithing by me.
2012-02-13 22:21:14 -05:00
Robert Haas c13897983a Add transform functions for various temporal typmod coercisions.
This enables ALTER TABLE to skip table and index rebuilds in some cases.

Noah Misch, with trivial changes by me.
2012-02-08 09:33:37 -05:00
Robert Haas f7d7dade8a Add a transform function for varbit typmod coercisions.
This enables ALTER TABLE to skip table and index rebuilds when the
new type is unconstraint varbit, or when the allowable number of bits
is not decreasing.

Noah Misch, with review and a fix for an OID collision by me.
2012-02-07 12:42:50 -05:00
Robert Haas 3cc0800829 Add a transform function for numeric typmod coercisions.
This enables ALTER TABLE to skip table and index rebuilds when a column
is changed to an unconstrained numeric, or when the scale is unchanged
and the precision does not decrease.

Noah Misch, with a few stylistic changes and a fix for an OID
collision by me.
2012-02-07 12:08:26 -05:00
Andrew Dunstan 39909d1d39 Add array_to_json and row_to_json functions.
Also move the escape_json function from explain.c to json.c where it
seems to belong.

Andrew Dunstan, Reviewd by Abhijit Menon-Sen.
2012-02-03 12:11:16 -05:00
Robert Haas 5384a73f98 Built-in JSON data type.
Like the XML data type, we simply store JSON data as text, after checking
that it is valid.  More complex operations such as canonicalization and
comparison may come later, but this is enough for not.

There are a few open issues here, such as whether we should attempt to
detect UTF-8 surrogate pairs represented as \uXXXX\uYYYY, but this gets
the basic framework in place.
2012-01-31 11:48:23 -05:00
Peter Eisentraut b376ec6fa5 Show default privileges in information schema
Hitherto, the information schema only showed explicitly granted
privileges that were visible in the *acl catalog columns.  If no
privileges had been granted, the implicit privileges were not shown.

To fix that, add an SQL-accessible version of the acldefault()
function, and use that inside the aclexplode() calls to substitute the
catalog-specific default privilege set for null values.

reviewed by Abhijit Menon-Sen
2012-01-27 21:58:51 +02:00
Robert Haas cc53a1e7cc Add bitwise AND, OR, and NOT operators for macaddr data type.
Brendan Jurd, reviewed by Fujii Masao
2012-01-19 15:25:14 -05:00
Tom Lane 21b446dd09 Fix CLUSTER/VACUUM FULL for toast values owned by recently-updated rows.
In commit 7b0d0e9356, I made CLUSTER and
VACUUM FULL try to preserve toast value OIDs from the original toast table
to the new one.  However, if we have to copy both live and recently-dead
versions of a row that has a toasted column, those versions may well
reference the same toast value with the same OID.  The patch then led to
duplicate-key failures as we tried to insert the toast value twice with the
same OID.  (The previous behavior was not very desirable either, since it
would have silently inserted the same value twice with different OIDs.
That wastes space, but what's worse is that the toast values inserted for
already-dead heap rows would not be reclaimed by subsequent ordinary
VACUUMs, since they go into the new toast table marked live not deleted.)

To fix, check if the copied OID already exists in the new toast table, and
if so, assume that it stores the desired value.  This is reasonably safe
since the only case where we will copy an OID from a previous toast pointer
is when toast_insert_or_update was given that toast pointer and so we just
pulled the data from the old table; if we got two different values that way
then we have big problems anyway.  We do have to assume that no other
backend is inserting items into the new toast table concurrently, but
that's surely safe for CLUSTER and VACUUM FULL.

Per bug #6393 from Maxim Boguk.  Back-patch to 9.0, same as the previous
patch.
2012-01-12 16:40:14 -05:00
Robert Haas 1fc3d18faa Slightly reorganize struct SnapshotData.
This squeezes out a bunch of alignment padding, reducing the size
from 72 to 56 bytes on my machine.  At least in my testing, this
didn't produce any measurable performance improvement, but the space
savings seem like enough justification.

Andres Freund
2012-01-06 22:56:00 -05:00
Bruce Momjian e126958c2e Update copyright notices for year 2012. 2012-01-01 18:01:58 -05:00
Robert Haas d5448c7d31 Add bytea_agg, parallel to string_agg.
Pavel Stehule
2011-12-23 08:40:25 -05:00
Robert Haas 0e4611c023 Add a security_barrier option for views.
When a view is marked as a security barrier, it will not be pulled up
into the containing query, and no quals will be pushed down into it,
so that no function or operator chosen by the user can be applied to
rows not exposed by the view.  Views not configured with this
option cannot provide robust row-level security, but will perform far
better.

Patch by KaiGai Kohei; original problem report by Heikki Linnakangas
(in October 2009!).  Review (in earlier versions) by Noah Misch and
others.  Design advice by Tom Lane and myself.  Further review and
cleanup by me.
2011-12-22 16:16:31 -05:00
Peter Eisentraut 729205571e Add support for privileges on types
This adds support for the more or less SQL-conforming USAGE privilege
on types and domains.  The intent is to be able restrict which users
can create dependencies on types, which restricts the way in which
owners can alter types.

reviewed by Yeb Havinga
2011-12-20 00:05:19 +02:00
Tom Lane 3695a55513 Replace simple constant pg_am.amcanreturn with an AM support function.
The need for this was debated when we put in the index-only-scan feature,
but at the time we had no near-term expectation of having AMs that could
support such scans for only some indexes; so we kept it simple.  However,
the SP-GiST AM forces the issue, so let's fix it.

This patch only installs the new API; no behavior actually changes.
2011-12-18 15:50:37 -05:00
Tom Lane 8daeb5ddd6 Add SP-GiST (space-partitioned GiST) index access method.
SP-GiST is comparable to GiST in flexibility, but supports non-balanced
partitioned search structures rather than balanced trees.  As described at
PGCon 2011, this new indexing structure can beat GiST in both index build
time and query speed for search problems that it is well matched to.

There are a number of areas that could still use improvement, but at this
point the code seems committable.

Teodor Sigaev and Oleg Bartunov, with considerable revisions by Tom Lane
2011-12-17 16:42:30 -05:00
Andrew Dunstan 6d09b2105f include_if_exists facility for config file.
This works the same as include, except that an error is not thrown
if the file is missing. Instead the fact that it's missing is
logged.

Greg Smith, reviewed by Euler Taveira de Oliveira.
2011-12-15 19:40:58 -05:00
Heikki Linnakangas 8409b60476 Revert the behavior of inet/cidr functions to not unpack the arguments.
I forgot to change the functions to use the PG_GETARG_INET_PP() macro,
when I changed DatumGetInetP() to unpack the datum, like Datum*P macros
usually do. Also, I screwed up the definition of the PG_GETARG_INET_PP()
macro, and didn't notice because it wasn't used.

This fixes the memory leak when sorting inet values, as reported
by Jochen Erwied and debugged by Andres Freund. Backpatch to 8.3, like
the previous patch that broke it.
2011-12-12 10:10:53 +02:00
Magnus Hagander 16d8e594ac Remove spclocation field from pg_tablespace
Instead, add a function pg_tablespace_location(oid) used to return
the same information, and do this by reading the symbolic link.

Doing it this way makes it possible to relocate a tablespace when the
database is down by simply changing the symbolic link.
2011-12-07 10:37:33 +01:00
Tom Lane c6e3ac11b6 Create a "sort support" interface API for faster sorting.
This patch creates an API whereby a btree index opclass can optionally
provide non-SQL-callable support functions for sorting.  In the initial
patch, we only use this to provide a directly-callable comparator function,
which can be invoked with a bit less overhead than the traditional
SQL-callable comparator.  While that should be of value in itself, the real
reason for doing this is to provide a datatype-extensible framework for
more aggressive optimizations, as in Peter Geoghegan's recent work.

Robert Haas and Tom Lane
2011-12-07 00:19:39 -05:00
Tom Lane c66e4f138b Improve GiST range-contained-by searches by adding a flag for empty ranges.
In the original implementation, a range-contained-by search had to scan
the entire index because an empty range could be lurking anywhere.
Improve that by adding a flag to upper GiST entries that says whether the
represented subtree contains any empty ranges.

Also, make a simple mod to the penalty function to discourage empty ranges
from getting pushed into subtrees without any.  This needs more work, and
the picksplit function should be taught about it too, but that code can be
improved without causing an on-disk compatibility break; so we'll leave it
for another day.

Since we're breaking on-disk compatibility of range values anyway, I took
the opportunity to reorganize the range flags bits; the unused
RANGE_xB_NULL bits are now adjacent, which might open the door for using
them in some other way later.

In passing, remove the GiST range opclass entry for <>, which doesn't seem
like it can really be indexed usefully.

Alexander Korotkov, with some editorializing by Tom
2011-11-27 16:51:29 -05:00
Tom Lane 74c1723fc8 Remove user-selectable ANALYZE option for range types.
It's not clear that a per-datatype typanalyze function would be any more
useful than a generic typanalyze for ranges.  What *is* clear is that
letting unprivileged users select typanalyze functions is a crash risk or
worse.  So remove the option from CREATE TYPE AS RANGE, and instead put in
a generic typanalyze function for ranges.  The generic function does
nothing as yet, but hopefully we'll improve that before 9.2 release.
2011-11-23 00:03:22 -05:00
Tom Lane df73584431 Remove zero- and one-argument range constructor functions.
Per discussion, the zero-argument forms aren't really worth the catalog
space (just write 'empty' instead).  The one-argument forms have some use,
but they also have a serious problem with looking too much like functional
cast notation; to the point where in many real use-cases, the parser would
misinterpret what was wanted.

Committing this as a separate patch, with the thought that we might want
to revert part or all of it if we can think of some way around the cast
ambiguity.
2011-11-22 20:45:05 -05:00
Tom Lane 766948bedd Still more review for range-types patch.
Per discussion, relax the range input/construction rules so that the
only hard error is lower bound > upper bound.  Cases where the lower
bound is <= upper bound, but the range nonetheless normalizes to empty,
are now permitted.

Fix core dump in range_adjacent when bounds are infinite.  Marginal
cleanup of regression test cases, some more code commenting.
2011-11-22 16:06:26 -05:00
Tom Lane b985d48779 Further code review for range types patch.
Fix some bugs in coercion logic and pg_dump; more comment cleanup;
minor cosmetic improvements.
2011-11-20 23:50:27 -05:00
Tom Lane 37ee4b75db Restructure function-internal caching in the range type code.
Move the responsibility for caching specialized information about range
types into the type cache, so that the catalog lookups only have to occur
once per session.  Rearrange APIs a bit so that fn_extra caching is
actually effective in the GiST support code.  (Use of OidFunctionCallN is
bad enough for performance in itself, but it also prevents the function
from exploiting fn_extra caching.)

The range I/O functions are still not very bright about caching repeated
lookups, but that seems like material for a separate patch.

Also, avoid unnecessary use of memcpy to fetch/store the range type OID and
flags, and don't use the full range_deserialize machinery when all we need
to see is the flags value.

Also fix API error in range_gist_penalty --- it was failing to set *penalty
for any case involving an empty range.
2011-11-15 13:05:45 -05:00
Tom Lane f158536285 Fix copyright notices, other minor editing in new range-types code.
No functional changes in this commit (except I could not resist the
temptation to re-word a couple of error messages).  This is just manual
cleanup after pgindent to make the code look reasonably like other PG
code, in preparation for more detailed code review to come.
2011-11-14 13:59:34 -05:00
Heikki Linnakangas 3b8161723c Make DatumGetInetP() unpack inet datums with a 1-byte header, and add
a new macro, DatumGetInetPP(), that does not. This brings these macros
in line with other DatumGet*P() macros.

Backpatch to 8.3, where 1-byte header varlenas were introduced.
2011-11-08 22:39:43 +02:00
Heikki Linnakangas 4429f6a9e3 Support range data types.
Selectivity estimation functions are missing for some range type operators,
which is a TODO.

Jeff Davis
2011-11-03 13:42:15 +02:00
Tom Lane bb446b689b Support synchronization of snapshots through an export/import procedure.
A transaction can export a snapshot with pg_export_snapshot(), and then
others can import it with SET TRANSACTION SNAPSHOT.  The data does not
leave the server so there are not security issues.  A snapshot can only
be imported while the exporting transaction is still running, and there
are some other restrictions.

I'm not totally convinced that we've covered all the bases for SSI (true
serializable) mode, but it works fine for lesser isolation modes.

Joachim Wieland, reviewed by Marko Tiikkaja, and rather heavily modified
by Tom Lane
2011-10-22 18:23:30 -04:00
Tom Lane f9c92a5a3e Code review for pgstat_get_crashed_backend_activity patch.
Avoid possibly dumping core when pgstat_track_activity_query_size has a
less-than-default value; avoid uselessly searching for the query string
of a successfully-exited backend; don't bother putting out an ERRDETAIL if
we don't have a query to show; some other minor stylistic improvements.
2011-10-21 16:36:04 -04:00
Robert Haas c8e8b5a6e2 Try to log current the query string when a backend crashes.
To avoid minimize risk inside the postmaster, we subject this feature
to a number of significant limitations.  We very much wish to avoid
doing any complex processing inside the postmaster, due to the
posssibility that the crashed backend has completely corrupted shared
memory.  To that end, no encoding conversion is done; instead, we just
replace anything that doesn't look like an ASCII character with a
question mark.  We limit the amount of data copied to 1024 characters,
and carefully sanity check the source of that data.  While these
restrictions would doubtless be unacceptable in a general-purpose
logging facility, even this limited facility seems like an improvement
over the status quo ante.

Marti Raudsepp, reviewed by PDXPUG and myself
2011-10-21 13:26:40 -04:00
Tom Lane ba6f629326 Improve and simplify CREATE EXTENSION's management of GUC variables.
CREATE EXTENSION needs to transiently set search_path, as well as
client_min_messages and log_min_messages.  We were doing this by the
expedient of saving the current string value of each variable, doing a
SET LOCAL, and then doing another SET LOCAL with the previous value at
the end of the command.  This is a bit expensive though, and it also fails
badly if there is anything funny about the existing search_path value,
as seen in a recent report from Roger Niederland.  Fortunately, there's a
much better way, which is to piggyback on the GUC infrastructure previously
developed for functions with SET options.  We just open a new GUC nesting
level, do our assignments with GUC_ACTION_SAVE, and then close the nesting
level when done.  This automatically restores the prior settings without a
re-parsing pass, so (in principle anyway) there can't be an error.  And
guc.c still takes care of cleanup in event of an error abort.

The CREATE EXTENSION code for this was modeled on some much older code in
ri_triggers.c, which I also changed to use the better method, even though
there wasn't really much risk of failure there.  Also improve the comments
in guc.c to reflect this additional usage.
2011-10-05 20:44:16 -04:00
Tom Lane 41e461d36f Improve define_custom_variable's handling of pre-existing settings.
Arrange for any problems with pre-existing settings to be reported as
WARNING not ERROR, so that we don't undesirably abort the loading of the
incoming add-on module.  The bad setting is just discarded, as though it
had never been applied at all.  (This requires a change in the API of
set_config_option.  After some thought I decided the most potentially
useful addition was to allow callers to just pass in a desired elevel.)

Arrange to restore the complete stacked state of the variable, rather than
cheesily reinstalling only the active value.  This ensures that custom GUCs
will behave unsurprisingly even when the module loading operation occurs
within nested subtransactions that have changed the active value.  Since a
module load could occur as a result of, eg, a PL function call, this is not
an unlikely scenario.
2011-10-04 19:57:21 -04:00
Tom Lane 9f5836d224 Remember the source GucContext for each GUC parameter.
We used to just remember the GucSource, but saving GucContext too provides
a little more information --- notably, whether a SET was done by a
superuser or regular user.  This allows us to rip out the fairly dodgy code
that define_custom_variable used to use to try to infer the context to
re-install a pre-existing setting with.  In particular, it now works for
a superuser to SET a extension's SUSET custom variable before loading the
associated extension, because GUC can remember whether the SET was done as
a superuser or not.  The plperl regression tests contain an example where
this is useful.
2011-10-04 16:13:50 -04:00
Tom Lane d56b3afc03 Restructure error handling in reading of postgresql.conf.
This patch has two distinct purposes: to report multiple problems in
postgresql.conf rather than always bailing out after the first one,
and to change the policy for whether changes are applied when there are
unrelated errors in postgresql.conf.

Formerly the policy was to apply no changes if any errors could be
detected, but that had a significant consistency problem, because in some
cases specific values might be seen as valid by some processes but invalid
by others.  This meant that the latter processes would fail to adopt
changes in other parameters even though the former processes had done so.

The new policy is that during SIGHUP, the file is rejected as a whole
if there are any errors in the "name = value" syntax, or if any lines
attempt to set nonexistent built-in parameters, or if any lines attempt
to set custom parameters whose prefix is not listed in (the new value of)
custom_variable_classes.  These tests should always give the same results
in all processes, and provide what seems a reasonably robust defense
against loading values from badly corrupted config files.  If these tests
pass, all processes will apply all settings that they individually see as
good, ignoring (but logging) any they don't.

In addition, the postmaster does not abandon reading a configuration file
after the first syntax error, but continues to read the file and report
syntax errors (up to a maximum of 100 syntax errors per file).

The postmaster will still refuse to start up if the configuration file
contains any errors at startup time, but these changes allow multiple
errors to be detected and reported before quitting.

Alexey Klyukin, reviewed by Andy Colson and av (Alexander ?)
with some additional hacking by Tom Lane
2011-10-02 16:50:04 -04:00
Tom Lane 57eb009092 Allow snapshot references to still work during transaction abort.
In REPEATABLE READ (nee SERIALIZABLE) mode, an attempt to do
GetTransactionSnapshot() between AbortTransaction and CleanupTransaction
failed, because GetTransactionSnapshot would recompute the transaction
snapshot (which is already wrong, given the isolation mode) and then
re-register it in the TopTransactionResourceOwner, leading to an Assert
because the TopTransactionResourceOwner should be empty of resources after
AbortTransaction.  This is the root cause of bug #6218 from Yamamoto
Takashi.  While changing plancache.c to avoid requesting a snapshot when
handling a ROLLBACK masks the problem, I think this is really a snapmgr.c
bug: it's lower-level than the resource manager mechanism and should not be
shutting itself down before we unwind resource manager resources.  However,
just postponing the release of the transaction snapshot until cleanup time
didn't work because of the circular dependency with
TopTransactionResourceOwner.  Fix by managing the internal reference to
that snapshot manually instead of depending on TopTransactionResourceOwner.
This saves a few cycles as well as making the module layering more
straightforward.  predicate.c's dependencies on TopTransactionResourceOwner
go away too.

I think this is a longstanding bug, but there's no evidence that it's more
than a latent bug, so it doesn't seem worth any risk of back-patching.
2011-09-26 22:25:28 -04:00
Tom Lane e6faf910d7 Redesign the plancache mechanism for more flexibility and efficiency.
Rewrite plancache.c so that a "cached plan" (which is rather a misnomer
at this point) can support generation of custom, parameter-value-dependent
plans, and can make an intelligent choice between using custom plans and
the traditional generic-plan approach.  The specific choice algorithm
implemented here can probably be improved in future, but this commit is
all about getting the mechanism in place, not the policy.

In addition, restructure the API to greatly reduce the amount of extraneous
data copying needed.  The main compromise needed to make that possible was
to split the initial creation of a CachedPlanSource into two steps.  It's
worth noting in particular that SPI_saveplan is now deprecated in favor of
SPI_keepplan, which accomplishes the same end result with zero data
copying, and no need to then spend even more cycles throwing away the
original SPIPlan.  The risk of long-term memory leaks while manipulating
SPIPlans has also been greatly reduced.  Most of this improvement is based
on use of the recently-added MemoryContextSetParent primitive.
2011-09-16 00:43:52 -04:00
Tom Lane b0025bd957 Invent a new memory context primitive, MemoryContextSetParent.
This function will be useful for altering the lifespan of a context after
creation (for example, by creating it under a transient context and later
reparenting it to belong to a long-lived context).  It costs almost no new
code, since we can refactor what was there.  Per my proposal of yesterday.
2011-09-11 16:29:42 -04:00
Tom Lane ca4af308c3 Simplify handling of the timezone GUC by making initdb choose the default.
We were doing some amazingly complicated things in order to avoid running
the very expensive identify_system_timezone() procedure during GUC
initialization.  But there is an obvious fix for that, which is to do it
once during initdb and have initdb install the system-specific default into
postgresql.conf, as it already does for most other GUC variables that need
system-environment-dependent defaults.  This means that the timezone (and
log_timezone) settings no longer have any magic behavior in the server.
Per discussion.
2011-09-09 17:59:11 -04:00
Tom Lane a7801b62f2 Move Timestamp/Interval typedefs and basic macros into datatype/timestamp.h.
As per my recent proposal, this refactors things so that these typedefs and
macros are available in a header that can be included in frontend-ish code.
I also changed various headers that were undesirably including
utils/timestamp.h to include datatype/timestamp.h instead.  Unsurprisingly,
this showed that half the system was getting utils/timestamp.h by way of
xlog.h.

No actual code changes here, just header refactoring.
2011-09-09 13:23:41 -04:00
Tom Lane 4c2777d0b7 Change get_variable_numdistinct's API to flag default estimates explicitly.
Formerly, callers tested for DEFAULT_NUM_DISTINCT, which had the problem
that a perfectly solid estimate might be mistaken for a content-free
default.
2011-09-04 15:41:49 -04:00
Bruce Momjian 6416a82a62 Remove unnecessary #include references, per pgrminclude script. 2011-09-01 10:04:27 -04:00
Tom Lane b5282aa893 Revise sinval code to remove no-longer-used tuple TID from inval messages.
This requires adjusting the API for syscache callback functions: they now
get a hash value, not a TID, to identify the target tuple.  Most of them
weren't paying any attention to that argument anyway, but plancache did
require a small amount of fixing.

Also, improve performance a trifle by avoiding sending duplicate inval
messages when a heap_update isn't changing the catcache lookup columns.
2011-08-16 19:27:46 -04:00
Tom Lane 2ada6779c5 Fix race condition in relcache init file invalidation.
The previous code tried to synchronize by unlinking the init file twice,
but that doesn't actually work: it leaves a window wherein a third process
could read the already-stale init file but miss the SI messages that would
tell it the data is stale.  The result would be bizarre failures in catalog
accesses, typically "could not read block 0 in file ..." later during
startup.

Instead, hold RelCacheInitLock across both the unlink and the sending of
the SI messages.  This is more straightforward, and might even be a bit
faster since only one unlink call is needed.

This has been wrong since it was put in (in 2002!), so back-patch to all
supported releases.
2011-08-16 13:11:54 -04:00
Bruce Momjian 6d7bd5dec9 Make USECS_PER_* timestamp macros visible even when we are not using
integer timestamps.
2011-08-12 21:32:19 -04:00
Tom Lane cacd42d62c Rewrite libxml error handling to be more robust.
libxml reports some errors (like invalid xmlns attributes) via the error
handler hook, but still returns a success indicator to the library caller.
This causes us to miss some errors that are important to report.  Since the
"generic" error handler hook doesn't know whether the message it's getting
is for an error, warning, or notice, stop using that and instead start
using the "structured" error handler hook, which gets enough information
to be useful.

While at it, arrange to save and restore the error handler hook setting in
each libxml-using function, rather than assuming we can set and forget the
hook.  This should improve the odds of working nicely with third-party
libraries that also use libxml.

In passing, volatile-ize some local variables that get modified within
PG_TRY blocks.  I noticed this while testing with an older gcc version
than I'd previously tried to compile xml.c with.

Florian Pflug and Tom Lane, with extensive review/testing by Noah Misch
2011-07-20 13:03:49 -04:00
Simon Riggs 4bd8ed31b7 Introduce sending servers as new category for replication params
Fujii Masao
2011-07-19 08:59:55 +01:00
Robert Haas 367bc426a1 Avoid index rebuild for no-rewrite ALTER TABLE .. ALTER TYPE.
Noah Misch.  Review and minor cosmetic changes by me.
2011-07-18 11:04:43 -04:00
Tom Lane 23e5b16c71 Add temp_file_limit GUC parameter to constrain temporary file space usage.
The limit is enforced against the total amount of temp file space used by
each session.

Mark Kirkwood, reviewed by Cédric Villemain and Tatsuo Ishii
2011-07-17 14:19:31 -04:00
Tom Lane ed7ed76712 Add an errdetail_internal() ereport auxiliary routine.
This function supports untranslated detail messages, in the same way that
errmsg_internal supports untranslated primary messages.  We've needed this
for some time IMO, but discussion of some cases in the SSI code provided
the impetus to actually add it.

Kevin Grittner, with minor adjustments by me
2011-07-16 14:22:15 -04:00
Tom Lane 9d522cb35d Fix another oversight in logging of changes in postgresql.conf settings.
We were using GetConfigOption to collect the old value of each setting,
overlooking the possibility that it didn't exist yet.  This does happen
in the case of adding a new entry within a custom variable class, as
exhibited in bug #6097 from Maxim Boguk.

To fix, add a missing_ok parameter to GetConfigOption, but only in 9.1
and HEAD --- it seems possible that some third-party code is using that
function, so changing its API in a minor release would cause problems.
In 9.0, create a near-duplicate function instead.
2011-07-08 17:02:58 -04:00
Tom Lane 60a81ad133 Reclassify replication-related GUC variables as "master" and "standby".
Per discussion, this structure seems more understandable than what was
there before.  Make config.sgml and postgresql.conf.sample agree.

In passing do a bit of editorial work on the variable descriptions.
2011-07-07 15:11:41 -04:00
Tom Lane 14f67192c2 Remove assumptions that not-equals operators cannot be in any opclass.
get_op_btree_interpretation assumed this in order to save some duplication
of code, but it's not true in general anymore because we added <> support
to btree_gist.  (We still assume it for btree opclasses, though.)

Also, essentially the same logic was baked into predtest.c.  Get rid of
that duplication by generalizing get_op_btree_interpretation so that it
can be used by predtest.c.

Per bug report from Denis de Bernardy and investigation by Jeff Davis,
though I didn't use Jeff's patch exactly as-is.

Back-patch to 9.1; we do not support this usage before that.
2011-07-06 14:53:16 -04:00
Alvaro Herrera b93f5a5673 Move Trigger and TriggerDesc structs out of rel.h into a new reltrigger.h
This lets us stop including rel.h into execnodes.h, which is a widely
used header.
2011-07-04 14:35:58 -04:00
Robert Haas 8f9fe6edce Add notion of a "transform function" that can simplify function calls.
Initially, we use this only to eliminate calls to the varchar()
function in cases where the length is not being reduced and, therefore,
the function call is equivalent to a RelabelType operation.  The most
significant effect of this is that we can avoid a table rewrite when
changing a varchar(X) column to a varchar(Y) column, where Y > X.

Noah Misch, reviewed by me and Alexey Klyukin
2011-06-21 22:21:24 -04:00
Bruce Momjian 6560407c7d Pgindent run before 9.1 beta2. 2011-06-09 14:32:50 -04:00
Tom Lane ea8e42f3a0 Fix failure to check whether a rowtype's component types are sortable.
The existence of a btree opclass accepting composite types caused us to
assume that every composite type is sortable.  This isn't true of course;
we need to check if the column types are all sortable.  There was logic
for this for the case of array comparison (ie, check that the element
type is sortable), but we missed the point for rowtypes.  Per Teodor's
report of an ANALYZE failure for an unsortable composite type.

Rather than just add some more ad-hoc logic for this, I moved knowledge of
the issue into typcache.c.  The typcache will now only report out array_eq,
record_cmp, and friends as usable operators if the array or composite type
will work with those functions.

Unfortunately we don't have enough info to do this for anonymous RECORD
types; in that case, just assume it will work, and take the runtime failure
as before if it doesn't.

This patch might be a candidate for back-patching at some point, but
given the lack of complaints from the field, I'd rather just test it in
HEAD for now.

Note: most of the places touched in this patch will need further work
when we get around to supporting hashing of record types.
2011-06-03 15:39:17 -04:00
Tom Lane e05b866447 Split PGC_S_DEFAULT into two values, for true boot_val vs computed default.
Failure to distinguish these cases is the real cause behind the recent
reports of Windows builds crashing on 'infinity'::timestamp, which was
directly due to failure to establish a value of timezone_abbreviations
in postmaster child processes.  The postmaster had the desired value,
but write_one_nondefault_variable() didn't transmit it to backends.

To fix that, invent a new value PGC_S_DYNAMIC_DEFAULT, and be sure to use
that or PGC_S_ENV_VAR (as appropriate) for "default" settings that are
computed during initialization.  (We need both because there's at least
one variable that could receive a value from either source.)

This commit also fixes ProcessConfigFile's failure to restore the correct
default value for certain GUC variables if they are set in postgresql.conf
and then removed/commented out of the file.  We have to recompute and
reinstall the value for any GUC variable that could have received a value
from PGC_S_DYNAMIC_DEFAULT or PGC_S_ENV_VAR sources, and there were a
number of oversights.  (That whole thing is a crock that needs to be
redesigned, but not today.)

However, I intentionally didn't make it work "exactly right" for the cases
of timezone and log_timezone.  The exactly right behavior would involve
running select_default_timezone, which we'd have to do independently in
each postgres process, causing the whole database to become entirely
unresponsive for as much as several seconds.  That didn't seem like a good
idea, especially since the variable's removal from postgresql.conf might be
just an accidental edit.  Instead the behavior is to adopt the previously
active setting as if it were default.

Note that this patch creates an ABI break for extensions that use any of
the PGC_S_XXX constants; they'll need to be recompiled.
2011-05-11 19:57:38 -04:00
Andrew Dunstan c02d5b7c27 Use a macro variable PG_PRINTF_ATTRIBUTE for the style used for checking printf type functions.
The style is set to "printf" for backwards compatibility everywhere except
on Windows, where it is set to "gnu_printf", which eliminates hundreds of
false error messages from modern versions of gcc arising from  %m and %ll{d,u}
formats.
2011-04-28 10:56:14 -04:00
Robert Haas be90032e0d Remove partial and undocumented GRANT .. FOREIGN TABLE support.
Instead, foreign tables are treated just like views: permissions can
be granted using GRANT privilege ON [TABLE] foreign_table_name TO role,
and revoked similarly.  GRANT/REVOKE .. FOREIGN TABLE is no longer
supported, just as we don't support GRANT/REVOKE .. VIEW.  The set of
accepted permissions for foreign tables is now identical to the set for
regular tables, and views.

Per report from Thom Brown, and subsequent discussion.
2011-04-25 16:39:18 -04:00
Tom Lane 2ab0796d7a Fix char2wchar/wchar2char to support collations properly.
These functions should take a pg_locale_t, not a collation OID, and should
call mbstowcs_l/wcstombs_l where available.  Where those functions are not
available, temporarily select the correct locale with uselocale().

This change removes the bogus assumption that all locales selectable in
a given database have the same wide-character conversion method; in
particular, the collate.linux.utf8 regression test now passes with
LC_CTYPE=C, so long as the database encoding is UTF8.

I decided to move the char2wchar/wchar2char functions out of mbutils.c and
into pg_locale.c, because they work on wchar_t not pg_wchar_t and thus
don't really belong with the mbutils.c functions.  Keeping them where they
were would have required importing pg_locale_t into pg_wchar.h somehow,
which did not seem like a good plan.
2011-04-23 12:35:41 -04:00
Tom Lane d64713df7e Pass collations to functions in FunctionCallInfoData, not FmgrInfo.
Since collation is effectively an argument, not a property of the function,
FmgrInfo is really the wrong place for it; and this becomes critical in
cases where a cached FmgrInfo is used for varying purposes that might need
different collation settings.  Fix by passing it in FunctionCallInfoData
instead.  In particular this allows a clean fix for bug #5970 (record_cmp
not working).  This requires touching a bit more code than the original
method, but nobody ever thought that collations would not be an invasive
patch...
2011-04-12 19:19:24 -04:00
Tom Lane 3c381a55b0 Teach pattern_fixed_prefix() about collations.
This is necessary, not optional, now that ILIKE and regexes are collation
aware --- else we might derive a wrong comparison constant for index
optimized pattern matches.
2011-04-11 12:28:28 -04:00
Heikki Linnakangas 7c797e7194 Fix the size of predicate lock manager's shared memory hash tables at creation.
This way they don't compete with the regular lock manager for the slack shared
memory, making the behavior more predictable.
2011-04-11 13:43:31 +03:00
Bruce Momjian bf50caf105 pgindent run before PG 9.1 beta 1. 2011-04-10 11:42:00 -04:00
Tom Lane 2594cf0e8c Revise the API for GUC variable assign hooks.
The previous functions of assign hooks are now split between check hooks
and assign hooks, where the former can fail but the latter shouldn't.
Aside from being conceptually clearer, this approach exposes the
"canonicalized" form of the variable value to guc.c without having to do
an actual assignment.  And that lets us fix the problem recently noted by
Bernd Helmle that the auto-tune patch for wal_buffers resulted in bogus
log messages about "parameter "wal_buffers" cannot be changed without
restarting the server".  There may be some speed advantage too, because
this design lets hook functions avoid re-parsing variable values when
restoring a previous state after a rollback (they can store a pre-parsed
representation of the value instead).  This patch also resolves a
longstanding annoyance about custom error messages from variable assign
hooks: they should modify, not appear separately from, guc.c's own message
about "invalid parameter value".
2011-04-07 00:12:02 -04:00
Robert Haas f5e524d92b Add casts from int4 and int8 to numeric.
Joey Adams, per gripe from Ramanujam.  Review by myself and Tom Lane.
2011-04-05 09:35:43 -04:00
Robert Haas 50533a6dc5 Support comments on FOREIGN DATA WRAPPER and SERVER objects.
This mostly involves making it work with the objectaddress.c framework,
which does most of the heavy lifting.  In that vein, change
GetForeignDataWrapperOidByName to get_foreign_data_wrapper_oid and
GetForeignServerOidByName to get_foreign_server_oid, to match the
pattern we use for other object types.

Robert Haas and Shigeru Hanada
2011-04-01 11:28:28 -04:00
Tom Lane b23c9fa929 Clean up a few failures to set collation fields in expression nodes.
I'm not sure these have any non-cosmetic implications, but I'm not sure
they don't, either.  In particular, ensure the CaseTestExpr generated
by transformAssignmentIndirection to represent the base target column
carries the correct collation, because parse_collate.c won't fix that.
Tweak lsyscache.c API so that we can get the appropriate collation
without an extra syscache lookup.
2011-03-26 14:25:48 -04:00
Bruce Momjian b051a34fd8 Remove duplicate time-based macros recently added. 2011-03-14 10:40:14 -04:00
Tom Lane 696d1f7f06 Make all comparisons done for/with statistics use the default collation.
While this will give wrong answers when estimating selectivity for a
comparison operator that's using a non-default collation, the estimation
error probably won't be large; and anyway the former approach created
estimation errors of its own by trying to use a histogram that might have
been computed with some other collation.  So we'll adopt this simplified
approach for now and perhaps improve it sometime in the future.

This patch incorporates changes from Andres Freund to make sure that
selfuncs.c passes a valid collation OID to any datatype-specific function
it calls, in case that function wants collation information.  Said OID will
now always be DEFAULT_COLLATION_OID, but at least we won't get errors.
2011-03-12 16:30:36 -05:00
Bruce Momjian 3a3f39fdc0 Use macros for time-based constants, rather than constants. 2011-03-12 09:35:56 -05:00
Tom Lane 49a08ca1e9 Adjust the permissions required for COMMENT ON ROLE.
Formerly, any member of a role could change the role's comment, as of
course could superusers; but holders of CREATEROLE privilege could not,
unless they were also members.  This led to the odd situation that a
CREATEROLE holder could create a role but then could not comment on it.
It also seems a bit dubious to let an unprivileged user change his own
comment, let alone those of group roles he belongs to.  So, change the
rule to be "you must be superuser to comment on a superuser role, or
hold CREATEROLE to comment on non-superuser roles".  This is the same
as the privilege check for creating/dropping roles, and thus fits much
better with the rule for other object types, namely that only the owner
of an object can comment on it.

In passing, clean up the documentation for COMMENT a little bit.

Per complaint from Owen Jacobson and subsequent discussion.
2011-03-09 11:28:34 -05:00
Tom Lane 8d3b421f5f Allow non-superusers to create (some) extensions.
Remove the unconditional superuser permissions check in CREATE EXTENSION,
and instead define a "superuser" extension property, which when false
(not the default) skips the superuser permissions check.  In this case
the calling user only needs enough permissions to execute the commands
in the extension's installation script.  The superuser property is also
enforced in the same way for ALTER EXTENSION UPDATE cases.

In other ALTER EXTENSION cases and DROP EXTENSION, test ownership of
the extension rather than superuserness.  ALTER EXTENSION ADD/DROP needs
to insist on ownership of the target object as well; to do that without
duplicating code, refactor comment.c's big switch for permissions checks
into a separate function in objectaddress.c.

I also removed the superuserness checks in pg_available_extensions and
related functions; there's no strong reason why everybody shouldn't
be able to see that info.

Also invent an IF NOT EXISTS variant of CREATE EXTENSION, and use that
in pg_dump, so that dumps won't fail for installed-by-default extensions.
We don't have any of those yet, but we will soon.

This is all per discussion of wrapping the standard procedural languages
into extensions.  I'll make those changes in a separate commit; this is
just putting the core infrastructure in place.
2011-03-04 16:08:53 -05:00
Tom Lane 6252c4f9e2 Run a portal's cleanup hook immediately when pushing it to DONE state.
This works around the problem noted by Yamamoto Takashi in bug #5906,
that there were code paths whereby we could reach AtCleanup_Portals
with a portal's cleanup hook still unexecuted.  The changes I made
a few days ago were intended to prevent that from happening, and
I think that on balance it's still a good thing to avoid, so I don't
want to remove the Assert in AtCleanup_Portals.  Hence do this instead.
2011-03-03 13:04:06 -05:00
Tom Lane c0b0076036 Rearrange snapshot handling to make rule expansion more consistent.
With this patch, portals, SQL functions, and SPI all agree that there
should be only a CommandCounterIncrement between the queries that are
generated from a single SQL command by rule expansion.  Fetching a whole
new snapshot now happens only between original queries.  This is equivalent
to the existing behavior of EXPLAIN ANALYZE, and it was judged to be the
best choice since it eliminates one source of concurrency hazards for
rules.  The patch should also make things marginally faster by reducing the
number of snapshot push/pop operations.

The patch removes pg_parse_and_rewrite(), which is no longer used anywhere.
There was considerable discussion about more aggressive refactoring of the
query-processing functions exported by postgres.c, but for the moment
nothing more has been done there.

I also took the opportunity to refactor snapmgr.c's API slightly: the
former PushUpdatedSnapshot() has been split into two functions.

Marko Tiikkaja, reviewed by Steve Singer and Tom Lane
2011-02-28 23:28:06 -05:00
Tom Lane a874fe7b4c Refactor the executor's API to support data-modifying CTEs better.
The originally committed patch for modifying CTEs didn't interact well
with EXPLAIN, as noted by myself, and also had corner-case problems with
triggers, as noted by Dean Rasheed.  Those problems show it is really not
practical for ExecutorEnd to call any user-defined code; so split the
cleanup duties out into a new function ExecutorFinish, which must be called
between the last ExecutorRun call and ExecutorEnd.  Some Asserts have been
added to these functions to help verify correct usage.

It is no longer necessary for callers of the executor to call
AfterTriggerBeginQuery/AfterTriggerEndQuery for themselves, as this is now
done by ExecutorStart/ExecutorFinish respectively.  If you really need to
suppress that and do it for yourself, pass EXEC_FLAG_SKIP_TRIGGERS to
ExecutorStart.

Also, refactor portal commit processing to allow for the possibility that
PortalDrop will invoke user-defined code.  I think this is not actually
necessary just yet, since the portal-execution-strategy logic forces any
non-pure-SELECT query to be run to completion before we will consider
committing.  But it seems like good future-proofing.
2011-02-27 13:44:12 -05:00
Tom Lane 389af95155 Support data-modifying commands (INSERT/UPDATE/DELETE) in WITH.
This patch implements data-modifying WITH queries according to the
semantics that the updates all happen with the same command counter value,
and in an unspecified order.  Therefore one WITH clause can't see the
effects of another, nor can the outer query see the effects other than
through the RETURNING values.  And attempts to do conflicting updates will
have unpredictable results.  We'll need to document all that.

This commit just fixes the code; documentation updates are waiting on
author.

Marko Tiikkaja and Hitoshi Harada
2011-02-25 18:58:02 -05:00
Peter Eisentraut 1c51c7d5ff Add PL/Python functions for quoting strings
Add functions plpy.quote_ident, plpy.quote_literal,
plpy.quote_nullable, which wrap the equivalent SQL functions.

To be able to propagate char * constness properly, make the argument
of quote_literal_cstr() const char *.  This also makes it more
consistent with quote_identifier().

Jan Urbański, reviewed by Hitoshi Harada, some refinements by Peter
Eisentraut
2011-02-22 23:41:23 +02:00
Tom Lane 1ab9b012bd Allow binary I/O of type "void".
void_send is useful for the same reason that void_out doesn't throw error,
namely that someone might do "select void_returning_func(...)"  from a
client that prefers to operate in binary mode.  The void_recv function may
or may not have any practical use, but we provide it for symmetry.

Radosław Smogura
2011-02-22 13:08:22 -05:00
Tom Lane 327e025071 Create the catalog infrastructure for foreign-data-wrapper handlers.
Add a fdwhandler column to pg_foreign_data_wrapper, plus HANDLER options
in the CREATE FOREIGN DATA WRAPPER and ALTER FOREIGN DATA WRAPPER commands,
plus pg_dump support for same.  Also invent a new pseudotype fdw_handler
with properties similar to language_handler.

This is split out of the "FDW API" patch for ease of review; it's all stuff
we will certainly need, regardless of any other details of the FDW API.
FDW handler functions will not actually get called yet.

In passing, fix some omissions and infelicities in foreigncmds.c.

Shigeru Hanada, Jan Urbanski, Heikki Linnakangas
2011-02-19 00:07:15 -05:00
Itagaki Takahiro 62c7bd31c8 Add transaction-level advisory locks.
They share the same locking namespace with the existing session-level
advisory locks, but they are automatically released at the end of the
current transaction and cannot be released explicitly via unlock
functions.

Marko Tiikkaja, reviewed by me.
2011-02-18 14:05:12 +09:00
Tom Lane 6595dd04d1 Add backwards-compatible declarations of some core GIN support functions.
These are needed to support reloading dumps of 9.0 installations containing
contrib/intarray or contrib/tsearch2.  Since not only regular dump/reload
but binary upgrade would fail, it seems worth the trouble to carry these
stubs for awhile.  Note that the contrib opclasses referencing these
functions will still work fine, since GIN doesn't actually pay any
attention to the declared signature of a support function.
2011-02-16 17:24:46 -05:00
Tom Lane 6e02755b22 Add FOREACH IN ARRAY looping to plpgsql.
(I'm not entirely sure that we've finished bikeshedding the syntax details,
but the functionality seems OK.)

Pavel Stehule, reviewed by Stephen Frost and Tom Lane
2011-02-16 01:53:03 -05:00
Tom Lane 887dd041a6 Fix obsolete comment.
Comment about MaxAllocSize was not updated when the TOAST-header macros
were replaced in 8.3 "varvarlena" changes.  Per report from Frederik Ramm.
2011-02-15 13:27:54 -05:00
Tom Lane 555353c0c5 Rearrange extension-related views as per recent discussion.
The original design of pg_available_extensions did not consider the
possibility of version-specific control files.  Split it into two views:
pg_available_extensions shows information that is generic about an
extension, while pg_available_extension_versions shows all available
versions together with information that could be version-dependent.
Also, add an SRF pg_extension_update_paths() to assist in checking that
a collection of update scripts provide sane update path sequences.
2011-02-14 19:22:36 -05:00
Peter Eisentraut b313bca0af DDL support for collations
- collowner field
- CREATE COLLATION
- ALTER COLLATION
- DROP COLLATION
- COMMENT ON COLLATION
- integration with extensions
- pg_dump support for the above
- dependency management
- psql tab completion
- psql \dO command
2011-02-12 15:55:18 +02:00
Tom Lane d9572c4e3b Core support for "extensions", which are packages of SQL objects.
This patch adds the server infrastructure to support extensions.
There is still one significant loose end, namely how to make it play nice
with pg_upgrade, so I am not yet committing the changes that would make
all the contrib modules depend on this feature.

In passing, fix a disturbingly large amount of breakage in
AlterObjectNamespace() and callers.

Dimitri Fontaine, reviewed by Anssi Kääriäinen,
Itagaki Takahiro, Tom Lane, and numerous others
2011-02-08 16:13:22 -05:00
Peter Eisentraut 414c5a2ea6 Per-column collation support
This adds collation support for columns and domains, a COLLATE clause
to override it per expression, and B-tree index support.

Peter Eisentraut
reviewed by Pavel Stehule, Itagaki Takahiro, Robert Haas, Noah Misch
2011-02-08 23:04:18 +02:00
Robert Haas ddfe26f644 Avoid maintaining three separate copies of the error codes list.
src/pl/plpgsql/src/plerrcodes.h, src/include/utils/errcodes.h, and a
big chunk of errcodes.sgml are now automatically generated from a single
file, src/backend/utils/errcodes.txt.

Jan Urbański, reviewed by Tom Lane.
2011-02-03 22:32:49 -05:00
Simon Riggs 56b21b7ae3 Re-classify ERRCODE_DATABASE_DROPPED to 57P04 2011-02-01 08:44:01 +00:00
Simon Riggs 9e95c9ad55 Create new errcode for recovery conflict caused by db drop on master.
Previously reported as ERRCODE_ADMIN_SHUTDOWN, this case is now
reported as ERRCODE_T_R_DATABASE_DROPPED. No message text change.
Unlikely to happen on most servers, so low impact change to allow
session poolers to correctly handle this situation.

Tatsuo Ishii, edits by me, review by Robert Haas
2011-02-01 00:20:53 +00:00
Robert Haas fb4c5d2798 Code cleanup for assign_XactIsoLevel.
The new coding avoids a spurious debug message when a transaction
that has changed the isolation level has been rolled back.  It also
allows the property to be freely changed to the current value within
a subtransaction.

Kevin Grittner, with one small change by me.
2011-01-21 21:49:19 -05:00
Tom Lane adf328c0e1 Add array_contains_nulls() function in arrayfuncs.c.
This will support fixing contrib/intarray (and probably other places)
so that they don't have to fail on arrays that contain a null bitmap
but no live null entries.
2011-01-08 20:26:14 -05:00
Tom Lane 73912e7fbd Fix GIN to support null keys, empty and null items, and full index scans.
Per my recent proposal(s).  Null key datums can now be returned by
extractValue and extractQuery functions, and will be stored in the index.
Also, placeholder entries are made for indexable items that are NULL or
contain no keys according to extractValue.  This means that the index is
now always complete, having at least one entry for every indexed heap TID,
and so we can get rid of the prohibition on full-index scans.  A full-index
scan is implemented much the same way as partial-match scans were already:
we build a bitmap representing all the TIDs found in the index, and then
drive the results off that.

Also, introduce a concept of a "search mode" that can be requested by
extractQuery when the operator requires matching to empty items (this is
just as cheap as matching to a single key) or requires a full index scan
(which is not so cheap, but it sure beats failing or giving wrong answers).
The behavior remains backward compatible for opclasses that don't return
any null keys or request a non-default search mode.

Using these features, we can now make the GIN index opclass for anyarray
behave in a way that matches the actual anyarray operators for &&, <@, @>,
and = ... which it failed to do before in assorted corner cases.

This commit fixes the core GIN code and ginarrayprocs.c, updates the
documentation, and adds some simple regression test cases for the new
behaviors using the array operators.  The tsearch and contrib GIN opclass
support functions still need to be looked over and probably fixed.

Another thing I intend to fix separately is that this is pretty inefficient
for cases where more than one scan condition needs a full-index search:
we'll run duplicate GinScanEntrys, each one of which builds a large bitmap.
There is some existing logic to merge duplicate GinScanEntrys but it needs
refactoring to make it work for entries belonging to different scan keys.

Note that most of gin.h has been split out into a new file gin_private.h,
so that gin.h doesn't export anything that's not supposed to be used by GIN
opclasses or the rest of the backend.  I did quite a bit of other code
beautification work as well, mostly fixing comments and choosing more
appropriate names for things.
2011-01-07 19:16:24 -05:00
Robert Haas 0d692a0dc9 Basic foreign table support.
Foreign tables are a core component of SQL/MED.  This commit does
not provide a working SQL/MED infrastructure, because foreign tables
cannot yet be queried.  Support for foreign table scans will need to
be added in a future patch.  However, this patch creates the necessary
system catalog structure, syntax support, and support for ancillary
operations such as COMMENT and SECURITY LABEL.

Shigeru Hanada, heavily revised by Robert Haas
2011-01-01 23:48:11 -05:00
Bruce Momjian 5d950e3b0c Stamp copyrights for year 2011. 2011-01-01 13:18:15 -05:00
Robert Haas 53dbc27c62 Support unlogged tables.
The contents of an unlogged table are WAL-logged; thus, they are not
available on standby servers and are truncated whenever the database
system enters recovery.  Indexes on unlogged tables are also unlogged.
Unlogged GiST indexes are not currently supported.
2010-12-29 06:48:53 -05:00
Tom Lane f2ba1e994c Avoid unexpected conversion overflow in planner for distant date values.
The "date" type supports a wider range of dates than int64 timestamps do.
However, there is pre-int64-timestamp code in the planner that assumes that
all date values can be converted to timestamp with impunity.  Fortunately,
what we really need out of the conversion is always a double (float8)
value; so even when the date is out of timestamp's range it's possible to
produce a sane answer.  All we need is a code path that doesn't try to
force the result into int64.  Per trouble report from David Rericha.

Back-patch to all supported versions.  Although this is surely a corner
case, there's not much point in advertising a date range wider than
timestamp's if we will choke on such values in unexpected places.
2010-12-28 22:49:57 -05:00
Tom Lane 84fc571395 Rename the C functions bitand(), bitor() to bit_and(), bit_or().
This is to avoid use of the C++ keywords "bitand" and "bitor" in
the header file utils/varbit.h.  Note the functions' SQL-level
names are not changed, only their C-level names.

In passing, make some comments in varbit.c conform to project-standard
layout.
2010-12-27 14:57:41 -05:00
Robert Haas 63676ebff4 Corrections to patch adding SQL/MED error codes.
My previous commit, 85cff3ce7f on
2010-12-25, failed to update errcodes.sgml or plerrcodes.h.  This patch
corrects that oversight, per a gripe from Tom Lane, and also corrects
a typographical error.
2010-12-26 21:35:25 -05:00
Robert Haas 85cff3ce7f Add foreign data wrapper error code values for SQL/MED.
Extracted from a much larger patch by Shigeru Hanada.
2010-12-25 13:57:39 -05:00
Itagaki Takahiro 03db44eae3 Add pg_read_binary_file() and whole-file-at-once versions of pg_read_file().
One of the usages of the binary version is to read files in a different
encoding from the server encoding.

Dimitri Fontaine and Itagaki Takahiro.
2010-12-16 06:56:28 +09:00
Robert Haas 5f7b58fad8 Generalize concept of temporary relations to "relation persistence".
This commit replaces pg_class.relistemp with pg_class.relpersistence;
and also modifies the RangeVar node type to carry relpersistence rather
than istemp.  It also removes removes rd_istemp from RelationData and
instead performs the correct computation based on relpersistence.

For clarity, we add three new macros: RelationNeedsWAL(),
RelationUsesLocalBuffers(), and RelationUsesTempNamespace(), so that we
can clarify the purpose of each check that previous depended on
rd_istemp.

This is intended as infrastructure for the upcoming unlogged tables
patch, as well as for future possible work on global temporary tables.
2010-12-13 12:34:26 -05:00
Tom Lane 554506871b KNNGIST, otherwise known as order-by-operator support for GIST.
This commit represents a rather heavily editorialized version of
Teodor's builtin_knngist_itself-0.8.2 and builtin_knngist_proc-0.8.1
patches.  I redid the opclass API to add a separate Distance method
instead of turning the Consistent method into an illogical mess,
fixed some bit-rot in the rbtree interfaces, and generally worked over
the code style and comments.

There's still no non-code documentation to speak of, but I'll work on
that separately.  Some contrib-module changes are also yet to come
(right now, point <-> point is the only KNN-ified operator).

Teodor Sigaev and Tom Lane
2010-12-03 20:53:29 -05:00
Robert Haas 970a18687f Use GUC lexer for recovery.conf parsing.
This eliminates some crufty, special-purpose code and, as a non-trivial
side benefit, allows recovery.conf parameters to be unquoted.

Dimitri Fontaine, with review and cleanup by Alvaro Herrera, Itagaki
Takahiro, and me.
2010-12-03 08:56:44 -05:00
Tom Lane d583f10b7e Create core infrastructure for KNNGIST.
This is a heavily revised version of builtin_knngist_core-0.9.  The
ordering operators are no longer mixed in with actual quals, which would
have confused not only humans but significant parts of the planner.
Instead, ordering operators are carried separately throughout planning and
execution.

Since the API for ambeginscan and amrescan functions had to be changed
anyway, this commit takes the opportunity to rationalize that a bit.
RelationGetIndexScan no longer forces a premature index_rescan call;
instead, callers of index_beginscan must call index_rescan too.  Aside from
making the AM-side initialization logic a bit less peculiar, this has the
advantage that we do not make a useless extra am_rescan call when there are
runtime key values.  AMs formerly could not assume that the key values
passed to amrescan were actually valid; now they can.

Teodor Sigaev and Tom Lane
2010-12-02 20:51:37 -05:00
Tom Lane c0b5fac701 Simplify and speed up mapping of index opfamilies to pathkeys.
Formerly we looked up the operators associated with each index (caching
them in relcache) and then the planner looked up the btree opfamily
containing such operators in order to build the btree-centric pathkey
representation that describes the index's sort order.  This is quite
pointless for btree indexes: we might as well just use the index's opfamily
information directly.  That saves syscache lookup cycles during planning,
and furthermore allows us to eliminate the relcache's caching of operators
altogether, which may help in reducing backend startup time.

I added code to plancat.c to perform the same type of double lookup
on-the-fly if it's ever faced with a non-btree amcanorder index AM.
If such a thing actually becomes interesting for production, we should
replace that logic with some more-direct method for identifying the
corresponding btree opfamily; but it's not worth spending effort on now.

There is considerably more to do pursuant to my recent proposal to get rid
of sort-operator-based representations of sort orderings, but this patch
grabs some of the low-hanging fruit.  I'll look at the remainder of that
work after the current commitfest.
2010-11-29 12:30:43 -05:00
Bruce Momjian ba11258ccb When reporting the server as not responding, if the hostname was
supplied, also print the IP address.  This allows IPv4 and IPv6 failures
to be distinguished.  Also useful when a hostname resolves to multiple
IP addresses.

Also, remove use of inet_ntoa() and use our own inet_net_ntop() in all
places, including in libpq, because it is thread-safe.
2010-11-24 17:04:19 -05:00
Robert Haas 7504870778 Add new SQL function, format(text).
Currently, three conversion format specifiers are supported: %s for a
string, %L for an SQL literal, and %I for an SQL identifier.  The latter
two are deliberately designed not to overlap with what sprintf() already
supports, in case we want to add more of sprintf()'s functionality here
later.

Patch by Pavel Stehule, heavily revised by me.  Reviewed by Jeff Janes
and, in earlier versions, by Itagaki Takahiro and Tom Lane.
2010-11-20 22:33:27 -05:00
Robert Haas 4343c0e546 Expose quote_literal_cstr() from core.
This eliminates the need for inefficient implementions of this
functionality in both contrib/dblink and contrib/tablefunc, so remove
them.  The upcoming patch implementing an in-core format() function
will also require this functionality.

In passing, add some regression tests.
2010-11-20 10:04:48 -05:00
Robert Haas 4fc115b2e9 Speed up conversion of signed integers to C strings.
A hand-coded implementation turns out to be much faster than calling
printf().  In passing, add a few more regresion tests.

Andres Freund, with assorted, mostly cosmetic changes.
2010-11-19 22:13:11 -05:00
Alvaro Herrera 6cc2deb86e Add pg_describe_object function
This function is useful to obtain textual descriptions of objects as
stored in pg_depend.
2010-11-18 17:06:19 -03:00
Tom Lane 186cbbda8f Provide hashing support for arrays.
The core of this patch is hash_array() and associated typcache
infrastructure, which works just about exactly like the existing support
for array comparison.

In addition I did some work to ensure that the planner won't think that an
array type is hashable unless its element type is hashable, and similarly
for sorting.  This includes adding a datatype parameter to op_hashjoinable
and op_mergejoinable, and adding an explicit "hashable" flag to
SortGroupClause.  The lack of a cross-check on the element type was a
pre-existing bug in mergejoin support --- but it didn't matter so much
before, because if you couldn't sort the element type there wasn't any good
alternative to failing anyhow.  Now that we have the alternative of hashing
the array type, there are cases where we can avoid a failure by being picky
at the planner stage, so it's time to be picky.

The issue of exactly how to combine the per-element hash values to produce
an array hash is still open for discussion, but the rest of this is pretty
solid, so I'll commit it as-is.
2010-10-30 21:56:11 -04:00
Tom Lane 84c123be1d Allow new values to be added to an existing enum type.
After much expenditure of effort, we've got this to the point where the
performance penalty is pretty minimal in typical cases.

Andrew Dunstan, reviewed by Brendan Jurd, Dean Rasheed, and Tom Lane
2010-10-24 23:05:41 -04:00
Tom Lane 529cb267a6 Improve handling of domains over arrays.
This patch eliminates various bizarre behaviors caused by sloppy thinking
about the difference between a domain type and its underlying array type.
In particular, the operation of updating one element of such an array
has to be considered as yielding a value of the underlying array type,
*not* a value of the domain, because there's no assurance that the
domain's CHECK constraints are still satisfied.  If we're intending to
store the result back into a domain column, we have to re-cast to the
domain type so that constraints are re-checked.

For similar reasons, such a domain can't be blindly matched to an ANYARRAY
polymorphic parameter, because the polymorphic function is likely to apply
array-ish operations that could invalidate the domain constraints.  For the
moment, we just forbid such matching.  We might later wish to insert an
automatic downcast to the underlying array type, but such a change should
also change matching of domains to ANYELEMENT for consistency.

To ensure that all such logic is rechecked, this patch removes the original
hack of setting a domain's pg_type.typelem field to match its base type;
the typelem will always be zero instead.  In those places where it's really
okay to look through the domain type with no other logic changes, use the
newly added get_base_element_type function in place of get_element_type.
catversion bumped due to change in pg_type contents.

Per bug #5717 from Richard Huxton and subsequent discussion.
2010-10-21 16:07:17 -04:00
Tom Lane 2ec993a7cb Support triggers on views.
This patch adds the SQL-standard concept of an INSTEAD OF trigger, which
is fired instead of performing a physical insert/update/delete.  The
trigger function is passed the entire old and/or new rows of the view,
and must figure out what to do to the underlying tables to implement
the update.  So this feature can be used to implement updatable views
using trigger programming style rather than rule hacking.

In passing, this patch corrects the names of some columns in the
information_schema.triggers view.  It seems the SQL committee renamed
them somewhere between SQL:99 and SQL:2003.

Dean Rasheed, reviewed by Bernd Helmle; some additional hacking by me.
2010-10-10 13:45:07 -04:00
Tom Lane 3ba11d3df2 Teach CLUSTER to use seqscan-and-sort when it's faster than indexscan.
... or at least, when the planner's cost estimates say it will be faster.

Leonardo Francalanci, reviewed by Itagaki Takahiro and Tom Lane
2010-10-07 20:00:28 -04:00
Magnus Hagander fe9b36fd59 Convert cvsignore to gitignore, and add .gitignore for build targets. 2010-09-22 12:57:04 +02:00
Magnus Hagander 9f2e211386 Remove cvs keywords from all files. 2010-09-20 22:08:53 +02:00
Tom Lane 303696c3b4 Install a data-type-based solution for protecting pg_get_expr().
Since the code underlying pg_get_expr() is not secure against malformed
input, and can't practically be made so, we need to prevent miscreants
from feeding arbitrary data to it.  We can do this securely by declaring
pg_get_expr() to take a new datatype "pg_node_tree" and declaring the
system catalog columns that hold nodeToString output to be of that type.
There is no way at SQL level to create a non-null value of type pg_node_tree.
Since the backend-internal operations that fill those catalog columns
operate below the SQL level, they are oblivious to the datatype relabeling
and don't need any changes.
2010-09-03 01:34:55 +00:00
Tom Lane 9513918c6c Fix up flushing of composite-type typcache entries to be driven directly by
SI invalidation events, rather than indirectly through the relcache.

In the previous coding, we had to flush a composite-type typcache entry
whenever we discarded the corresponding relcache entry.  This caused problems
at least when testing with RELCACHE_FORCE_RELEASE, as shown in recent report
from Jeff Davis, and might result in real-world problems given the kind of
unexpected relcache flush that that test mechanism is intended to model.

The new coding decouples relcache and typcache management, which is a good
thing anyway from a structural perspective.  The cost is that we have to
search the typcache linearly to find entries that need to be flushed.  There
are a couple of ways we could avoid that, but at the moment it's not clear
it's worth any extra trouble, because the typcache contains very few entries
in typical operation.

Back-patch to 8.2, the same as some other recent fixes in this general area.
The patch could be carried back to 8.0 with some additional work, but given
that it's only hypothetical whether we're fixing any problem observable in
the field, it doesn't seem worth the work now.
2010-09-02 03:16:46 +00:00
Itagaki Takahiro 49b27ab551 Add string functions: concat(), concat_ws(), left(), right(), and reverse().
Pavel Stehule, reviewed by me.
2010-08-24 06:30:44 +00:00
Peter Eisentraut 3f11971916 Remove extra newlines at end and beginning of files, add missing newlines
at end of files.
2010-08-19 05:57:36 +00:00
Robert Haas debcec7dc3 Include the backend ID in the relpath of temporary relations.
This allows us to reliably remove all leftover temporary relation
files on cluster startup without reference to system catalogs or WAL;
therefore, we no longer include temporary relations in XLOG_XACT_COMMIT
and XLOG_XACT_ABORT WAL records.

Since these changes require including a backend ID in each
SharedInvalSmgrMsg, the size of the SharedInvalidationMessage.id
field has been reduced from two bytes to one, and the maximum number
of connections has been reduced from INT_MAX / 4 to 2^23-1.  It would
be possible to remove these restrictions by increasing the size of
SharedInvalidationMessage by 4 bytes, but right now that doesn't seem
like a good trade-off.

Review by Jaime Casanova and Tom Lane.
2010-08-13 20:10:54 +00:00
Tom Lane a0b7b717a4 Add xml_is_well_formed, xml_is_well_formed_document, xml_is_well_formed_content
functions to the core XML code.  Per discussion, the former depends on
XMLOPTION while the others do not.  These supersede a version previously
offered by contrib/xml2.

Mike Fowler, reviewed by Pavel Stehule
2010-08-13 18:36:26 +00:00
Tom Lane 33f43725fb Add three-parameter forms of array_to_string and string_to_array, to allow
better handling of NULL elements within the arrays.  The third parameter
is a string that should be used to represent a NULL element, or should
be translated into a NULL element, respectively.  If the third parameter
is NULL it behaves the same as the two-parameter form.

There are two incompatible changes in the behavior of the two-parameter form
of string_to_array.  First, it will return an empty (zero-element) array
rather than NULL when the input string is of zero length.  Second, if the
field separator is NULL, the function splits the string into individual
characters, rather than returning NULL as before.  These two changes make
this form fully compatible with the behavior of the new three-parameter form.

Pavel Stehule, reviewed by Brendan Jurd
2010-08-10 21:51:00 +00:00
Tom Lane 4dfc457854 Add an xpath_exists() function. This is equivalent to XMLEXISTS except that
it offers support for namespace mapping.

Mike Fowler, reviewed by David Fetter
2010-08-08 19:15:27 +00:00
Tom Lane b0c451e145 Remove the single-argument form of string_agg(). It added nothing much in
functionality, while creating an ambiguity in usage with ORDER BY that at
least two people have already gotten seriously confused by.  Also, add an
opr_sanity test to check that we don't in future violate the newly minted
policy of not having built-in aggregates with the same name and different
numbers of parameters.  Per discussion of a complaint from Thom Brown.
2010-08-05 18:21:19 +00:00
Robert Haas 2a6ef3445c Standardize get_whatever_oid functions for object types with
unqualified names.

- Add a missing_ok parameter to get_tablespace_oid.
- Avoid duplicating get_tablespace_od guts in objectNamesToOids.
- Add a missing_ok parameter to get_database_oid.
- Replace get_roleid and get_role_checked with get_role_oid.
- Add get_namespace_oid, get_language_oid, get_am_oid.
- Refactor existing code to use new interfaces.

Thanks to KaiGai Kohei for the review.
2010-08-05 14:45:09 +00:00
Peter Eisentraut 641459f269 Add xmlexists function
by Mike Fowler, reviewed by Peter Eisentraut
2010-08-05 04:21:54 +00:00
Robert Haas 26e47efb66 Fix declared argument name for numeric_maximum_size.
The previous commit changed the function to say 'typmod' rather than
'typemod', but I forgot to update the header file.
2010-08-04 17:35:59 +00:00
Tom Lane db04f2b322 Replace the naive HYPOT() macro with a standards-conformant hypotenuse
function.  This avoids unnecessary overflows and probably gives a more
accurate result as well.

Paul Matthews, reviewed by Andrew Geery
2010-08-03 21:21:03 +00:00
Tom Lane 0454f13161 Rewrite the rbtree routines so that an RBNode is the first field of the
struct representing a tree entry, rather than being a separately allocated
piece of storage.  This API is at least as clean as the old one (if not
more so --- there were some bizarre choices in there) and it permits a
very substantial memory savings, on the order of 2X in ginbulk.c's usage.

Also, fix minor memory leaks in code called by ginEntryInsert, in
particular in ginInsertValue and entryFillRoot, as well as ginEntryInsert
itself.  These leaks resulted in the GIN index build context continuing
to bloat even after we'd filled it to maintenance_work_mem and started
to dump data out to the index.

In combination these fixes restore the GIN index build code to honoring
the maintenance_work_mem limit about as well as it did in 8.4.  Speed
seems on par with 8.4 too, maybe even a bit faster, for a non-pathological
case in which HEAD was formerly slower.

Back-patch to 9.0 so we don't have a performance regression from 8.4.
2010-08-01 02:12:42 +00:00
Robert Haas 8a4dc94ca0 Make details of the Numeric representation private to numeric.c.
Review by Tom Lane.
2010-07-30 04:30:23 +00:00
Robert Haas ce68df468a Add options to force quoting of all identifiers.
I've added a quote_all_identifiers GUC which affects the behavior
of the backend, and a --quote-all-identifiers argument to pg_dump
and pg_dumpall which sets the GUC and also affects the quoting done
internally by those applications.

Design by Tom Lane; review by Alex Hunsaker; in response to bug #5488
filed by Hartmut Goebel.
2010-07-22 01:22:35 +00:00
Robert Haas 5ffaa9005c Add restart_after_crash GUC.
Normally, we automatically restart after a backend crash, but in some
cases when PostgreSQL is invoked by clusterware it may be desirable to
suppress this behavior, so we provide an option which does this.
Since no existing GUC group quite fits, create a new group called
"error handling options" for this and the previously undocumented GUC
exit_on_error, which is now documented.

Review by Fujii Masao.
2010-07-20 00:47:53 +00:00
Tom Lane 7590ddb3eb Add support for dividing money by money (yielding a float8 result) and for
casting between money and numeric.

Andy Balholm, reviewed by Kevin Grittner
2010-07-16 02:15:56 +00:00
Tom Lane 1cc29fe7c6 Teach EXPLAIN to print PARAM_EXEC Params as the referenced expressions,
rather than just $N.  This brings the display of nestloop-inner-indexscan
plans back to where it's been, and incidentally improves the display of
SubPlan parameters as well.  In passing, simplify the EXPLAIN code by
having it deal primarily in the PlanState tree rather than separately
searching Plan and PlanState trees.  This is noticeably cleaner for
subplans, and about a wash elsewhere.

One small difference from previous behavior is that EXPLAIN will no longer
qualify local variable references in inner-indexscan plan nodes, since it
no longer sees such nodes as possibly referencing multiple tables.  Vars
referenced through PARAM_EXEC Params are still forcibly qualified, though,
so I don't think the display is any more confusing than before.  Adjust a
couple of examples in the documentation to match this behavior.
2010-07-13 20:57:19 +00:00
Bruce Momjian 239d769e7e pgindent run for 9.0, second run 2010-07-06 19:19:02 +00:00
Heikki Linnakangas eb81b6509f The previous fix in CVS HEAD and 8.4 for handling the case where a cursor
being used in a PL/pgSQL FOR loop is closed was inadequate, as Tom Lane
pointed out. The bug affects FOR statement variants too, because you can
close an implicitly created cursor too by guessing the "<unnamed portal X>"
name created for it.

To fix that, "pin" the portal to prevent it from being dropped while it's
being used in a PL/pgSQL FOR loop. Backpatch all the way to 7.4 which is
the oldest supported version.
2010-07-05 09:27:18 +00:00
Itagaki Takahiro 41f302b52a Add new GUC categories corresponding to sections in docs, and move
description for vacuum_defer_cleanup_age to the correct category.
Sections in postgresql.conf are also sorted in the same order with docs.

Per gripe by Fujii Masao, suggestion by Heikki Linnakangas, and patch by me.
2010-06-15 07:52:11 +00:00
Robert Haas 26b7abfa32 Fix ALTER LARGE OBJECT and GRANT ... ON LARGE OBJECT for large OIDs.
The previous coding failed for OIDs too large to be represented by
a signed integer.
2010-06-13 17:43:13 +00:00
Robert Haas dd6fcd35e3 Change typedef for rb_appendator to avoid conflict with C++ reserved words.
Fixes a complaint from src/tools/pginclude/cpluspluscheck reported by
Peter Eisentraut.
2010-05-11 18:14:01 +00:00
Simon Riggs 90e04bab39 Patch revoked because of objections. 2010-04-24 16:20:32 +00:00
Simon Riggs 473af39737 Add missing optimizer hooks for function cost and number of rows.
Closely follow design of other optimizer hooks: if hook exists
retrieve value from plugin; if still not set then get from cache.
2010-04-23 22:23:39 +00:00
Alvaro Herrera be8cebc717 Prevent ALTER USER f RESET ALL from removing the settings that were put there
by a superuser -- "ALTER USER f RESET setting" already disallows removing such a
setting.

Apply the same treatment to ALTER DATABASE d RESET ALL when run by a database
owner that's not superuser.
2010-03-25 14:44:34 +00:00
Bruce Momjian a6c1cea2b7 Add libpq warning message if the .pgpass-retrieved password fails.
Add ERRCODE_INVALID_PASSWORD sqlstate error code.
2010-03-13 14:55:57 +00:00
Tom Lane 8bf14182cf Export xml.c's libxml-error-handling support so that contrib/xml2 can use it
too, instead of duplicating the functionality (badly).

I renamed xml_init to pg_xml_init, because the former seemed just a bit too
generic to be safe as a global symbol.  I considered likewise renaming
xml_ereport to pg_xml_ereport, but felt that the reference to ereport probably
made it sufficiently PG-centric already.
2010-03-03 17:29:45 +00:00
Bruce Momjian 65e806cba1 pgindent run for 9.0 2010-02-26 02:01:40 +00:00
Robert Haas e26c539e9f Wrap calls to SearchSysCache and related functions using macros.
The purpose of this change is to eliminate the need for every caller
of SearchSysCache, SearchSysCacheCopy, SearchSysCacheExists,
GetSysCacheOid, and SearchSysCacheList to know the maximum number
of allowable keys for a syscache entry (currently 4).  This will
make it far easier to increase the maximum number of keys in a
future release should we choose to do so, and it makes the code
shorter, too.

Design and review by Tom Lane.
2010-02-14 18:42:19 +00:00
Tom Lane 7507b193bc Don't expose the inline definition of MemoryContextSwitchTo when FRONTEND is
defined.  Its reference to CurrentMemoryContext causes link failures on some
platforms, evidently because the inline function gets compiled despite lack of
use.  Per buildfarm member warthog.
2010-02-13 20:46:52 +00:00
Tom Lane e08ab7c312 Support inlining various small performance-critical functions on non-GCC
compilers, by applying a configure check to see if the compiler will accept
an unreferenced "static inline foo ..." function without warnings.  It is
believed that such warnings are the only reason not to declare inlined
functions in headers, if the compiler understands "inline" at all.

Kurt Harriman
2010-02-13 02:34:16 +00:00
Teodor Sigaev 5209c084a6 Generic implementation of red-black binary tree. It's planned to use in
several places, but for now only GIN uses it during index creation.
Using self-balanced tree greatly speeds up index creation in corner cases
with preordered data.
2010-02-11 14:29:50 +00:00
Tom Lane cbe9d6beb4 Fix up rickety handling of relation-truncation interlocks.
Move rd_targblock, rd_fsm_nblocks, and rd_vm_nblocks from relcache to the smgr
relation entries, so that they will get reset to InvalidBlockNumber whenever
an smgr-level flush happens.  Because we now send smgr invalidation messages
immediately (not at end of transaction) when a relation truncation occurs,
this ensures that other backends will reset their values before they next
access the relation.  We no longer need the unreliable assumption that a
VACUUM that's doing a truncation will hold its AccessExclusive lock until
commit --- in fact, we can intentionally release that lock as soon as we've
completed the truncation.  This patch therefore reverts (most of) Alvaro's
patch of 2009-11-10, as well as my marginal hacking on it yesterday.  We can
also get rid of assorted no-longer-needed relcache flushes, which are far more
expensive than an smgr flush because they kill a lot more state.

In passing this patch fixes smgr_redo's failure to perform visibility-map
truncation, and cleans up some rather dubious assumptions in freespace.c and
visibilitymap.c about when rd_fsm_nblocks and rd_vm_nblocks can be out of
date.
2010-02-09 21:43:30 +00:00
Tom Lane 9a75803b1a Remove CatalogCacheFlushRelation, and the reloidattr infrastructure that was
needed by nothing else.

The restructuring I just finished doing on cache management exposed to me how
silly this routine was.  Its function was to go into the catcache and blow
away all entries related to a given relation when there was a relcache flush
on that relation.  However, there is no point in removing a catcache entry
if the catalog row it represents is still valid --- and if it isn't valid,
there must have been a catcache entry flush on it, because that's triggered
directly by heap_update or heap_delete on the catalog row.  So this routine
accomplished nothing except to blow away valid cache entries that we'd very
likely be wanting in the near future to help reconstruct the relcache entry.
Dumb.

On top of which, it required a subtle and easy-to-get-wrong attribute in
syscache definitions, ie, the column containing the OID of the related
relation if any.  Removing that is a very useful maintenance simplification.
2010-02-08 05:53:55 +00:00
Tom Lane 0a469c8769 Remove old-style VACUUM FULL (which was known for a little while as
VACUUM FULL INPLACE), along with a boatload of subsidiary code and complexity.
Per discussion, the use case for this method of vacuuming is no longer large
enough to justify maintaining it; not to mention that we don't wish to invest
the work that would be needed to make it play nicely with Hot Standby.

Aside from the code directly related to old-style VACUUM FULL, this commit
removes support for certain WAL record types that could only be generated
within VACUUM FULL, redirect-pointer removal in heap_page_prune, and
nontransactional generation of cache invalidation sinval messages (the last
being the sticking point for Hot Standby).

We still have to retain all code that copes with finding HEAP_MOVED_OFF and
HEAP_MOVED_IN flag bits on existing tuples.  This can't be removed as long
as we want to support in-place update from pre-9.0 databases.
2010-02-08 04:33:55 +00:00
Tom Lane b9b8831ad6 Create a "relation mapping" infrastructure to support changing the relfilenodes
of shared or nailed system catalogs.  This has two key benefits:

* The new CLUSTER-based VACUUM FULL can be applied safely to all catalogs.

* We no longer have to use an unsafe reindex-in-place approach for reindexing
  shared catalogs.

CLUSTER on nailed catalogs now works too, although I left it disabled on
shared catalogs because the resulting pg_index.indisclustered update would
only be visible in one database.

Since reindexing shared system catalogs is now fully transactional and
crash-safe, the former special cases in REINDEX behavior have been removed;
shared catalogs are treated the same as non-shared.

This commit does not do anything about the recently-discussed problem of
deadlocks between VACUUM FULL/CLUSTER on a system catalog and other
concurrent queries; will address that in a separate patch.  As a stopgap,
parallel_schedule has been tweaked to run vacuum.sql by itself, to avoid
such failures during the regression tests.
2010-02-07 20:48:13 +00:00
Tom Lane 9727c583fe Restructure CLUSTER/newstyle VACUUM FULL/ALTER TABLE support so that swapping
of old and new toast tables can be done either at the logical level (by
swapping the heaps' reltoastrelid links) or at the physical level (by swapping
the relfilenodes of the toast tables and their indexes).  This is necessary
infrastructure for upcoming changes to support CLUSTER/VAC FULL on shared
system catalogs, where we cannot change reltoastrelid.  The physical swap
saves a few catalog updates too.

We unfortunately have to keep the logical-level swap logic because in some
cases we will be adding or deleting a toast table, so there's no possibility
of a physical swap.  However, that only happens as a consequence of schema
changes in the table, which we do not need to support for system catalogs,
so such cases aren't an obstacle for that.

In passing, refactor the cluster support functions a little bit to eliminate
unnecessarily-duplicated code; and fix the problem that while CLUSTER had
been taught to rename the final toast table at need, ALTER TABLE had not.
2010-02-04 00:09:14 +00:00
Tom Lane 70a2b05a59 Assorted cleanups in preparation for using a map file to support altering
the relfilenode of currently-not-relocatable system catalogs.

1. Get rid of inval.c's dependency on relfilenode, by not having it emit
smgr invalidations as a result of relcache flushes.  Instead, smgr sinval
messages are sent directly from smgr.c when an actual relation delete or
truncate is done.  This makes considerably more structural sense and allows
elimination of a large number of useless smgr inval messages that were
formerly sent even in cases where nothing was changing at the
physical-relation level.  Note that this reintroduces the concept of
nontransactional inval messages, but that's okay --- because the messages
are sent by smgr.c, they will be sent in Hot Standby slaves, just from a
lower logical level than before.

2. Move setNewRelfilenode out of catalog/index.c, where it never logically
belonged, into relcache.c; which is a somewhat debatable choice as well but
better than before.  (I considered catalog/storage.c, but that seemed too
low level.)  Rename to RelationSetNewRelfilenode.

3. Cosmetic cleanups of some other relfilenode manipulations.
2010-02-03 01:14:17 +00:00
Itagaki Takahiro 9ea9918e37 Add string_agg aggregate functions. The one argument version concatenates
the input values into a string. The two argument version also does the same
thing, but inserts delimiters between elements.

Original patch by Pavel Stehule, reviewed by David E. Wheeler and me.
2010-02-01 03:14:45 +00:00
Tom Lane d879697cd2 Remove the default_do_language parameter, instead making DO use a hardwired
default of "plpgsql".  This is more reasonable than it was when the DO patch
was written, because we have since decided that plpgsql should be installed
by default.  Per discussion, having a parameter for this doesn't seem useful
enough to justify the risk of application breakage if the value is changed
unexpectedly.
2010-01-26 16:33:40 +00:00
Tom Lane 9507c8a1db Add get_bit/set_bit functions for bit strings, paralleling those for bytea,
and implement OVERLAY() for bit strings and bytea.

In passing also convert text OVERLAY() to a true built-in, instead of
relying on a SQL function.

Leonardo F, reviewed by Kevin Grittner
2010-01-25 20:55:32 +00:00
Robert Haas d779199175 Fix several oversights in previous commit - attribute options patch.
I failed to 'cvs add' the new files and also neglected to bump catversion.
2010-01-22 16:42:31 +00:00
Tom Lane 4f15699d70 Add pg_table_size() and pg_indexes_size() to provide more user-friendly
wrappers around the pg_relation_size() function.

Bernd Helmle, reviewed by Greg Smith
2010-01-19 05:50:18 +00:00
Tom Lane 9a915e596f Improve the handling of SET CONSTRAINTS commands by having them search
pg_constraint before searching pg_trigger.  This allows saner handling of
corner cases; in particular we now say "constraint is not deferrable"
rather than "constraint does not exist" when the command is applied to
a constraint that's inherently non-deferrable.  Per a gripe several months
ago from hubert depesz lubaczewski.

To make this work without breaking user-defined constraint triggers,
we have to add entries for them to pg_constraint.  However, in return
we can remove the pgconstrname column from pg_constraint, which represents
a fairly sizable space savings.  I also replaced the tgisconstraint column
with tgisinternal; the old meaning of tgisconstraint can now be had by
testing for nonzero tgconstraint, while there is no other way to get
the old meaning of nonzero tgconstraint, namely that the trigger was
internally generated rather than being user-created.

In passing, fix an old misstatement in the docs and comments, namely that
pg_trigger.tgdeferrable is exactly redundant with pg_constraint.condeferrable.
Actually, we mark RI action triggers as nondeferrable even when they belong to
a nominally deferrable FK constraint.  The SET CONSTRAINTS code now relies on
that instead of hard-coding a list of exception OIDs.
2010-01-17 22:56:23 +00:00
Heikki Linnakangas 40f908bdcd Introduce Streaming Replication.
This includes two new kinds of postmaster processes, walsenders and
walreceiver. Walreceiver is responsible for connecting to the primary server
and streaming WAL to disk, while walsender runs in the primary server and
streams WAL from disk to the client.

Documentation still needs work, but the basics are there. We will probably
pull the replication section to a new chapter later on, as well as the
sections describing file-based replication. But let's do that as a separate
patch, so that it's easier to see what has been added/changed. This patch
also adds a new section to the chapter about FE/BE protocol, documenting the
protocol used by walsender/walreceivxer.

Bump catalog version because of two new functions,
pg_last_xlog_receive_location() and pg_last_xlog_replay_location(), for
monitoring the progress of replication.

Fujii Masao, with additional hacking by me
2010-01-15 09:19:10 +00:00
Teodor Sigaev 4cbe473938 Add point_ops opclass for GiST. 2010-01-14 16:31:09 +00:00
Tom Lane 6164b4bc19 Some trivial adjustments in comments for struct RelationData. 2010-01-10 22:19:17 +00:00
Tom Lane 50626efe0a Fix 3-parameter form of bit substring() to throw error for negative length,
as required by SQL standard.
2010-01-07 20:17:44 +00:00
Tom Lane 901be0fad4 Remove all the special-case code for INT64_IS_BUSTED, per decision that
we're not going to support that anymore.

I did keep the 64-bit-CRC-with-32-bit-arithmetic code, since it has a
performance excuse to live.  It's a bit moot since that's all ifdef'd
out, of course.
2010-01-07 04:53:35 +00:00
Robert Haas d86d51a958 Support ALTER TABLESPACE name SET/RESET ( tablespace_options ).
This patch only supports seq_page_cost and random_page_cost as parameters,
but it provides the infrastructure to scalably support many more.
In particular, we may want to add support for effective_io_concurrency,
but I'm leaving that as future work for now.

Thanks to Tom Lane for design help and Alvaro Herrera for the review.
2010-01-05 21:54:00 +00:00
Tom Lane 40608e7f94 When estimating the selectivity of an inequality "column > constant" or
"column < constant", and the comparison value is in the first or last
histogram bin or outside the histogram entirely, try to fetch the actual
column min or max value using an index scan (if there is an index on the
column).  If successful, replace the lower or upper histogram bound with
that value before carrying on with the estimate.  This limits the
estimation error caused by moving min/max values when the comparison
value is close to the min or max.  Per a complaint from Josh Berkus.

It is tempting to consider using this mechanism for mergejoinscansel as well,
but that would inject index fetches into main-line join estimation not just
endpoint cases.  I'm refraining from that until we can get a better handle
on the costs of doing this type of lookup.
2010-01-04 02:44:40 +00:00
Bruce Momjian 0239800893 Update copyright for the year 2010. 2010-01-02 16:58:17 +00:00
Tom Lane 649b5ec7c8 Add the ability to store inheritance-tree statistics in pg_statistic,
and teach ANALYZE to compute such stats for tables that have subclasses.
Per my proposal of yesterday.

autovacuum still needs to be taught about running ANALYZE on parent tables
when their subclasses change, but the feature is useful even without that.
2009-12-29 20:11:45 +00:00
Simon Riggs efc16ea520 Allow read only connections during recovery, known as Hot Standby.
Enabled by recovery_connections = on (default) and forcing archive recovery using a recovery.conf. Recovery processing now emulates the original transactions as they are replayed, providing full locking and MVCC behaviour for read only queries. Recovery must enter consistent state before connections are allowed, so there is a delay, typically short, before connections succeed. Replay of recovering transactions can conflict and in some cases deadlock with queries during recovery; these result in query cancellation after max_standby_delay seconds have expired. Infrastructure changes have minor effects on normal running, though introduce four new types of WAL record.

New test mode "make standbycheck" allows regression tests of static command behaviour on a standby server while in recovery. Typical and extreme dynamic behaviours have been checked via code inspection and manual testing. Few port specific behaviours have been utilised, though primary testing has been on Linux only so far.

This commit is the basic patch. Additional changes will follow in this release to enhance some aspects of behaviour, notably improved handling of conflicts, deadlock detection and query cancellation. Changes to VACUUM FULL are also required.

Simon Riggs, with significant and lengthy review by Heikki Linnakangas, including streamlined redesign of snapshot creation and two-phase commit.

Important contributions from Florian Pflug, Mark Kirkwood, Merlin Moncure, Greg Stark, Gianni Ciolli, Gabriele Bartolini, Hannu Krosing, Robert Haas, Tatsuo Ishii, Hiroyuki Yamada plus support and feedback from many other community members.
2009-12-19 01:32:45 +00:00
Itagaki Takahiro f1325ce213 Add large object access control.
A new system catalog pg_largeobject_metadata manages
ownership and access privileges of large objects.

KaiGai Kohei, reviewed by Jaime Casanova.
2009-12-11 03:34:57 +00:00
Tom Lane 62aba76568 Prevent indirect security attacks via changing session-local state within
an allegedly immutable index function.  It was previously recognized that
we had to prevent such a function from executing SET/RESET ROLE/SESSION
AUTHORIZATION, or it could trivially obtain the privileges of the session
user.  However, since there is in general no privilege checking for changes
of session-local state, it is also possible for such a function to change
settings in a way that might subvert later operations in the same session.
Examples include changing search_path to cause an unexpected function to
be called, or replacing an existing prepared statement with another one
that will execute a function of the attacker's choosing.

The present patch secures VACUUM, ANALYZE, and CREATE INDEX/REINDEX against
these threats, which are the same places previously deemed to need protection
against the SET ROLE issue.  GUC changes are still allowed, since there are
many useful cases for that, but we prevent security problems by forcing a
rollback of any GUC change after completing the operation.  Other cases are
handled by throwing an error if any change is attempted; these include temp
table creation, closing a cursor, and creating or deleting a prepared
statement.  (In 7.4, the infrastructure to roll back GUC changes doesn't
exist, so we settle for rejecting changes of "search_path" in these contexts.)

Original report and patch by Gurjeet Singh, additional analysis by
Tom Lane.

Security: CVE-2009-4136
2009-12-09 21:57:51 +00:00
Tom Lane 0cb65564e5 Add exclusion constraints, which generalize the concept of uniqueness to
support any indexable commutative operator, not just equality.  Two rows
violate the exclusion constraint if "row1.col OP row2.col" is TRUE for
each of the columns in the constraint.

Jeff Davis, reviewed by Robert Haas
2009-12-07 05:22:23 +00:00
Peter Eisentraut 36f887c41c Speed up information schema privilege views
Instead of expensive cross joins to resolve the ACL, add table-returning
function aclexplode() that expands the ACL into a useful form, and join
against that.

Also, implement the role_*_grants views as a thin layer over the respective
*_privileges views instead of essentially repeating the same code twice.

fixes bug #4596

by Joachim Wieland, with cleanup by me
2009-12-05 21:43:36 +00:00
Heikki Linnakangas ab3148b712 Fix bug in temporary file management with subtransactions. A cursor opened
in a subtransaction stays open even if the subtransaction is aborted, so
any temporary files related to it must stay alive as well. With the patch,
we use ResourceOwners to track open temporary files and don't automatically
close them at subtransaction end (though in the normal case temporary files
are registered with the subtransaction resource owner and will therefore be
closed).

At end of top transaction, we still check that there's no temporary files
marked as close-at-end-of-transaction open, but that's now just a debugging
cross-check as the resource owner cleanup should've closed them already.
2009-12-03 11:03:29 +00:00
Tom Lane 8217cfbd99 Add support for an application_name parameter, which is displayed in
pg_stat_activity and recorded in log entries.

Dave Page, reviewed by Andres Freund
2009-11-28 23:38:08 +00:00
Tom Lane 7fc0f06221 Add a WHEN clause to CREATE TRIGGER, allowing a boolean expression to be
checked to determine whether the trigger should be fired.

For BEFORE triggers this is mostly a matter of spec compliance; but for AFTER
triggers it can provide a noticeable performance improvement, since queuing of
a deferred trigger event and re-fetching of the row(s) at end of statement can
be short-circuited if the trigger does not need to be fired.

Takahiro Itagaki, reviewed by KaiGai Kohei.
2009-11-20 20:38:12 +00:00
Tom Lane 9bedd128d6 Add support for invoking parser callback hooks via SPI and in cached plans.
As proof of concept, modify plpgsql to use the hooks.  plpgsql is still
inserting $n symbols textually, but the "back end" of the parsing process now
goes through the ParamRef hook instead of using a fixed parameter-type array,
and then execution only fetches actually-referenced parameters, using a hook
added to ParamListInfo.

Although there's a lot left to be done in plpgsql, this already cures the
"if (TG_OP = 'INSERT' and NEW.foo ...)"  problem, as illustrated by the
changed regression test.
2009-11-04 22:26:08 +00:00
Heikki Linnakangas 2078e384a3 Fix range check in date_recv that tried to limit accepted values to only
those accepted by date_in(). I confused julian day numbers and number of
days since the postgres epoch 2000-01-01 in the original patch.

I just noticed that it's still easy to get such out-of-range values into
the database using to_date or +- operators, but this patch doesn't do
anything about those functions.

Per report from James Pye.
2009-10-26 16:13:11 +00:00
Tom Lane ab61df9e52 Remove regex_flavor GUC, so that regular expressions are always "advanced"
style by default.  Per discussion, there seems to be hardly anything that
really relies on being able to change the regex flavor, so the ability to
select it via embedded options ought to be enough for any stragglers.
Also, if we didn't remove the GUC, we'd really be morally obligated to
mark the regex functions non-immutable, which'd possibly create performance
issues.
2009-10-21 20:38:58 +00:00
Alvaro Herrera 201e5b282b Add new PGC_S_DATABASE_USER enum value to several places missed by my patch
last week.

Per note and patch from Jeff Davis.
2009-10-13 14:18:40 +00:00
Peter Eisentraut b865d27582 Use pg_get_triggerdef in pg_dump
Add a variant of pg_get_triggerdef with a second argument "pretty" that
causes the output to be formatted in the way pg_dump used to do.  Use this
variant in pg_dump with server versions >= 8.5.

This insulates pg_dump from most future trigger feature additions, such as
the upcoming column triggers patch.

Author: Itagaki Takahiro <itagaki.takahiro@oss.ntt.co.jp>
2009-10-09 21:02:56 +00:00
Alvaro Herrera 2eda8dfb52 Make it possibly to specify GUC params per user and per database.
Create a new catalog pg_db_role_setting where they are now stored, and better
encapsulate the code that deals with settings into its realm.  The old
datconfig and rolconfig columns are removed.

psql has gained a \drds command to display the settings.

Backwards compatibility warning: while the backwards-compatible system views
still have the config columns, they no longer completely represent the
configuration for a user or database.

Catalog version bumped.
2009-10-07 22:14:26 +00:00
Alvaro Herrera 07cefdfb7a Fix snapshot management, take two.
Partially revert the previous patch I installed and replace it with a more
general fix: any time a snapshot is pushed as Active, we need to ensure that it
will not be modified in the future.  This means that if the same snapshot is
used as CurrentSnapshot, it needs to be copied separately.  This affects
serializable transactions only, because CurrentSnapshot has already been copied
by RegisterSnapshot and so PushActiveSnapshot does not think it needs another
copy.  However, CommandCounterIncrement would modify CurrentSnapshot, whereas
ActiveSnapshots must not have their command counters incremented.

I say "partially" because the regression test I added for the previous bug
has been kept.

(This restores 8.3 behavior, because before snapmgr.c existed, any snapshot set
as Active was copied.)

Per bug report from Stuart Bishop in
6bc73d4c0910042358k3d1adff3qa36f8df75198ecea@mail.gmail.com
2009-10-07 16:27:18 +00:00
Tom Lane 249724cb01 Create an ALTER DEFAULT PRIVILEGES command, which allows users to adjust
the privileges that will be applied to subsequently-created objects.

Such adjustments are always per owning role, and can be restricted to objects
created in particular schemas too.  A notable benefit is that users can
override the traditional default privilege settings, eg, the PUBLIC EXECUTE
privilege traditionally granted by default for functions.

Petr Jelinek
2009-10-05 19:24:49 +00:00
Tom Lane 54d60bbd07 Fix a couple of issues in recent patch to print updates to postgresql.conf
settings: avoid calling superuser() in contexts where it's not defined,
don't leak the transient copies of GetConfigOption output, and avoid the
whole exercise in postmaster child processes.

I found that actually no current caller of GetConfigOption has any use for
its internal check of GUC_SUPERUSER_ONLY.  But rather than just remove
that entirely, it seemed better to add a parameter indicating whether to
enforce the check.

Per report from Simon and subsequent testing.
2009-10-03 18:04:57 +00:00
Alvaro Herrera caa4cfa369 Ensure that a cursor has an immutable snapshot throughout its lifespan.
The old coding was using a regular snapshot, referenced elsewhere, that was
subject to having its command counter updated.  Fix by creating a private copy
of the snapshot exclusively for the cursor.

Backpatch to 8.4, which is when the bug was introduced during the snapshot
management rewrite.
2009-10-02 17:57:30 +00:00
Tom Lane 9048b73184 Implement the DO statement to support execution of PL code without having
to create a function for it.

Procedural languages now have an additional entry point, namely a function
to execute an inline code block.  This seemed a better design than trying
to hide the transient-ness of the code from the PL.  As of this patch, only
plpgsql has an inline handler, but probably people will soon write handlers
for the other standard PLs.

In passing, remove the long-dead LANCOMPILER option of CREATE LANGUAGE.

Petr Jelinek
2009-09-22 23:43:43 +00:00
Peter Eisentraut 3ab8b7fa6f Fix/improve bytea and boolean support in PL/Python
Before, PL/Python converted data between SQL and Python by going
through a C string representation.  This broke for bytea in two ways:

- On input (function parameters), you would get a Python string that
  contains bytea's particular external representation with backslashes
  etc., instead of a sequence of bytes, which is what you would expect
  in a Python environment.  This problem is exacerbated by the new
  bytea output format.

- On output (function return value), null bytes in the Python string
  would cause truncation before the data gets stored into a bytea
  datum.

This is now fixed by converting directly between the PostgreSQL datum
and the Python representation.

The required generalized infrastructure also allows for other
improvements in passing:

- When returning a boolean value, the SQL datum is now true if and
  only if Python considers the value that was passed out of the
  PL/Python function to be true.  Previously, this determination was
  left to the boolean data type input function.  So, now returning
  'foo' results in true, because Python considers it true, rather than
  false because PostgreSQL considers it false.

- On input, we can convert the integer and float types directly to
  their Python equivalents without having to go through an
  intermediate string representation.

original patch by Caleb Welton, with updates by myself
2009-09-09 19:00:09 +00:00
Heikki Linnakangas 7be39bb0be Tigthen binary receive functions so that they reject values that the text
input functions don't accept either. While the backend can handle such
values fine, they can cause trouble in clients and in pg_dump/restore.

This is followup to the original issue on time datatype reported by Andrew
McNamara a while ago. Like that one, none of these seem worth
back-patching.
2009-09-04 11:20:23 +00:00
Tom Lane 187e5d8981 Disallow RESET ROLE and RESET SESSION AUTHORIZATION inside security-definer
functions.

This extends the previous patch that forbade SETting these variables inside
security-definer functions.  RESET is equally a security hole, since it
would allow regaining privileges of the caller; furthermore it can trigger
Assert failures and perhaps other internal errors, since the code is not
expecting these variables to change in such contexts.  The previous patch
did not cover this case because assign hooks don't really have enough
information, so move the responsibility for preventing this into guc.c.

Problem discovered by Heikki Linnakangas.

Security: no CVE assigned yet, extends CVE-2007-6600
2009-09-03 22:08:05 +00:00
Alvaro Herrera a8bb8eb583 Remove flatfiles.c, which is now obsolete.
Recent commits have removed the various uses it was supporting.  It was a
performance bottleneck, according to bug report #4919 by Lauris Ulmanis; seems
it slowed down user creation after a billion users.
2009-09-01 02:54:52 +00:00
Tom Lane e710b65c1c Remove the use of the pg_auth flat file for client authentication.
(That flat file is now completely useless, but removal will come later.)

To do this, postpone client authentication into the startup transaction
that's run by InitPostgres.  We still collect the startup packet and do
SSL initialization (if needed) at the same time we did before.  The
AuthenticationTimeout is applied separately to startup packet collection
and the actual authentication cycle.  (This is a bit annoying, since it
means a couple extra syscalls; but the signal handling requirements inside
and outside a transaction are sufficiently different that it seems best
to treat the timeouts as completely independent.)

A small security disadvantage is that if the given database name is invalid,
this will be reported to the client before any authentication happens.
We could work around that by connecting to database "postgres" instead,
but consensus seems to be that it's not worth introducing such surprising
behavior.

Processing of all command-line switches and GUC options received from the
client is now postponed until after authentication.  This means that
PostAuthDelay is much less useful than it used to be --- if you need to
investigate problems during InitPostgres you'll have to set PreAuthDelay
instead.  However, allowing an unauthenticated user to set any GUC options
whatever seems a bit too risky, so we'll live with that.
2009-08-29 19:26:52 +00:00
Tom Lane 3bd2241135 Fix overflow for INTERVAL 'x ms' where x is more than a couple million,
and integer datetimes are in use.  Per bug report from Hubert Depesz
Lubaczewski.

Alex Hunsaker
2009-08-18 21:23:14 +00:00
Tom Lane 04011cc970 Allow backends to start up without use of the flat-file copy of pg_database.
To make this work in the base case, pg_database now has a nailed-in-cache
relation descriptor that is initialized using hardwired knowledge in
relcache.c.  This means pg_database is added to the set of relations that
need to have a Schema_pg_xxx macro maintained in pg_attribute.h.  When this
path is taken, we'll have to do a seqscan of pg_database to find the row
we need.

In the normal case, we are able to do an indexscan to find the database's row
by name.  This is made possible by storing a global relcache init file that
describes only the shared catalogs and their indexes (and therefore is usable
by all backends in any database).  A new backend loads this cache file,
finds its database OID after an indexscan on pg_database, and then loads
the local relcache init file for that database.

This change should effectively eliminate number of databases as a factor
in backend startup time, even with large numbers of databases.  However,
the real reason for doing it is as a first step towards getting rid of
the flat files altogether.  There are still several other sub-projects
to be tackled before that can happen.
2009-08-12 20:53:31 +00:00
Tom Lane e61fd4ac74 Support EEEE (scientific notation) in to_char().
Pavel Stehule, Brendan Jurd
2009-08-10 18:29:27 +00:00
Tom Lane 9bd27b7c9e Extend EXPLAIN to support output in XML or JSON format.
There are probably still some adjustments to be made in the details
of the output, but this gets the basic structure in place.

Robert Haas
2009-08-10 05:46:50 +00:00
Tom Lane a2a8c7a662 Support hex-string input and output for type BYTEA.
Both hex format and the traditional "escape" format are automatically
handled on input.  The output format is selected by the new GUC variable
bytea_output.

As committed, bytea_output defaults to HEX, which is an *incompatible
change*.  We will keep it this way for awhile for testing purposes, but
should consider whether to switch to the more backwards-compatible
default of ESCAPE before 8.5 is released.

Peter Eisentraut
2009-08-04 16:08:37 +00:00
Joe Conway be6bca23b3 Implement has_sequence_privilege()
Add family of functions that did not exist earlier,
mainly due to historical omission. Original patch by
Abhijit Menon-Sen, with review and modifications by
Joe Conway. catversion.h bumped.
2009-08-03 21:11:40 +00:00
Tom Lane b680ae4bdb Improve unique-constraint-violation error messages to include the exact
values being complained of.

In passing, also remove the arbitrary length limitation in the similar
error detail message for foreign key violations.

Itagaki Takahiro
2009-08-01 19:59:41 +00:00
Tom Lane 25d9bf2e3e Support deferrable uniqueness constraints.
The current implementation fires an AFTER ROW trigger for each tuple that
looks like it might be non-unique according to the index contents at the
time of insertion.  This works well as long as there aren't many conflicts,
but won't scale to massive unique-key reassignments.  Improving that case
is a TODO item.

Dean Rasheed
2009-07-29 20:56:21 +00:00
Tom Lane c1b9ec24ef Add system catalog columns pg_constraint.conindid and pg_trigger.tgconstrindid.
conindid is the index supporting a constraint.  We can use this not only for
unique/primary-key constraints, but also foreign-key constraints, which
depend on the unique index that constrains the referenced columns.
tgconstrindid is just copied from the constraint's conindid field, or is
zero for triggers not associated with constraints.

This is mainly intended as infrastructure for upcoming patches, but it has
some virtue in itself, since it exposes a relationship that you formerly
had to grovel in pg_depend to determine.  I simplified one information_schema
view accordingly.  (There is a pg_dump query that could also use conindid,
but I left it alone because it wasn't clear it'd get any faster.)
2009-07-28 02:56:31 +00:00
Peter Eisentraut de160e2c00 Make backend header files C++ safe
This alters various incidental uses of C++ key words to use other similar
identifiers, so that a C++ compiler won't choke outright.  You still
(probably) need extern "C" { }; around the inclusion of backend headers.

based on a patch by Kurt Harriman <harriman@acm.org>

Also add a script cpluspluscheck to check for C++ compatibility in the
future.  As of right now, this passes without error for me.
2009-07-16 06:33:46 +00:00
Peter Eisentraut e292dbcf54 More sensible character_octet_length
For character types with typmod, character_octet_length columns in the
information schema now show the maximum character length times the
maximum length of a character in the server encoding, instead of some
huge value as before.
2009-07-07 18:23:15 +00:00
Bruce Momjian d747140279 8.4 pgindent run, with new combined Linux/FreeBSD/MinGW typedef list
provided by Andrew.
2009-06-11 14:49:15 +00:00
Peter Eisentraut 9b7304bc25 Fix xmlattribute escaping XML special characters twice (bug #4822).
Author: Itagaki Takahiro <itagaki.takahiro@oss.ntt.co.jp>
2009-06-09 22:00:57 +00:00
Tom Lane 76d4abf2d9 Improve the recently-added support for properly pluralized error messages
by extending the ereport() API to cater for pluralization directly.  This
is better than the original method of calling ngettext outside the elog.c
code because (1) it avoids double translation, which wastes cycles and in
the worst case could give a wrong result; and (2) it avoids having to use
a different coding method in PL code than in the core backend.  The
client-side uses of ngettext are not touched since neither of these concerns
is very pressing in the client environment.  Per my proposal of yesterday.
2009-06-04 18:33:08 +00:00
Tom Lane b3b89fd1f1 Fix DecodeInterval to report an error for multiple occurrences of DAY, WEEK,
YEAR, DECADE, CENTURY, or MILLENIUM fields, just as it always has done for
other types of fields.  The previous behavior seems to have been a hack to
avoid defining bit-positions for all these field types in DTK_M() masks,
rather than something that was really considered to be desired behavior.
But there is room in the masks for these, and we really need to tighten up
at least the behavior of DAY and YEAR fields to avoid unexpected behavior
associated with the 8.4 changes to interpret ambiguous fields based on the
interval qualifier (typmod) value.  Per my example and proposed patch.
2009-06-01 16:55:11 +00:00
Tom Lane 99bf328237 Remove the useless and rather inconsistent return values of EncodeDateOnly,
EncodeTimeOnly, EncodeDateTime, EncodeInterval.  These don't have any good
reason to fail, and their callers were mostly not checking anyway.
2009-05-26 02:17:50 +00:00
Tom Lane 23543c732b Rewrite xml.c's memory management (yet again). Give up on the idea of
redirecting libxml's allocations into a Postgres context.  Instead, just let
it use malloc directly, and add PG_TRY blocks as needed to be sure we release
libxml data structures in error recovery code paths.  This is ugly but seems
much more likely to play nicely with third-party uses of libxml, as seen in
recent trouble reports about using Perl XML facilities in pl/perl and bug
#4774 about contrib/xml2.

I left the code for allocation redirection in place, but it's only
built/used if you #define USE_LIBXMLCONTEXT.  This is because I found it
useful to corral libxml's allocations in a palloc context when hunting
for libxml memory leaks, and we're surely going to have more of those
in the future with this type of approach.  But we don't want it turned on
in a normal build because it breaks exactly what we need to fix.

I have not re-indented most of the code sections that are now wrapped
by PG_TRY(); that's for ease of review.  pg_indent will fix it.

This is a pre-existing bug in 8.3, but I don't dare back-patch this change
until it's gotten a reasonable amount of field testing.
2009-05-13 20:27:17 +00:00
Tom Lane 06e2757277 Remove SQL-compatibility function cardinality(). It is not exactly clear
how this ought to behave for multi-dimensional arrays.  Per discussion,
not having it at all seems better than having it with what might prove
to be the wrong behavior.  We can always add it later when we have consensus
on the correct behavior.
2009-04-09 17:39:50 +00:00
Tom Lane f2110a757d Change cardinality() into a C-code function, instead of a SQL-language
alias for array_length(v,1).  The efficiency gain here is doubtless
negligible --- what I'm interested in is making sure that if we have
second thoughts about the definition, we will not have to force a
post-beta initdb to change the implementation.
2009-04-05 22:28:59 +00:00
Tom Lane 948d6ec90f Modify the relcache to record the temp status of both local and nonlocal
temp relations; this is no more expensive than before, now that we have
pg_class.relistemp.  Insert tests into bufmgr.c to prevent attempting
to fetch pages from nonlocal temp relations.  This provides a low-level
defense against bugs-of-omission allowing temp pages to be loaded into shared
buffers, as in the contrib/pgstattuple problem reported by Stuart Bishop.
While at it, tweak a bunch of places to use new relcache tests (instead of
expensive probes into pg_namespace) to detect local or nonlocal temp tables.
2009-03-31 22:12:48 +00:00
Tom Lane 25bf7f8b9b Fix possible failures when a tuplestore switches from in-memory to on-disk
mode while callers hold pointers to in-memory tuples.  I reported this for
the case of nodeWindowAgg's primary scan tuple, but inspection of the code
shows that all of the calls in nodeWindowAgg and nodeCtescan are at risk.
For the moment, fix it with a rather brute-force approach of copying
whenever one of the at-risk callers requests a tuple.  Later we might
think of some sort of reference-count approach to reduce tuple copying.
2009-03-27 18:30:21 +00:00
Peter Eisentraut 05a7db0582 Accept 'on' and 'off' as input for boolean data type, unifying the syntax
that the data type and GUC accepts.

ITAGAKI Takahiro
2009-03-09 14:34:35 +00:00
Peter Eisentraut 12f87b2c82 Add new SQL:2008 error codes for invalid LIMIT and OFFSET values. Remove
unused nonstandard error code that was perhaps intended for this but never
used.
2009-03-04 10:55:00 +00:00
Alvaro Herrera 834a6da4f7 Update autovacuum to use reloptions instead of a system catalog, for
per-table overrides of parameters.

This removes a whole class of problems related to misusing the catalog,
and perhaps more importantly, gives us pg_dump support for the parameters.

Based on a patch by Euler Taveira de Oliveira, heavily reworked by me.
2009-02-09 20:57:59 +00:00
Tom Lane 7449427a1e Clean up some loose ends from the column privileges patch: add
has_column_privilege and has_any_column_privilege SQL functions; fix the
information_schema views that are supposed to pay attention to column
privileges; adjust pg_stats to show stats for any column you have select
privilege on; and fix COPY to allow copying a subset of columns if the user
has suitable per-column privileges for all the columns.

To improve efficiency of some of the information_schema views, extend the
has_xxx_privilege functions to allow inquiring about the OR of a set of
privileges in just one call.  This is just exposing capability that already
existed in the underlying aclcheck routines.

In passing, make the information_schema views report the owner's own
privileges as being grantable, since Postgres assumes this even when the grant
option bit is not set in the ACL.  This is a longstanding oversight.

Also, make the new has_xxx_privilege functions for foreign data objects follow
the same coding conventions used by the older ones.

Stephen Frost and Tom Lane
2009-02-06 21:15:12 +00:00
Tom Lane 3cb5d6580a Support column-level privileges, as required by SQL standard.
Stephen Frost, with help from KaiGai Kohei and others
2009-01-22 20:16:10 +00:00
Tom Lane 7c63d0c72e Change a couple of ill-advised uses of INFO elog level to WARNINGs; in
particular this allows EmitWarningsOnPlaceholders messages to show up in the
postmaster log by default.  Update elog.h comment to make it clearer what INFO
is for, and fix one example in the SGML docs that was misusing it.  Per my
gripe of yesterday.
2009-01-06 16:39:52 +00:00
Bruce Momjian 511db38ace Update copyright for 2009. 2009-01-01 17:24:05 +00:00
Tom Lane 95b07bc7f5 Support window functions a la SQL:2008.
Hitoshi Harada, with some kibitzing from Heikki and Tom.
2008-12-28 18:54:01 +00:00
Tom Lane 38e9348282 Make a couple of small changes to the tuplestore API, for the benefit of the
upcoming window-functions patch.  First, tuplestore_trim is now an
exported function that must be explicitly invoked by callers at
appropriate times, rather than something that tuplestore tries to do
behind the scenes.  Second, a read pointer that is marked as allowing
backward scan no longer prevents truncation.  This means that a read pointer
marked as having BACKWARD but not REWIND capability can only safely read
backwards as far as the oldest other read pointer.  (The expected use pattern
for this involves having another read pointer that serves as the truncation
fencepost.)
2008-12-27 17:39:00 +00:00
Peter Eisentraut cae565e503 SQL/MED catalog manipulation facilities
This doesn't do any remote or external things yet, but it gives modules
like plproxy and dblink a standardized and future-proof system for
managing their connection information.

Martin Pihlak and Peter Eisentraut
2008-12-19 16:25:19 +00:00
Peter Eisentraut 455dffbb73 Default values for function arguments
Pavel Stehule, with some tweaks by Peter Eisentraut
2008-12-04 17:51:28 +00:00
Alvaro Herrera 7b640b0345 Fix a couple of snapshot management bugs in the new ResourceOwner world:
non-writable large objects need to have their snapshots registered on the
transaction resowner, not the current portal's, because it must persist until
the large object is closed (which the portal does not).  Also, ensure that the
serializable snapshot is recorded by the transaction resource owner too, even
when a subtransaction has changed the current resource owner before
serializable is taken.

Per bug reports from Pavan Deolasee.
2008-12-04 14:51:02 +00:00
Heikki Linnakangas 608195a3a3 Introduce visibility map. The visibility map is a bitmap with one bit per
heap page, where a set bit indicates that all tuples on the page are
visible to all transactions, and the page therefore doesn't need
vacuuming. It is stored in a new relation fork.

Lazy vacuum uses the visibility map to skip pages that don't need
vacuuming. Vacuum is also responsible for setting the bits in the map.
In the future, this can hopefully be used to implement index-only-scans,
but we can't currently guarantee that the visibility map is always 100%
up-to-date.

In addition to the visibility map, there's a new PD_ALL_VISIBLE flag on
each heap page, also indicating that all tuples on the page are visible to
all transactions. It's important that this flag is kept up-to-date. It
is also used to skip visibility tests in sequential scans, which gives a
small performance gain on seqscans.
2008-12-03 13:05:22 +00:00
Tom Lane c1f3073333 Clean up the API for DestReceiver objects by eliminating the assumption
that a Portal is a useful and sufficient additional argument for
CreateDestReceiver --- it just isn't, in most cases.  Instead formalize
the approach of passing any needed parameters to the receiver separately.

One unexpected benefit of this change is that we can declare typedef Portal
in a less surprising location.

This patch is just code rearrangement and doesn't change any functionality.
I'll tackle the HOLD-cursor-vs-toast problem in a follow-on patch.
2008-11-30 20:51:25 +00:00
Heikki Linnakangas 9858a8c81c Rely on relcache invalidation to update the cached size of the FSM. 2008-11-26 17:08:58 +00:00
Alvaro Herrera 6bbef4e538 Use ResourceOwners in the snapshot manager, instead of attempting to track them
by hand.  As an added bonus, the new code is smaller and more understandable,
and the ugly loops are gone.

This had been discussed all along but never implemented.  It became clear that
it really needed to be fixed after a bug report by Pavan Deolasee.
2008-11-25 20:28:29 +00:00
Tom Lane cd35e9d746 Some infrastructure changes for the upcoming auto-explain contrib module:
* Refactor explain.c slightly to export a convenient-to-use subroutine
for printing EXPLAIN results.

* Provide hooks for plugins to get control at ExecutorStart and ExecutorEnd
as well as ExecutorRun.

* Add some minimal support for tracking the total runtime of ExecutorRun.
This code won't actually do anything unless a plugin prods it to.

* Change the API of the DefineCustomXXXVariable functions to allow nonzero
"flags" to be specified for a custom GUC variable.  While at it, also make
the "bootstrap" default value for custom GUCs be explicitly specified as a
parameter to these functions.  This is to eliminate confusion over where the
default comes from, as has been expressed in the past by some users of the
custom-variable facility.

* Refactor GUC code a bit to ensure that a custom variable gets initialized to
something valid (like its default value) even if the placeholder value was
invalid.
2008-11-19 01:10:24 +00:00
Tom Lane 62533d34a5 Second try at fixing DLLIMPORT problem for pg_crc.h on Cygwin. 2008-11-14 20:21:07 +00:00
Tom Lane c889ebce0a Implement the basic form of UNNEST, ie unnest(anyarray) returns setof
anyelement.  This lacks the WITH ORDINALITY option, as well as the multiple
input arrays option added in the most recent SQL specs.  But it's still a
pretty useful subset of the spec's functionality, and it is enough to
allow obsoleting contrib/intagg.
2008-11-14 00:51:47 +00:00
Peter Eisentraut 3379fae6de array_agg aggregate function, as per SQL:2008, but without ORDER BY clause
Rearrange the documentation a bit now that array_agg and xmlagg have similar
semantics and issues.

best of Robert Haas, Jeff Davis, Peter Eisentraut
2008-11-13 15:59:51 +00:00
Tom Lane 69a0e2f76d PGDLLIMPORT-ize the global variables referenced in pg_crc.h.
I think this will fix current mingw buildfarm failures for pg_trgm.
2008-11-13 14:42:28 +00:00